Convolutional Neural Network based Image Classification for Google Bangkit's Final Project Assignment
This project aims to classify images of people wearing and not wearing masks using a TensorFlow-based CNN model.
- Python 3.X
- TensorFlow 2.X
- The dataset
The dataset is taken from Mask Dataset V1, with a total of 1,100 images. It consists of close-up photos of people with varying poses, backgrounds, races, genders, and ages.
The training and validation sets contain two labels:
- Mask
- No Mask
The dataset is pre-conditioned (already split into training and validation sets, with one folder per class), structured as follows:
- Train (750 images)
- Mask (350 images)
- No_mask (400 images)
- Validation (350 images)
- Mask (150 images)
- No_mask (200 images)
Mask_NoMask_Classification.ipynb contains the baseline model.
For the baseline model, we use a Convolutional Neural Network built from a combination of convolutional, pooling, and dense layers, specified as follows:
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3)),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
We also use the ImageDataGenerator class from the tensorflow.keras.preprocessing.image module to augment the training set. The augmentations we apply are rotation, shear, zoom, horizontal flip, and width and height shifts.
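As a sketch, the augmentation setup described above might be configured like this. The specific parameter values, the data/train path, and the dummy-image scaffolding are illustrative assumptions, not the project's actual settings:

```python
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Build a tiny dummy dataset so this sketch is self-contained; in the
# real project, data/train already holds the Mask and No_mask folders.
for label in ('Mask', 'No_mask'):
    os.makedirs(f'data/train/{label}', exist_ok=True)
    dummy = (np.random.rand(150, 150, 3) * 255).astype('uint8')
    tf.keras.preprocessing.image.save_img(f'data/train/{label}/0.jpg', dummy)

# Illustrative augmentation settings covering the transforms listed above.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values to [0, 1]
    rotation_range=40,        # random rotation
    shear_range=0.2,          # random shear
    zoom_range=0.2,           # random zoom
    horizontal_flip=True,     # random horizontal flip
    width_shift_range=0.2,    # random horizontal shift
    height_shift_range=0.2,   # random vertical shift
)

train_generator = train_datagen.flow_from_directory(
    'data/train',             # hypothetical path
    target_size=(150, 150),   # matches the model's input_shape
    batch_size=32,
    class_mode='binary',      # two labels: Mask / No_mask
)
```

The validation set would use a separate generator with only the rescale step, since augmentation should never be applied to evaluation data.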
When training the baseline model, we get the graph of the metrics as follows:

Even though the metric values look good, the validation accuracy is higher than the training accuracy. This is probably because the validation set contains "easier" images than the training set (source).
The prediction performance is also poor:
mobilevnet_maskprediction_model.ipynb is the improved model.
To overcome the "easier images" problem in the validation set, we re-shuffle the images to counter bias in the pre-conditioned dataset.
We perform the re-shuffling as follows:
- Pool all the Mask images from the pre-defined training and validation folders. The same procedure is applied to the No Mask images.
- Shuffle both the Mask images and the No Mask images.
- Re-distribute the shuffled Mask and No Mask images into the training and validation folders.
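The three steps above can be sketched with standard-library tools. The function name, paths, and the 70/30 split fraction are illustrative assumptions, not the project's actual script:

```python
import os
import random
import shutil

def reshuffle(src_dirs, train_dir, val_dir, label, val_fraction=0.3, seed=42):
    """Pool one label's images from the old train/validation folders,
    shuffle them, and redistribute into new train/validation folders.

    Assumes filenames are unique across the source folders.
    """
    # 1. Pool all images of this label from the pre-defined folders.
    files = []
    for d in src_dirs:
        files.extend(os.path.join(d, f) for f in os.listdir(d))

    # 2. Shuffle the pooled images (seeded for reproducibility).
    random.seed(seed)
    random.shuffle(files)

    # 3. Redistribute into the new training and validation folders.
    split = int(len(files) * (1 - val_fraction))
    for dest, subset in ((train_dir, files[:split]), (val_dir, files[split:])):
        out = os.path.join(dest, label)
        os.makedirs(out, exist_ok=True)
        for f in subset:
            shutil.copy(f, out)
```

Calling this once per label (Mask, No_mask) yields a fresh random train/validation split, removing any "easy images" bias baked into the original folder assignment.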
To improve the prediction performance, we apply transfer learning, adopting the pre-trained MobileNetV2 model as the base of our model. More on transfer learning with TensorFlow can be found here.
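A minimal sketch of this transfer-learning setup is shown below. The classification head and compile settings are assumptions for illustration; the notebook's actual architecture may differ, and in practice the base would be loaded with weights='imagenet' (weights=None here only avoids a download in this sketch):

```python
import tensorflow as tf

# Load MobileNetV2 without its ImageNet classification head.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(150, 150, 3),
    include_top=False,
    weights=None,  # in the real project: weights='imagenet'
)
base_model.trainable = False  # freeze the pre-trained feature extractor

# Attach a small binary-classification head (illustrative assumption).
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy'],
)
```

Freezing the base model means only the small head is trained, which is why transfer learning works well on a dataset of just 1,100 images.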
Here are the graphs of the metrics and the prediction performance of the improved model:

We serve the model as an Android app to make it easy to use and available on mobile devices. The TFLite app that we used as a template can be obtained here.
To deploy the model, we first convert it to TFLite using the Python API. The resulting .tflite file is then placed in the assets folder along with the labels in .txt format.
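The conversion step can be sketched as follows. The stand-in model and the output filename are illustrative assumptions; in the project, the trained MobileNetV2-based classifier would be converted instead:

```python
import tensorflow as tf

# Stand-in Keras model so the sketch runs on its own; the real project
# would convert the trained classifier here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Convert the Keras model to the TFLite flatbuffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the .tflite file; on the Android side this goes into the
# assets folder together with the labels .txt file.
with open('mask_classifier.tflite', 'wb') as f:
    f.write(tflite_model)
```

The Android app then loads the flatbuffer with the TFLite Interpreter and maps each output index to a line of the labels file.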
Features of the Android app:
- Live Camera object classifier
Activates the camera and automatically takes a picture when it detects a human face, then classifies whether the person is wearing a mask.
Examples:


- Static Image object classifier
Opens the gallery to select an image (of a human face), then classifies whether the person is wearing a mask.
Examples:


Presented in 2020 by:

