Testing a CNN on a subset of the SIGNS dataset
Using batch normalization and data augmentation
The 1200 samples were divided as follows:
- Training: 864 samples
- Validation: 216 samples
- Test: 120 samples
The training and validation sets are used during the hyperparameter tuning of the model, and the test set only for the final evaluation of the model's accuracy.
Python 3 was used as the programming language, with the TensorFlow 2.0 library for creating the models and KerasTuner for the hyperparameter tuning. Other libraries, like numpy, matplotlib and sklearn, were used for support.
The ImageDataGenerator class was used to generate additional samples for training; you can learn more about it here.
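The article does not list the exact augmentation settings used, so the transforms below are only illustrative; a minimal sketch of drawing augmented batches with ImageDataGenerator might look like this:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings (the original values are not given in the text).
datagen = ImageDataGenerator(
    rotation_range=15,       # random rotations up to 15 degrees
    width_shift_range=0.1,   # horizontal shifts up to 10% of the width
    height_shift_range=0.1,  # vertical shifts up to 10% of the height
    zoom_range=0.1,          # random zoom in/out
)

# SIGNS images are 64x64 RGB; random data stands in for the real samples here.
x_train = np.random.rand(8, 64, 64, 3).astype("float32")
batch = next(datagen.flow(x_train, batch_size=4))
print(batch.shape)  # (4, 64, 64, 3)
```

During training, `datagen.flow(x_train, y_train, batch_size=32)` is passed to `model.fit`, so each epoch sees freshly transformed variants of the training images.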
The main structure of the model with convolution is the following, using a sequential approach:
- 1st Layer: Convolution Layer
- 2nd Layer: Batch Normalization Layer
- 3rd Layer: Pooling Layer
- 4th Layer: Convolution Layer
- 5th Layer: Batch Normalization Layer
- 6th Layer: Pooling Layer
- 7th Layer: Flatten Layer
- 8th Layer: Dense Layer
- 9th Layer: Dense Layer
Layers 1 to 8 use ReLU as the activation function, while the 9th layer uses softmax, as this is a multiclass classification problem. The loss and accuracy metrics used were, respectively, CategoricalCrossentropy and CategoricalAccuracy, both available in TensorFlow 2.0.
The following hyperparameters and their possible options were considered:
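A minimal sketch of this nine-layer stack in Keras, filled in with the best hyperparameter values reported further down (kernel size 3, 32 and 8 filters, pool size 2, 96 dense units); the 6-unit softmax output matches the six SIGNS classes:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),           # SIGNS images: 64x64 RGB
    layers.Conv2D(32, 3, activation="relu"),     # 1st: convolution
    layers.BatchNormalization(),                 # 2nd: batch normalization
    layers.MaxPooling2D(2),                      # 3rd: pooling
    layers.Conv2D(8, 3, activation="relu"),      # 4th: convolution
    layers.BatchNormalization(),                 # 5th: batch normalization
    layers.MaxPooling2D(2),                      # 6th: pooling
    layers.Flatten(),                            # 7th: flatten
    layers.Dense(96, activation="relu"),         # 8th: dense, ReLU
    layers.Dense(6, activation="softmax"),       # 9th: softmax output
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy()],
)
```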
- Size of the kernel used in the first convolution layer: 2 to 5
- Number of units in the first convolution layer (filters to ‘find’): 16 to 32 units, with an increment of 8 per try
- Size of pool used in the first pooling layer: 2 or 3
- Size of the kernel used in the second convolution layer: 2 or 3
- Number of units in the second convolution layer (filters to ‘find’): 8 to 16 units, with an increment of 4 per try
- Size of pool used in the second pooling layer: 2 or 3
- Number of units in the dense layer: 64 to 128 units, with an increment of 32
- Learning rate: minimum of 1×10⁻⁴, maximum of 1×10⁻², on a logarithmic scale
- Beta 1: 0.90 to 0.99, with an increment of 0.01
- Beta 2: 0.95 to 0.99, with an increment of 0.01
To execute the hyperparameter tuning, a BayesianOptimization tuner from the KerasTuner library was used. A total of one hundred trials were executed, with one execution per trial; an alpha of 0.5 and a beta of 0.7 were used to create the Bayesian optimizer.
For the search for the best hyperparameters, each model was trained for one hundred epochs with a batch size of 32. The image-generation class was called here to create new training samples during the hyperparameter search, using the same batch size of 32.
After the end of the execution, the best hyperparameters found were:
- ‘kernel_conv_1’: 3
- ‘Initial_CP_units’: 32
- ‘size_pool_1’: 2
- ‘kernel_conv_2’: 3
- ‘2_CP_units’: 8
- ‘size_pool_2’: 2
- ‘FC_units’: 96
- ‘beta1’: 0.9900000000000001
- ‘beta2’: 0.98
- ‘lrate’: 0.0014273888177808397
With these parameters, the model summary gives us:
After recovering the best model, its loss and accuracy are calculated on the training, validation and test sets.
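A sketch of this evaluation step; the model and data splits below are dummy stand-ins (in the notebook, the model would come from something like `tuner.get_best_models(1)[0]` and the splits would be the real SIGNS subsets):

```python
import numpy as np
import tensorflow as tf

def fake_split(n):
    # Random stand-in data: 64x64 RGB images with one-hot labels for 6 classes.
    x = np.random.rand(n, 64, 64, 3).astype("float32")
    y = tf.keras.utils.to_categorical(np.random.randint(0, 6, n), 6)
    return x, y

# Dummy stand-in for the tuned best model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=[tf.keras.metrics.CategoricalAccuracy()])

# evaluate() returns [loss, categorical_accuracy] for each split.
for name, (x, y) in [("Train", fake_split(8)),
                     ("Validation", fake_split(4)),
                     ("Test", fake_split(4))]:
    loss, acc = model.evaluate(x, y, verbose=0)
    print(f"{name} loss: {loss:.3f} - acc: {acc:.3f}")
```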
- Train loss: 1.052 — acc: 0.994
- Validation loss: 1.061 — acc: 0.981
- Test loss: 1.069 — acc: 0.983
Finally, using the classification report from sklearn, the precision, recall and f1-score are shown.
When these results are compared with the metrics obtained with a model using only fully connected layers, the fully connected model requires more training time and more epochs to converge to a good solution.
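The report step can be sketched with dummy labels; in the notebook, the predicted classes would come from something like `np.argmax(best_model.predict(x_test), axis=1)`:

```python
import numpy as np
from sklearn.metrics import classification_report

# Dummy true and predicted class indices for the six SIGNS classes.
y_true = np.array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3])
y_pred = np.array([0, 1, 2, 3, 4, 5, 0, 1, 2, 4])

# Prints per-class precision, recall, f1-score and support.
print(classification_report(y_true, y_pred))
```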
The structure of the model used to compare follows:
- 1st Layer: Flatten Layer
- 2nd Layer: Dense Layer (without activation function)
- 3rd Layer: Batch Normalization Layer
- 4th Layer: Activation Layer
- 5th Layer: Dense Layer
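A minimal sketch of this baseline, filled in with the 64 dense units reported below as the best value; batch normalization sits between the linear dense layer and its activation:

```python
import tensorflow as tf
from tensorflow.keras import layers

fc_model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Flatten(),                        # 1st: flatten
    layers.Dense(64),                        # 2nd: dense, no activation yet
    layers.BatchNormalization(),             # 3rd: batch norm before the nonlinearity
    layers.Activation("relu"),               # 4th: activation
    layers.Dense(6, activation="softmax"),   # 5th: softmax output
])
```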
The hyperparameters selected for tuning are:
- Number of units in the First dense layer: 64 to 128 units, with an increment of 32
- Learning rate: minimum of 1×10⁻⁴, maximum of 1×10⁻², on a logarithmic scale
- Beta 1: 0.90 to 0.99, with an increment of 0.01
- Beta 2: 0.95 to 0.99, with an increment of 0.01
The same configuration was used to create the BayesianOptimization tuner; a total of … was spent tuning this model.
After the end of the execution, the best hyperparameters found were:
- ‘FC_units’: 64
- ‘beta1’: 0.9800000000000001
- ‘beta2’: 0.98
- ‘lrate’: 0.00010149506885369505
The summary of this model returns:
After the tuning, the same process used previously is applied to get the loss and accuracy on the training, validation and test sets. The values obtained were:
- Train loss: 1.244 — acc: 0.878
- Validation loss: 1.272 — acc: 0.852
- Test loss: 1.232 — acc: 0.883
As done previously, the classification report was used; the result was:
The code used can be found in this colab notebook.