Testing a CNN on a subset of the SIGNS dataset

Using batch normalization and data augmentation

João Vítor Venceslau
4 min read · Nov 20, 2020
Source: https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png

The 1200 samples were divided as follows:

  • Training: 864 samples
  • Validation: 216 samples
  • Test: 120 samples

The training and validation sets are used during the hyperparameter tuning of the model, and the test set only for the final evaluation of the model's accuracy.
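A minimal sketch of such a split, assuming the images and one-hot labels are already loaded into NumPy arrays X and y (the variable names and the use of sklearn here are illustrative, not necessarily what the notebook does):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# First carve out the 120-sample test set (10% of 1200),
# then split the remaining 1080 samples into 864 train / 216 validation (80/20).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=120, random_state=42, stratify=y.argmax(axis=1))
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=42,
    stratify=y_trainval.argmax(axis=1))

print(X_train.shape[0], X_val.shape[0], X_test.shape[0])  # 864 216 120
```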

Python 3 was used as the programming language, with the TensorFlow 2.0 library for creating the models and KerasTuner for the hyperparameter tuning. Other libraries, such as NumPy, Matplotlib, and scikit-learn, were used as support.

The ImageDataGenerator class was used to generate more samples for training; you can learn more about it here.
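As a hedged example, ImageDataGenerator can be configured roughly like this (the specific transforms and their ranges below are illustrative, not necessarily the ones used in the notebook):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Randomly perturb the training images so each epoch sees slightly different samples.
datagen = ImageDataGenerator(
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # horizontal shifts up to 10% of the width
    height_shift_range=0.1,  # vertical shifts up to 10% of the height
    zoom_range=0.1)          # random zoom in / out

# Stream augmented batches of (images, labels) for training.
train_flow = datagen.flow(X_train, y_train, batch_size=32)
```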

The main structure of the convolutional model, built with a sequential approach, is the following:

  • 1st Layer: Convolution Layer
  • 2nd Layer: Batch Normalization Layer
  • 3rd Layer: Pooling Layer
  • 4th Layer: Convolution Layer
  • 5th Layer: Batch Normalization Layer
  • 6th Layer: Pooling Layer
  • 7th Layer: Flatten Layer
  • 8th Layer: Dense Layer
  • 9th Layer: Dense Layer

Layers 1 to 8 use ReLU as the activation function, while the 9th layer uses the softmax function, as this is a multiclass classification problem. The loss and accuracy metrics used were, respectively, CategoricalCrossentropy and CategoricalAccuracy, available in TensorFlow 2.0.
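As a sketch, that structure maps to a Keras Sequential model roughly like the one below. The 64×64×3 input shape and the 6 output classes are assumptions about the SIGNS subset, and the fixed layer sizes are only placeholders (set here to the best values reported by the tuning described later):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu',
                           input_shape=(64, 64, 3)),              # 1st: convolution
    tf.keras.layers.BatchNormalization(),                         # 2nd: batch normalization
    tf.keras.layers.MaxPooling2D(pool_size=2),                    # 3rd: pooling
    tf.keras.layers.Conv2D(8, kernel_size=3, activation='relu'),  # 4th: convolution
    tf.keras.layers.BatchNormalization(),                         # 5th: batch normalization
    tf.keras.layers.MaxPooling2D(pool_size=2),                    # 6th: pooling
    tf.keras.layers.Flatten(),                                    # 7th: flatten
    tf.keras.layers.Dense(96, activation='relu'),                 # 8th: dense (ReLU)
    tf.keras.layers.Dense(6, activation='softmax'),               # 9th: dense (softmax output)
])
```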

The following hyperparameters and their possible options were considered (a sketch of how these ranges can be declared with KerasTuner follows the list):

  • Size of the kernel used in the first convolution layer: 2 to 5
  • Number of units in the first convolution layer (filters to ‘find’): 16 to 32 units, with an increment of 8 per try
  • Size of pool used in the first pooling layer: 2 or 3
  • Size of the kernel used in the second convolution layer: 2 or 3
  • Number of units in the second convolution layer (filters to ‘find’): 8 to 16 units, with an increment of 4 per try
  • Size of pool used in the second pooling layer: 2 or 3
  • Number of units in the dense layer: 64 to 128 units, with an increment of 32
  • Learning rate: minimum of 1×10⁻⁴, maximum of 1×10⁻², with a logarithmic scale
  • Beta 1: 0.90 to 0.99, with an increment of 0.01
  • Beta 2: 0.95 to 0.99, with an increment of 0.01
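A possible hypermodel covering this search space is sketched below. The hyperparameter names follow the results reported later in the post; the input shape, the number of classes, and the assumption that beta 1 and beta 2 are the Adam optimizer's beta_1 and beta_2 are mine, not confirmed by the original text:

```python
import tensorflow as tf
import keras_tuner as kt  # imported as `kerastuner` in the 2020-era releases

def build_model(hp):
    """Builds one candidate CNN for a given set of hyperparameter values."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(
            filters=hp.Int('Initial_CP_units', 16, 32, step=8),
            kernel_size=hp.Int('kernel_conv_1', 2, 5),
            activation='relu', input_shape=(64, 64, 3)),  # input shape assumed for SIGNS
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=hp.Int('size_pool_1', 2, 3)),
        tf.keras.layers.Conv2D(
            filters=hp.Int('2_CP_units', 8, 16, step=4),
            kernel_size=hp.Int('kernel_conv_2', 2, 3),
            activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=hp.Int('size_pool_2', 2, 3)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hp.Int('FC_units', 64, 128, step=32), activation='relu'),
        tf.keras.layers.Dense(6, activation='softmax'),  # 6 classes assumed
    ])
    model.compile(
        # beta 1 / beta 2 assumed to be Adam's beta_1 / beta_2
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=hp.Float('lrate', 1e-4, 1e-2, sampling='log'),
            beta_1=hp.Float('beta1', 0.90, 0.99, step=0.01),
            beta_2=hp.Float('beta2', 0.95, 0.99, step=0.01)),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=[tf.keras.metrics.CategoricalAccuracy()])
    return model
```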

To execute the hyperparameter tuning, a BayesianOptimization tuner from the KerasTuner library was used. A total of one hundred trials were executed, with one execution per trial, and an alpha of 0.5 and a beta of 0.7 were used to create the Bayesian optimizer.

For the search for the best hyperparameters, one hundred epochs of training were used for each model, with a batch size of 32. The image-generating class was called here to create new training samples during the hyperparameter search, using the same batch size of 32.
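A minimal sketch of that tuner configuration and search, assuming the build_model, datagen, and data splits from the earlier snippets:

```python
tuner = kt.BayesianOptimization(
    build_model,
    objective='val_categorical_accuracy',  # maximize validation accuracy
    max_trials=100,           # one hundred trials
    executions_per_trial=1,   # one execution per trial
    alpha=0.5,
    beta=0.7)

# Each trial trains for 100 epochs on augmented batches of 32 images.
tuner.search(
    datagen.flow(X_train, y_train, batch_size=32),
    epochs=100,
    validation_data=(X_val, y_val))
```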

After the end of the execution, the best hyperparameters found were (a sketch of how they are retrieved from the tuner follows the list):

  • ‘kernel_conv_1’: 3
  • ‘Initial_CP_units’: 32
  • ‘size_pool_1’: 2
  • ‘kernel_conv_2’: 3
  • ‘2_CP_units’: 8
  • ‘size_pool_2’: 2
  • ‘FC_units’: 96
  • ‘beta1’: 0.9900000000000001
  • ‘beta2’: 0.98
  • ‘lrate’: 0.0014273888177808397
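These values can be recovered from the tuner along the following lines (a sketch, reusing the tuner defined above):

```python
# Best hyperparameter values found during the search.
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)  # dictionary with the values listed above

# Best model found during the search, already trained.
best_model = tuner.get_best_models(num_models=1)[0]
best_model.summary()
```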

With these parameters, the summary of the model gives us:

Print of the result of the summary

After recovering the best model, its loss and accuracy are calculated on the training, validation, and test sets (a sketch of this step follows the numbers below).

  • Train loss: 1.052 — acc: 0.994
  • Validation loss: 1.061 — acc: 0.981
  • Test loss: 1.069 — acc: 0.983
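A sketch of that evaluation, assuming the data splits and the recovered best_model from the snippets above:

```python
# Evaluate the tuned model on each split and report loss and categorical accuracy.
splits = {
    'Train': (X_train, y_train),
    'Validation': (X_val, y_val),
    'Test': (X_test, y_test),
}
for name, (features, labels) in splits.items():
    loss, acc = best_model.evaluate(features, labels, verbose=0)
    print(f'{name} loss: {loss:.3f} - acc: {acc:.3f}')
```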

Finally, using the classification report from sklearn, the precision, recall, and f1-score are shown.

Print of the result of the classification report
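A hedged sketch of how such a report can be produced with scikit-learn, assuming one-hot encoded test labels:

```python
from sklearn.metrics import classification_report

# Convert softmax outputs and one-hot labels to class indices before reporting.
y_pred = best_model.predict(X_test)
print(classification_report(y_test.argmax(axis=1), y_pred.argmax(axis=1)))
```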

When these results are compared with the metrics obtained from a model using only fully connected layers, the time spent on training increases, as does the number of epochs needed for the model to converge to a good solution.

The structure of the model used for comparison follows (sketched in code after the list):

  • 1st Layer: Flatten Layer
  • 2nd Layer: Dense Layer (without activation function)
  • 3rd Layer: Batch Normalization Layer
  • 4th Layer: Activation Layer
  • 5th Layer: Dense Layer
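In Keras terms this baseline looks roughly like the snippet below; as before, the input shape, the 6 output classes, and the fixed unit count are assumptions or placeholders:

```python
fc_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),  # 1st: flatten
    tf.keras.layers.Dense(64),                         # 2nd: dense, no activation yet
    tf.keras.layers.BatchNormalization(),              # 3rd: batch normalization
    tf.keras.layers.Activation('relu'),                # 4th: activation
    tf.keras.layers.Dense(6, activation='softmax'),    # 5th: dense output
])
```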

The hyperparameters selected for tuning are (declared in the sketch after the list):

  • Number of units in the First dense layer: 64 to 128 units, with an increment of 32
  • Learning rate: minimum of 1×10⁻⁴, maximum of 1×10⁻², with a logarithmic scale
  • Beta 1: 0.90 to 0.99, with an increment of 0.01
  • Beta 2: 0.95 to 0.99, with an increment of 0.01
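As with the convolutional model, these ranges can be written as a hypermodel; the sketch below reuses the assumptions stated earlier (input shape, class count, Adam betas):

```python
def build_fc_model(hp):
    """Candidate fully connected baseline for a given set of hyperparameter values."""
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
        tf.keras.layers.Dense(hp.Int('FC_units', 64, 128, step=32)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Dense(6, activation='softmax'),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=hp.Float('lrate', 1e-4, 1e-2, sampling='log'),
            beta_1=hp.Float('beta1', 0.90, 0.99, step=0.01),
            beta_2=hp.Float('beta2', 0.95, 0.99, step=0.01)),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=[tf.keras.metrics.CategoricalAccuracy()])
    return model
```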

The same configuration was used to create the BayesianOptimization tuner; a total of … was spent tuning this model.

After the end of the execution, the best hyperparameters found were:

  • ‘FC_units’: 64
  • ‘beta1’: 0.9800000000000001
  • ‘beta2’: 0.98
  • ‘lrate’: 0.00010149506885369505

The summary of this model returns:

Print of the result of the summary

After the tuning, the same process used previously is applied to get the loss and accuracy on the training, validation, and test sets; the results obtained for each set were:

  • Train loss: 1.244 — acc: 0.878
  • Validation loss: 1.272 — acc: 0.852
  • Test loss: 1.232 — acc: 0.883

As done previously, the classification report was used; the result was:

Print of the result of the classification report

The code used can be found in this colab notebook.
