Testing a CNN on a subset of the SIGNS dataset

Using batch normalization and data augmentation

João Vítor Venceslau
4 min read · Nov 20, 2020
Source: https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png

The 1200 samples were divided as follows:

  • Training: 864 samples
  • Validation: 216 samples
  • Test: 120 samples

The training and validation sets are used during the hyperparameter tuning of the model, and the test set only for the final evaluation of the model's accuracy.
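A minimal sketch of such a split, assuming the images and one-hot labels are already loaded into NumPy arrays X and y (the variable names and the use of sklearn here are illustrative, not necessarily what the notebook does):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# First carve out the 120-sample test set (10% of 1200),
# then split the remaining 1080 samples into 864 train / 216 validation (80/20).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=120, random_state=42, stratify=y.argmax(axis=1))
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=42,
    stratify=y_trainval.argmax(axis=1))

print(X_train.shape[0], X_val.shape[0], X_test.shape[0])  # 864 216 120
```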

Python 3 was used as the programming language, with the TensorFlow 2.0 library for creating the models and KerasTuner for the hyperparameter tuning. Other libraries, such as NumPy, Matplotlib, and scikit-learn, were used as support.

The ImageDataGenerator class was used to generate more samples for training; you can learn more about it here.
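As a hedged example, ImageDataGenerator can be configured roughly like this (the specific transforms and their ranges below are illustrative, not necessarily the ones used in the notebook):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Randomly perturb the training images so each epoch sees slightly different samples.
datagen = ImageDataGenerator(
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # horizontal shifts up to 10% of the width
    height_shift_range=0.1,  # vertical shifts up to 10% of the height
    zoom_range=0.1)          # random zoom in / out

# Stream augmented batches of (images, labels) for training.
train_flow = datagen.flow(X_train, y_train, batch_size=32)
```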

The main structure of the convolutional model, built with a sequential approach, is the following:

  • 1st Layer: Convolution Layer
  • 2nd Layer: Batch Normalization Layer
  • 3rd Layer: Pooling Layer
  • 4th Layer: Convolution Layer
  • 5th Layer: Batch Normalization Layer
  • 6th Layer: Pooling Layer
  • 7th Layer: Flatten Layer
  • 8th Layer: Dense Layer
  • 9th Layer: Dense Layer

Layers 1 to 8 use ReLU as the activation function, while the 9th layer uses the softmax function, as this is a multiclass classification problem. The loss and accuracy metrics used were, respectively, CategoricalCrossentropy and CategoricalAccuracy, available in TensorFlow 2.0.
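As a sketch, that structure maps to a Keras Sequential model roughly like the one below. The 64×64×3 input shape and the 6 output classes are assumptions about the SIGNS subset, and the fixed layer sizes are only placeholders (set here to the best values reported by the tuning described later):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu',
                           input_shape=(64, 64, 3)),              # 1st: convolution
    tf.keras.layers.BatchNormalization(),                         # 2nd: batch normalization
    tf.keras.layers.MaxPooling2D(pool_size=2),                    # 3rd: pooling
    tf.keras.layers.Conv2D(8, kernel_size=3, activation='relu'),  # 4th: convolution
    tf.keras.layers.BatchNormalization(),                         # 5th: batch normalization
    tf.keras.layers.MaxPooling2D(pool_size=2),                    # 6th: pooling
    tf.keras.layers.Flatten(),                                    # 7th: flatten
    tf.keras.layers.Dense(96, activation='relu'),                 # 8th: dense (ReLU)
    tf.keras.layers.Dense(6, activation='softmax'),               # 9th: dense (softmax output)
])
```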

The following hyperparameters and their possible options were considered (a sketch of how these ranges can be declared with KerasTuner follows the list):

  • Size of the kernel used in the first convolution layer: 2 to 5
  • Number of units in the first convolution layer (filters to ‘find’): 16 to 32 units, with an increment of 8 per try
  • Size of pool used in the first pooling layer: 2 or 3
  • Size of the kernel used in the second convolution layer: 2 or 3
  • Number of units in the second convolution layer (filters to ‘find’): 8 to 16 units, with an increment of 4 per try
  • Size of pool used in the second pooling layer: 2 or 3
  • Number of units in the dense layer: 64 to 128 units, with an increment of 32
  • Learning rate: minimum of 1×10⁻⁴, maximum of 1×10⁻², with a logarithmic scale
  • Beta 1: 0.90 to 0.99, with an increment of 0.01
  • Beta 2: 0.95 to 0.99, with an increment of 0.01
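A possible hypermodel covering this search space is sketched below. The hyperparameter names follow the results reported later in the post; the input shape, the number of classes, and the assumption that beta 1 and beta 2 are the Adam optimizer's beta_1 and beta_2 are mine, not confirmed by the original text:

```python
import tensorflow as tf
import keras_tuner as kt  # imported as `kerastuner` in the 2020-era releases

def build_model(hp):
    """Builds one candidate CNN for a given set of hyperparameter values."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(
            filters=hp.Int('Initial_CP_units', 16, 32, step=8),
            kernel_size=hp.Int('kernel_conv_1', 2, 5),
            activation='relu', input_shape=(64, 64, 3)),  # input shape assumed for SIGNS
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=hp.Int('size_pool_1', 2, 3)),
        tf.keras.layers.Conv2D(
            filters=hp.Int('2_CP_units', 8, 16, step=4),
            kernel_size=hp.Int('kernel_conv_2', 2, 3),
            activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=hp.Int('size_pool_2', 2, 3)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hp.Int('FC_units', 64, 128, step=32), activation='relu'),
        tf.keras.layers.Dense(6, activation='softmax'),  # 6 classes assumed
    ])
    model.compile(
        # beta 1 / beta 2 assumed to be Adam's beta_1 / beta_2
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=hp.Float('lrate', 1e-4, 1e-2, sampling='log'),
            beta_1=hp.Float('beta1', 0.90, 0.99, step=0.01),
            beta_2=hp.Float('beta2', 0.95, 0.99, step=0.01)),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=[tf.keras.metrics.CategoricalAccuracy()])
    return model
```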

To execute the hyperparameter tuning, a BayesianOptimization tuner from the KerasTuner library was used. A total of one hundred trials were executed, with one execution per trial, and an alpha of 0.5 and a beta of 0.7 were used to create the Bayesian optimizer.

For the search for the best hyperparameters, one hundred epochs of training were used for each model, with a batch size of 32. The image-generating class was called here to create new training samples during the hyperparameter search, using the same batch size of 32.
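A minimal sketch of that tuner configuration and search, assuming the build_model, datagen, and data splits from the earlier snippets:

```python
tuner = kt.BayesianOptimization(
    build_model,
    objective='val_categorical_accuracy',  # maximize validation accuracy
    max_trials=100,           # one hundred trials
    executions_per_trial=1,   # one execution per trial
    alpha=0.5,
    beta=0.7)

# Each trial trains for 100 epochs on augmented batches of 32 images.
tuner.search(
    datagen.flow(X_train, y_train, batch_size=32),
    epochs=100,
    validation_data=(X_val, y_val))
```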

After the end of the execution, the best hyperparameters found were (a sketch of how they are retrieved from the tuner follows the list):

  • ‘kernel_conv_1’: 3
  • ‘Initial_CP_units’: 32
  • ‘size_pool_1’: 2
  • ‘kernel_conv_2’: 3
  • ‘2_CP_units’: 8
  • ‘size_pool_2’: 2
  • ‘FC_units’: 96
  • ‘beta1’: 0.9900000000000001
  • ‘beta2’: 0.98
  • ‘lrate’: 0.0014273888177808397
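These values can be recovered from the tuner along the following lines (a sketch, reusing the tuner defined above):

```python
# Best hyperparameter values found during the search.
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)  # dictionary with the values listed above

# Best model found during the search, already trained.
best_model = tuner.get_best_models(num_models=1)[0]
best_model.summary()
```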

With these parameters, the summary of the model gives us:

Print of the result of the summary

After recovering the best model, its loss and accuracy are calculated on the training, validation, and test sets (a sketch of this step follows the numbers below).

  • Train loss: 1.052 — acc: 0.994
  • Validation loss: 1.061 — acc: 0.981
  • Test loss: 1.069 — acc: 0.983
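A sketch of that evaluation, assuming the data splits and the recovered best_model from the snippets above:

```python
# Evaluate the tuned model on each split and report loss and categorical accuracy.
splits = {
    'Train': (X_train, y_train),
    'Validation': (X_val, y_val),
    'Test': (X_test, y_test),
}
for name, (features, labels) in splits.items():
    loss, acc = best_model.evaluate(features, labels, verbose=0)
    print(f'{name} loss: {loss:.3f} - acc: {acc:.3f}')
```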

Finally, using the classification report from sklearn, the precision, recall, and f1-score are shown.

Print of the result of the classification report
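A hedged sketch of how such a report can be produced with scikit-learn, assuming one-hot encoded test labels:

```python
from sklearn.metrics import classification_report

# Convert softmax outputs and one-hot labels to class indices before reporting.
y_pred = best_model.predict(X_test)
print(classification_report(y_test.argmax(axis=1), y_pred.argmax(axis=1)))
```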

When these results are compared with the metrics obtained from a model using only fully connected layers, the time spent on training increases, as does the number of epochs needed for the model to converge to a good solution.

The structure of the model used for comparison follows (sketched in code after the list):

  • 1st Layer: Flatten Layer
  • 2nd Layer: Dense Layer (without activation function)
  • 3rd Layer: Batch Normalization Layer
  • 4th Layer: Activation Layer
  • 5th Layer: Dense Layer
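In Keras terms this baseline looks roughly like the snippet below; as before, the input shape, the 6 output classes, and the fixed unit count are assumptions or placeholders:

```python
fc_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),  # 1st: flatten
    tf.keras.layers.Dense(64),                         # 2nd: dense, no activation yet
    tf.keras.layers.BatchNormalization(),              # 3rd: batch normalization
    tf.keras.layers.Activation('relu'),                # 4th: activation
    tf.keras.layers.Dense(6, activation='softmax'),    # 5th: dense output
])
```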

The hyperparameters selected for tuning are (declared in the sketch after the list):

  • Number of units in the First dense layer: 64 to 128 units, with an increment of 32
  • Learning rate: minimum of 1×10⁻⁴, maximum of 1×10⁻², with a logarithmic scale
  • Beta 1: 0.90 to 0.99, with an increment of 0.01
  • Beta 2: 0.95 to 0.99, with an increment of 0.01
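As with the convolutional model, these ranges can be written as a hypermodel; the sketch below reuses the assumptions stated earlier (input shape, class count, Adam betas):

```python
def build_fc_model(hp):
    """Candidate fully connected baseline for a given set of hyperparameter values."""
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
        tf.keras.layers.Dense(hp.Int('FC_units', 64, 128, step=32)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Dense(6, activation='softmax'),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=hp.Float('lrate', 1e-4, 1e-2, sampling='log'),
            beta_1=hp.Float('beta1', 0.90, 0.99, step=0.01),
            beta_2=hp.Float('beta2', 0.95, 0.99, step=0.01)),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=[tf.keras.metrics.CategoricalAccuracy()])
    return model
```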

The same configuration was used to create the BayesianOptimization tuner; a total of … was spent tuning this model.

After the end of the execution, the best hyperparameters found were:

  • ‘FC_units’: 64
  • ‘beta1’: 0.9800000000000001
  • ‘beta2’: 0.98
  • ‘lrate’: 0.00010149506885369505

The summary of this model returns:

Print of the result of the summary

After the tuning, the same process used previously is applied to get the loss and accuracy on the training, validation, and test sets; the results obtained for each set were:

  • Train loss: 1.244 — acc: 0.878
  • Validation loss: 1.272 — acc: 0.852
  • Test loss: 1.232 — acc: 0.883

As done previously, the classification report was used; the result was:

Print of the result of the classification report

The code used can be found in this colab notebook.
