Convolutional Neural Networks

A brief introduction

Jan 15, 2021

In this article, we’ll talk about:

Neural networks: their structure and how they work
Convolutional neural networks: the different layers, how they work, and their purpose
Implementing a CNN to classify the CIFAR-10 dataset

Neural Networks

A neural network is a group of connected input/output units in which each connection has an associated weight. Using neural networks, we can build predictive models from large datasets. Such models are used for tasks like image understanding, speech processing, and more.

A general diagram of a neural network with one hidden layer.

Each node is connected to every node in the next layer.
A weight ‘w’ is associated with each connection between two nodes.
A bias ‘b’ is associated with every node.
The weights and biases of each layer are collected into a weight matrix W and a bias vector B, respectively.
If the input is the vector ‘V’, and the first layer has weight matrix W1 and bias vector B1, then the output of the first layer is Z(V·W1 + B1), where ‘Z’ is an activation function.
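To make the formula Z(V·W1 + B1) concrete, here is a minimal pure-Python sketch of one fully connected layer. It assumes the sigmoid function as the activation Z (the article does not name one) and the convention that w[i][j] is the weight on the connection from input node i to output node j; the numbers are made up for illustration.

```python
import math

def sigmoid(x):
    """Logistic activation (assumed choice for Z): maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_forward(v, w, b, activation=sigmoid):
    """Compute Z(V·W + B) for one fully connected layer.

    v: input vector of length n
    w: weight matrix with n rows and m columns; w[i][j] connects input i to output j
    b: bias vector of length m (one bias per output node)
    """
    outputs = []
    for j in range(len(b)):
        # Weighted sum of every input feeding node j, plus node j's bias.
        z = sum(v[i] * w[i][j] for i in range(len(v))) + b[j]
        outputs.append(activation(z))
    return outputs

# Example: 3 inputs feeding a hidden layer of 2 nodes (made-up values).
V = [1.0, 2.0, 3.0]
W1 = [[0.1, -0.2],
      [0.0,  0.5],
      [0.3,  0.1]]
B1 = [0.0, -1.0]
hidden = dense_forward(V, W1, B1)
print(hidden)
```

Because every input is multiplied by a weight for every output node, each node ends up connected to every node in the next layer, exactly as in the list above.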

Convolutional Neural Networks

A convolutional neural network (CNN) is a type of neural network that is usually used to process data with a grid-like structure, such as images. An image can be represented as a grid in which each cell holds the value of one pixel.

A convolutional neural network has the following layers:

Convolutional Layer
Pooling Layer
Fully Connected Layer

We now go step by step through a CNN and explain the function of each layer. Consider an input image represented by a matrix (grid), where each entry in the matrix represents one pixel of the image.

Convolutional Layer

The image is taken as the input to the convolutional layer.
The convolutional layer has an associated matrix called the ‘kernel’ (also called a filter), which is usually much smaller than the input.
The kernel slides across the input matrix. At each position, the dot product between the kernel and the sub-matrix it covers (the sum of their element-wise products) is computed and stored in a new matrix.
This is done, one position at a time, for every sub-matrix of the input that has the same size as the kernel, so the new matrix holds one value per kernel position.

Here,

Matrix on the left = input matrix (image)
Matrix in the middle = kernel
Matrix on the right = new matrix
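The sliding dot product described above can be sketched in plain Python. This is a minimal version assuming a step (stride) of 1 and no padding around the image; the image and kernel values are made up and are not the ones in the figure. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the
    matrix of dot products, one entry per kernel position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = ih - kh + 1, iw - kw + 1  # number of valid positions
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product between the kernel and the sub-matrix whose
            # top-left corner is at (r, c).
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [0, -1, -1]
# [-1, 1, 3]
# [2, 0, -2]
```

A 4×4 input with a 2×2 kernel gives a 3×3 output; in general an n×n input and a k×k kernel give an (n − k + 1)×(n − k + 1) output when there is no padding.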

Pooling Layer

The pooling layer is used to downsample the features.

The two most common types of pooling are max pooling and average pooling, which take the maximum and the average value of the features in each region, respectively.

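The output matrix of the convolutional layer is taken as the input of the pooling layer.
The average or maximum value of each small sub-matrix (window) of the input matrix is computed and stored in a new matrix. The windows usually do not overlap, for example 2×2 windows moved two entries at a time.
The output matrix has smaller dimensions than the input; with 2×2 windows moved two entries at a time, the height and width are halved.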

Here, the maximum value in each 2×2 sub-matrix is taken and stored in the output matrix.
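Both pooling operations can be sketched with one small pure-Python function. It assumes square windows of size `size` moved `stride` entries at a time (2 and 2 by default, so the windows do not overlap) and simply skips windows that would run past the edge; the feature values are made up.

```python
def pool2d(matrix, size=2, stride=2, mode="max"):
    """Downsample `matrix` by summarizing each size x size window.

    mode="max" keeps the largest value in each window (max pooling);
    mode="avg" keeps the mean of the window (average pooling).
    """
    rows, cols = len(matrix), len(matrix[0])
    output = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect every value in the window whose top-left corner is (r, c).
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            elif mode == "avg":
                out_row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'avg'")
        output.append(out_row)
    return output

features = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(pool2d(features, mode="max"))  # [[6, 5], [7, 9]]
print(pool2d(features, mode="avg"))  # [[3.5, 2.0], [2.5, 6.0]]
```

The 4×4 input shrinks to 2×2 in both cases; only the rule used to summarize each window differs.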

Fully Connected Layer

The output from the pooling layer is taken as the input.
The input is flattened, i.e., the rows of each matrix are laid end to end to form a single long vector.
Weights are applied to this vector, as in the neural network described earlier, to predict the correct label.
In the output layer, the output size equals the number of labels (classes).

The output from the pooling layer is flattened and sent as input into the fully connected layer, which predicts the label.
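The flatten-then-predict step can be sketched in plain Python as follows. The pooled feature maps and the weights are made-up, hypothetical values, and the softmax at the end is an assumption: it is a common way to turn the output layer's scores into class probabilities, but the article does not specify it.

```python
import math

def flatten(feature_maps):
    """Lay the rows of every feature map end to end in one flat list."""
    return [value for fmap in feature_maps for row in fmap for value in row]

def fully_connected(x, weights, biases):
    """One score per class: score_j = sum_i x[i] * weights[i][j] + biases[j]."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
        for j in range(len(biases))
    ]

def softmax(scores):
    """Assumed output activation: turns raw scores into probabilities summing to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two pooled 2x2 feature maps (made-up values) -> a vector of length 8.
pooled = [
    [[6, 5], [7, 9]],
    [[1, 0], [2, 3]],
]
x = flatten(pooled)

# Hypothetical weights: 8 inputs x 3 classes, so the output size is 3.
num_classes = 3
weights = [[0.01 * (i + j) for j in range(num_classes)] for i in range(len(x))]
biases = [0.0] * num_classes

probs = softmax(fully_connected(x, weights, biases))
predicted_label = probs.index(max(probs))  # index of the most probable class
print(x, probs, predicted_label)
```

The order in which values are flattened does not matter much, as long as the same order is used for every image, because the weights are learned for that layout.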

Putting all these layers together gives us the following:

convolutional neural network: a general architecture
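When stacking these layers, it helps to track how the size of the data changes. The sketch below uses the standard output-size formula floor((n + 2·padding − kernel) / stride) + 1, which applies to both convolution and pooling windows. The particular stack (two 3×3 convolutions, each followed by 2×2 max pooling, ending with 64 filters) is a hypothetical example for a 32×32 input, not an architecture prescribed by the article.

```python
def output_size(n, kernel, stride=1, padding=0):
    """Side length after sliding a kernel x kernel window over an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1

# Hypothetical stack for a 32x32 input image:
size = 32
size = output_size(size, kernel=3)            # 3x3 conv, no padding -> 30
size = output_size(size, kernel=2, stride=2)  # 2x2 max pool         -> 15
size = output_size(size, kernel=3)            # 3x3 conv             -> 13
size = output_size(size, kernel=2, stride=2)  # 2x2 max pool         -> 6
num_filters = 64                              # channels after the last conv (assumed)
flattened_length = size * size * num_filters  # 6 * 6 * 64 = 2304 inputs to the fully connected layer
print(size, flattened_length)
```

The flattened length is the number of inputs the first fully connected layer must accept, so it is worth computing before choosing that layer's weight matrix size.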

Implementation of a CNN

We now implement the above concepts to classify the images in the CIFAR-10 dataset.

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The class labels and their associated integer values are:

0: airplane
1: automobile
2: bird
3: cat
4: deer
5: dog
6: frog
7: horse
8: ship
9: truck
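Since the labels are stored as integers 0 to 9, a plain list is enough to translate a label (or a predicted class index) back into a readable name. The names `CIFAR10_CLASSES` and `label_name` below are illustrative choices, not part of any library.

```python
# Position i in this list is the class name for integer label i in CIFAR-10.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_name(label):
    """Map an integer label (0-9) to its class name."""
    return CIFAR10_CLASSES[int(label)]

print(label_name(3))  # cat
```

Keras returns the labels as an array of shape (number of images, 1), so a single training label would be looked up with something like label_name(y_train[i][0]).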

We use TensorFlow and Keras, which are convenient because they provide predefined implementations of all the components needed for a CNN.

Here we load the CIFAR-10 dataset into tuples: one (images, labels) tuple for training and one for testing.

Both the training and test images are reshaped, and the pixel values are normalized (for example, divided by 255 so that they lie between 0 and 1). The training images are reshaped with reshape(50000, 32, 32, 3).

In reshape(50000, 32, 32, 3), 50000 is the number of images in the training dataset, the two 32s are the height and width of an individual image (32×32 pixels), and 3 is the number of colour channels: red, green, and blue, one for each colour.
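The article's original loading code is not shown, so the following is only a sketch of the loading and preprocessing steps described above. It assumes `tf.keras.datasets.cifar10.load_data()` for loading and division by 255 for normalization. TensorFlow is imported inside the function, so the small pure-Python `normalize_image` helper (a hypothetical name, added to show the scaling step on its own) can run without it.

```python
def load_cifar10():
    """Load CIFAR-10 with Keras, then reshape and normalize the images.

    Requires TensorFlow; the dataset is downloaded on the first call.
    """
    import tensorflow as tf  # imported lazily so normalize_image works without TensorFlow

    # Two tuples: (training images, training labels), (test images, test labels).
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

    # Layout: (number of images, height, width, colour channels). With default
    # settings Keras already returns this shape; the reshape makes it explicit.
    x_train = x_train.reshape(50000, 32, 32, 3)
    x_test = x_test.reshape(10000, 32, 32, 3)

    # Pixel values are integers from 0 to 255; scale them to floats in [0, 1].
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    return (x_train, y_train), (x_test, y_test)


def normalize_image(image):
    """Pure-Python version of the scaling step for one [row][col][channel] image."""
    return [[[value / 255.0 for value in pixel] for pixel in row] for row in image]


# A 1x2 "image": one black pixel and one pure red pixel.
print(normalize_image([[[0, 0, 0], [255, 0, 0]]]))  # [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
```

Calling `(x_train, y_train), (x_test, y_test) = load_cifar10()` should give `x_train.shape == (50000, 32, 32, 3)` and `y_train.shape == (50000, 1)`. Scaling to [0, 1] keeps the inputs small, which generally makes training with gradient descent better behaved.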