Hello 👽,
A Convolutional Neural Network (CNN) is a type of neural network that uses convolution layers. CNNs are mostly used to process grid-like data such as images, which makes them key in computer vision tasks.
Since it’s been a while, I’d like us to brush up on our knowledge of neural networks here. A neural network has three kinds of layers: the input, hidden, and output layers. The main difference between the different types of neural networks is what happens in the hidden layers. In our previous neural network chapter (21), we learned about the most basic type of neural network: the feedforward neural network.
Convolution is a mathematical way of combining two signals to form a third signal. If you want to learn more about the philosophy of convolution, you can read this paper here.
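To make "combining two signals to form a third" concrete, here is a minimal sketch in plain Python. The function name `convolve_1d` and the sample values are made up for illustration; in practice you would likely reach for a library routine instead.

```python
def convolve_1d(signal, kernel):
    """Full discrete convolution of two 1-D sequences."""
    # The result has len(signal) + len(kernel) - 1 entries.
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            # Every pair of values contributes to the output at position i + j.
            out[i + j] += s * k
    return out


signal = [1, 2, 3]        # first signal (illustrative values)
kernel = [0, 1, 0.5]      # second signal (illustrative values)
third_signal = convolve_1d(signal, kernel)
print(third_signal)       # [0, 1, 2.5, 4, 1.5]
```

The third signal is a blend of the first two: each output value is a weighted sum of nearby input values, with the weights coming from the second signal.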
In a CNN, the hidden part of the neural network is made up of several convolution layers. Each layer detects a specific pattern. For example, if we want to detect cars, the hidden layers of our CNN model would include filters for tires, glass windows, windshields, etc.
How it works
The filters in a convolutional layer are simply small grids of numbers, for example, 3×3 or 4×4. The values in the grid are used to transform the image at that layer. For example, let’s say we are trying to detect images of the numbers 0 to 9. First, we’ll need to be able to represent each image as a grid of numbers, as shown below with the number 7.
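As a rough sketch of that idea, here is one way the digit 7 could be written as a grid of numbers in Python. The 7×7 size and the pixel values are assumptions chosen for illustration (1 means ink, 0 means background); a real dataset would use larger grids and a wider range of values.

```python
# A hypothetical 7x7 image of the digit 7: 1 = ink, 0 = background.
seven = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def render(image):
    """Draw the grid as text so we can see the shape the numbers encode."""
    return "\n".join("".join("#" if px else "." for px in row) for row in image)


height, width = len(seven), len(seven[0])
print(f"{height}x{width} grid")
print(render(seven))
```

Printing the grid with `render` shows that the numbers really do carry the shape of the digit, which is all the network ever sees.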
Then we can say we want to use 4 convolutional layers in our hidden layer to detect the four edges: top, left, bottom, and right. We’ll then create a 3×3 grid of numbers (a filter) for each one, which we’ll use to transform the image at that layer, as shown in the image below. Each filter detects one kind of edge, and from those edges we’ll be able to predict the number.
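For a feel of what such filters might look like, here is a small sketch. The exact values are an assumption for illustration (they follow a common hand-made edge filter pattern); in a trained CNN the network learns these numbers by itself.

```python
# Hypothetical 3x3 edge filters. A -1 next to a 1 means
# "background on this side, ink on that side".
filters = {
    "top":    [[-1, -1, -1],
               [ 1,  1,  1],
               [ 0,  0,  0]],
    "bottom": [[ 0,  0,  0],
               [ 1,  1,  1],
               [-1, -1, -1]],
    "left":   [[-1,  1,  0],
               [-1,  1,  0],
               [-1,  1,  0]],
    "right":  [[ 0,  1, -1],
               [ 0,  1, -1],
               [ 0,  1, -1]],
}

for name, grid in filters.items():
    # Each filter sums to 0, so a completely flat patch of the image
    # (all the same value) produces no response at all.
    total = sum(sum(row) for row in grid)
    print(name, grid, "sum =", total)
```

Because every filter sums to zero, it stays quiet on plain regions and only "lights up" where pixel values change in the direction it is looking for.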
At each layer, we’ll slide the filter across the image and, at every position, multiply the 3×3 patch of the image under it by the filter, value by value, and add up the results.
So for example in the grid of the number 7 above, using filter 1 we can transform it like so:
When we complete the transformation at that convolutional layer it’ll look like so:
The red part shows the top edge.
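The transformation can be sketched in plain Python as below. The image, the filter values, and the function name `apply_filter` are assumptions made for illustration, and the sketch uses no padding and a step of one pixel. Note that, like most deep learning libraries, it slides the filter without flipping it.

```python
# Hypothetical 7x7 image of the digit 7 (1 = ink, 0 = background).
seven = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Hypothetical top-edge filter: background above, ink below.
top_edge = [
    [-1, -1, -1],
    [ 1,  1,  1],
    [ 0,  0,  0],
]


def apply_filter(image, kernel):
    """Slide the kernel over the image; multiply and sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


feature_map = apply_filter(seven, top_edge)
for row in feature_map:
    print(row)
# [2, 3, 3, 3, 2]
# [-2, -3, -2, -2, -1]
# [0, 1, 0, 0, -1]
# [1, 0, 0, -1, 0]
# [0, 0, 0, 0, 0]
```

The largest values sit in the first row of the output, exactly where the horizontal bar of the 7 has background above it. That is the top edge being picked out.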
There’s a very nice interactive demo illustrating the concept of CNNs made by DeepLizard; you can check it out here. For more technical details, you can check this YouTube video here as well. It’s worth noting that the convolutional layers in a CNN can get complex for real-world image detection.
After the convolutional layers, there’s the pooling layer and the fully connected layer. The pooling layer reduces dimensionality (remember Dimensionality Reduction, right?), which simply means using fewer values to represent the grid data, so the layers that follow need fewer parameters. The fully connected layer performs classification based on the features extracted by the previous layers (the convolutional and pooling layers) and their different filters. It decides whether the image is a number 9, a car, a bicycle, etc.
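Here is a minimal sketch of those two steps in plain Python. It assumes max pooling with 2×2 windows (one common choice among several), and the feature map, weights, biases, and class labels are all made up by hand for illustration; a real network learns its weights during training.

```python
import math


def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def fully_connected(features, weights, biases):
    """One score per class: a weighted sum of every feature plus a bias."""
    return [sum(w * x for w, x in zip(ws, features)) + b
            for ws, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4x4 feature map coming out of a convolutional layer.
feature_map = [
    [2, 3, 3, 3],
    [0, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 1, 0],
]

pooled = max_pool(feature_map)               # 4x4 -> 2x2: [[3, 3], [1, 2]]
flat = [v for row in pooled for v in row]    # flatten to [3, 3, 1, 2]

# Hand-picked, hypothetical weights for two made-up classes.
labels = ["seven", "not seven"]
weights = [[1.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
biases = [0.0, 0.0]

probs = softmax(fully_connected(flat, weights, biases))
prediction = labels[probs.index(max(probs))]
print(pooled, prediction)
```

Pooling turns 16 values into 4 while keeping the strongest responses, and the fully connected layer then weighs those 4 values to pick a class.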
Convolutional Neural Networks are used mainly for image processing and computer vision tasks because they are good at recognizing patterns in grid data. As we know, every image can be represented as a grid of numbers; that’s how computers store images natively.
Conclusion
If you have been following our series, you know that we always build each model we learn about in practice. We’ll still do that, so sit tight. We’ll first learn about the different types of Neural Networks and then build and deploy each one to solve a problem.
In our next chapter, we’ll talk about Recurrent Neural Networks, so keep tight, keep practicing, and keep safe 🕶️