Hello, and happy anniversary: this is our 20th chapter! 🥳🎉
In this chapter, we'll learn about a topic befitting our anniversary: Neural Networks! 🤖🦾
Neural Networks are a fundamental concept in deep learning and AI, inspired by the structure and function of the human brain. They are also known as Artificial Neural Networks, to distinguish them from the biological neural networks in our brains.
They are the algorithms that deep learning runs on. We learned extensively about the concept of deep learning in our previous chapter, and you can find it here.
In this chapter, we are going to learn about Neural Networks and how they work. Let's demystify Neural Networks! 👽
A Neural Network (NN) is made up of three kinds of layers: the input, hidden, and output layers. To train an NN model, each feature (x) is multiplied by a weight (w), and the weighted sum flows into a unit called a neuron. A bias is then added, shifting the neuron's value by some amount. The neuron's value then passes through an activation function, which produces the output.
The output is then compared with the expected output, and the weights and biases are adjusted iteratively until the model finds the right weight and bias for each feature. NNs use the techniques known as forward and backward propagation, together with gradient descent, to learn by adjusting the weights and biases in each iteration.
Without the activation function, the neural network would simply become a linear model, since a stack of purely linear layers collapses into one linear transformation.
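To make this concrete, here is a minimal sketch of a single neuron in Python. The feature values, weights, bias, and learning rate are made-up numbers purely for illustration, not from any real model: first the forward pass, then one gradient-descent update like the one described above.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Toy example: one neuron with three input features.
x = np.array([0.5, 0.1, 0.9])   # features
w = np.array([0.4, -0.2, 0.7])  # weights (normally initialized randomly)
b = 0.1                         # bias shifts the weighted sum

z = np.dot(w, x) + b            # weighted sum plus bias
output = sigmoid(z)             # activation makes the neuron non-linear

# One training step with gradient descent (squared-error loss):
y = 1.0                                     # target the neuron should produce
error = output - y
grad_w = error * output * (1 - output) * x  # chain rule through the sigmoid
grad_b = error * output * (1 - output)
learning_rate = 0.1
w -= learning_rate * grad_w                 # nudge weights toward lower error
b -= learning_rate * grad_b
```

A real network repeats this update over many neurons, layers, and iterations; this sketch just shows one step of the idea.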
A simple NN model would look like this:
Where x is the feature(s) and w is the weight(s).
Some activation functions used in NN training are:
Sigmoid
Tanh
ReLU
The image below shows the functions on 2D planes. ReLU is based on the principle that anything less than zero becomes zero (positive values pass through unchanged), and it is currently preferred by most ML engineers. We have also used the Sigmoid function when learning Logistic Regression; you can catch up here.
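Here is what these three functions look like in code; the sample inputs are arbitrary, and the printed values in the comments are rounded:

```python
import numpy as np

def sigmoid(z):
    # Maps any input into (0, 1); also used in logistic regression.
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # Maps any input into (-1, 1); a zero-centered cousin of sigmoid.
    return np.tanh(z)

def relu(z):
    # "Anything less than zero is zero"; positives pass through unchanged.
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))  # ~[0.12 0.38 0.50 0.62 0.88]
print(tanh(z))     # ~[-0.96 -0.46 0.00 0.46 0.96]
print(relu(z))     # [0.  0.  0.  0.5 2. ]
```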
I know this is a lot to take in, so let us use an illustration to understand these concepts.
Illustration
Let us consider a classic NN model that learns how to recognize handwritten digits, like the ones below.
It is a very difficult problem for our statistical models (such as the ones we've learned, like KNN, Random Forest, etc.) to solve, because the shape of, say, a 2 can be written in a hundred different ways. Our brain still recognizes it as a 2, but the statistical models would struggle greatly; believe me, I have tried lol. So how would an NN go about this?
Let's first assume that each image we'll use for training and prediction has a fixed number of pixels (pixels are the small cells that make up images, like on televisions). The image below shows an example image with 28 by 28 pixels, illustrating how the brightness of each pixel collectively determines the image.
These 784 pixels (28 × 28) would be flattened into a one-dimensional array that represents the first layer of the NN: the input layer. From the image below, we can see the input layer with 784 neurons, the hidden layers in the middle, and the last layer, the output layer, which makes the final decision.
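In code, the flattening step might look like this (using a random array as a stand-in for a real image):

```python
import numpy as np

# A hypothetical 28x28 grayscale image, brightness from 0.0 (black) to 1.0 (white).
image = np.random.rand(28, 28)

# Flatten the grid into a one-dimensional array of 784 values;
# each value feeds one neuron in the input layer.
input_layer = image.flatten()
print(input_layer.shape)  # (784,)
```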
The hidden part of the NN can consist of several layers, each with a particular purpose. Using our illustration of predicting handwritten digits, one layer might be responsible for detecting circular shapes, another for straight lines, and so on. So, for example, when a digit like 9 is passed in, the hidden layers would be able to determine that a 9 should have a circular part up top and a curve below, with a line joining them.
The image above shows how the value of each neuron in a layer is calculated. We can see that each pixel (a) has an associated weight (w) and a bias, and that the sigmoid function is used as the activation function. Each layer in the NN determines the values of the next layer.
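In other words, each layer's values come from the previous layer's values through a weighted sum, a bias, and the activation function. A rough sketch, with the layer sizes assumed from our illustration (784 inputs feeding a 16-neuron hidden layer) and random numbers standing in for learned weights:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer(a_prev, W, b):
    # Each neuron: weighted sum of all previous-layer values, plus a bias,
    # squashed by the activation function.
    return sigmoid(W @ a_prev + b)

a0 = np.random.rand(784)       # input layer: the 784 pixel brightnesses
W1 = np.random.randn(16, 784)  # one weight per (neuron, pixel) pair
b1 = np.random.randn(16)       # one bias per hidden neuron
a1 = layer(a0, W1, b1)         # 16 hidden-layer values, each between 0 and 1
```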
The image below shows how complex these NNs can be: even for a very basic task such as handwritten digit recognition, we'll need about 13 thousand weights and biases. The model learns by finding the right weights and biases for each digit, so that when it makes a prediction it applies them to each pixel and each layer to arrive at the right answer.
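If we assume the architecture pictured (784 inputs, two hidden layers of 16 neurons each, and 10 outputs), the count works out like this:

```python
layers = [784, 16, 16, 10]  # input, two hidden layers, output (assumed sizes)

weights = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))
biases = sum(layers[1:])    # one bias per non-input neuron
print(weights, biases, weights + biases)  # 12960 42 13002 -> ~13 thousand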
So when you pass a 9 to the model to predict, it passes through the layers one after another, each applying its weights, biases, and activation function, until it reaches the output layer, where the model predicts the digit based on what it learned about the figure 9 during training.
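Putting the whole forward pass together, here is a hedged sketch of prediction. The parameters below are random, so the "prediction" is meaningless until the network is trained, but the flow of the computation is the same:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(pixels, params):
    """Forward pass: feed the 784 pixel values through every layer."""
    a = pixels
    for W, b in params:
        a = sigmoid(W @ a + b)
    return np.argmax(a)  # most activated output neuron = predicted digit

# Random (untrained) parameters, just to show the shape of the computation;
# a trained model would have learned these values instead.
rng = np.random.default_rng(0)
sizes = [784, 16, 16, 10]
params = [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
          for n_in, n_out in zip(sizes, sizes[1:])]

image = rng.random(784)        # stand-in for a flattened handwritten "9"
print(predict(image, params))  # a digit 0-9 (meaningless until trained)
```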
I believe by now we have demystified NNs by some good measure lol. It's all about the network of layers and the values in each neuron, right? Uhum. We'll get to understand this more as we go on; I'll attach materials for you at the end of the chapter to learn further, and I'd advise you to check them out.
Types of Neural Networks
There are several types of NN, and more are being invented as I type this, but we'll pick a few important ones to learn about in subsequent chapters.
Feedforward Neural Networks
Convolutional Neural Networks
Recurrent Neural Networks
Generative Adversarial Networks
Transformers - This is the tech making all the buzz in recent times; it is used by Large Language Models. You might have already used them in products like ChatGPT.
To learn NN in more nitty-gritty detail, I'll recommend a few materials for you:
A great open-source (free) book by Michael Nielsen: Neural Networks and Deep Learning.
Neural Networks, an article by 3Blue1Brown.
A well-detailed NN playlist by 3Blue1Brown is on YouTube; watch it here.
A YouTube video explaining backpropagation in Neural Networks by IBM, watch it here.
Whew! Neural Networks are easily the biggest game changer in the world of Artificial Intelligence, and we already know a thing or two about them! 🤖
We'll learn even more when we take on Feedforward Neural Networks in our next chapter.
Take care 👽