Hello 🕶️,
A Generative Adversarial Network (GAN) is a type of neural network in which two neural network models compete to improve each other iteratively. GANs are among the most versatile neural network architectures. They are used in generative AI to create new samples such as images, models, videos, etc.
GANs were developed by Ian Goodfellow (who has worked at Google Brain, DeepMind, Apple, and OpenAI) and his team at the University of Montreal in 2014 and introduced in their research paper, Generative Adversarial Nets.
A GAN's architecture consists of two neural network models:
Generator: The Generator in a GAN tries to create fake samples of the specific domain we want to train our neural network model on (for example, pictures of cars).
Discriminator: The Discriminator model, on the other hand, tries to spot fake samples of the same domain.
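To make the two roles concrete, here is a minimal sketch in plain Python. It is only an illustration under heavy simplifying assumptions: each "sample" is a single number rather than an image, and both models are one-line formulas rather than real neural networks. The names `generator`, `discriminator`, `gen_params`, and `disc_params` are made up for this example.

```python
import math
import random


def generator(z, params):
    """Turn a random noise value z into a fake sample (here: one number)."""
    scale, shift = params
    return scale * z + shift


def discriminator(x, params):
    """Return the estimated probability (0 to 1) that sample x is real."""
    weight, bias = params
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


rng = random.Random(0)
gen_params = (1.0, 0.0)     # untrained Generator (assumed starting values)
disc_params = (0.5, -1.0)   # untrained Discriminator (arbitrary values)

noise = rng.gauss(0.0, 1.0)                        # random input to the Generator
fake_sample = generator(noise, gen_params)         # the Generator makes a fake
p_real = discriminator(fake_sample, disc_params)   # the Discriminator judges it
print(f"fake sample: {fake_sample:.3f}, P(real) according to D: {p_real:.3f}")
```

The data flow is the part that carries over to real GANs: noise goes into the Generator, a fake sample comes out, and the Discriminator turns any sample into a "how real does this look" score.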
Let us illustrate how GANs work with an example.
Let's say we want to train a GAN model to create new images of cats. We'll take the following steps:
Train the Discriminator: The Discriminator, itself a neural network, is trained to recognize real samples and then to spot fake ones. Samples here can be images of cats or of numbers.
Neural networks such as the Convolutional Neural Network (CNN) or the Recurrent Neural Network (RNN), which we learned about in the previous two chapters, are commonly used to build the Discriminator.
Train the Generator: The Generator model is trained (mostly with a CNN or RNN) to create fake images of the samples, e.g., of numbers, as shown below.
Zero-sum game training: Here the Generator creates a fake sample and sends it to the Discriminator, which decides whether the sample is real or fake. If the Discriminator correctly predicts that the sample is fake, it wins; if not, the Generator wins. It's a zero-sum game, which means that one party needs to lose for the other to win, so there's always a winner and a loser.
In this type of training, the loser updates its model. If the loser is the Generator, it'll update its model to make better fakes that can pass the Discriminator's tests. If the loser is the Discriminator, it'll update its model to spot fake samples better. (In practice, training usually alternates updates to both models every round, and the model that did worse receives the larger correction.) This is an iterative process in which the two models are set against each other so that each makes the other better.
Deploy the model: The zero-sum game training ends when the Generator produces fakes so good that the Discriminator can no longer tell them apart from real samples. The Generator model is then used to create fake replicas of samples (cats in our case), which you most probably wouldn't be able to spot as fakes either.
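The steps above can be sketched as a toy training loop in plain Python. This is a hedged illustration, not how production GANs are built. It assumes the "real" samples are just numbers drawn around 4.0 (instead of cat images), that both models are tiny linear formulas (instead of CNNs), and that both models are updated in turn on every round. All names here (`discriminator_step`, `generator_step`, `train`, and the parameter keys) are invented for this example.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def generate(gen, z):
    """Generator: turn noise z into a fake sample."""
    return gen["a"] * z + gen["b"]


def discriminate(disc, x):
    """Discriminator: probability that x is a real sample."""
    return sigmoid(disc["w"] * x + disc["c"])


def discriminator_step(disc, real, fake, lr):
    """One gradient step on the loss -log D(real) - log(1 - D(fake))."""
    n = len(real)
    grad_w = grad_c = 0.0
    for xr, xf in zip(real, fake):
        err_real = discriminate(disc, xr) - 1.0  # pushes D(real) toward 1
        err_fake = discriminate(disc, xf)        # pushes D(fake) toward 0
        grad_w += (err_real * xr + err_fake * xf) / n
        grad_c += (err_real + err_fake) / n
    disc["w"] -= lr * grad_w
    disc["c"] -= lr * grad_c


def generator_step(gen, disc, noise, lr):
    """One gradient step on the loss -log D(G(z)): make fakes that fool D."""
    n = len(noise)
    grad_a = grad_b = 0.0
    for z in noise:
        err = discriminate(disc, generate(gen, z)) - 1.0  # pushes D(fake) toward 1
        grad_a += err * disc["w"] * z / n
        grad_b += err * disc["w"] / n
    gen["a"] -= lr * grad_a
    gen["b"] -= lr * grad_b


def train(steps=2000, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    gen = {"a": 1.0, "b": 0.0}   # fakes start out centered on 0.0
    disc = {"w": 0.0, "c": 0.0}  # D starts out guessing 50/50
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # assumed "real" data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generate(gen, z) for z in noise]
        discriminator_step(disc, real, fake, lr)  # D learns to spot the fakes
        generator_step(gen, disc, noise, lr)      # G learns to fool the updated D
    return gen, disc


if __name__ == "__main__":
    gen, disc = train()
    print("generator:", gen)
    print("discriminator:", disc)
```

After training, only `generate` is needed to produce new samples, which mirrors the deploy step. Don't expect this toy to settle neatly: adversarial training can oscillate instead of converging, which is one reason real GANs are known to be tricky to train.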
Observe the group of images below. Do they look real?
What was your guess?
All the images in the group of pictures above are fake, generated by the NVIDIA StyleGAN research team in a conference paper. I don't think we should be that surprised, right? I mean, we've heard of deepfakes for a few years now. GANs are used to generate deepfake images and videos.
Applications of GANs
I am pretty sure you have seen an image or video generated by a GAN. GANs are versatile, and their applications keep expanding into new fields. Some of the applications are listed below. Can you think of one yourself, or a field where you think GANs could be applied? Leave a comment, please.
Image & Video Generation
Image/Video enhancement: This applies to low-resolution images or images with some damage. The model can create a new image in a higher resolution. Low-resolution videos, too, are increasingly being upscaled to higher resolutions. You might have seen a movie made in the 1990s that looks very clear on Netflix; it may well have been enhanced using models like GANs.
Image Inpainting: This is the process of filling in missing parts of an image. It is used to restore damaged images: the GAN fills in the damaged parts to restore the image.
Deepfakes: This involves advanced video manipulation to create new, realistic videos. Examples include face swapping and lip-syncing to make a video made by Person A look and sound like it was made by Person B.
Natural Language Processing (NLP)
- Text-to-image: Creating new images from text, like Openart's text-to-image generation tool.
Gaming & Animation
- Game Assets: Create realistic textures, characters, and environments
Medical & Scientific Applications
- Organ Image Reconstruction: Rebuilding damaged organ scans for analysis, like in damaged brain analysis.
Fashion & E-Commerce
- Product Photography: Creating photorealistic product images without a photo shoot.
- AI-Generated Fashion Designs: Designing new clothing styles.
Robotics & Simulation
- Simulated Training Environments: Creating synthetic worlds to train robots and autonomous vehicles. You can watch a 20-second YouTube video showing this type of virtual environment here.
The Generative Adversarial Network (GAN) is a big breakthrough in neural networks and machine learning. I believe we now understand how GANs work at a high level, and when we come to the hands-on chapters, we'll build one ourselves.
The next chapter in this series will be on the biggest invention in the field of Machine Learning and Artificial Intelligence: Transformers. See ya 🕶️