Generative Adversarial Networks

AI Series - Chapter 24

Hello šŸ•¶ļø,

A Generative Adversarial Network (GAN) is a type of neural network in which two models compete to improve each other iteratively. GANs are among the most versatile neural network architectures. They are used in generative AI to create new samples such as images, videos, 3D models, etc.

GANs were developed by Ian Goodfellow (who has worked at Google Brain, DeepMind, Apple, and OpenAI 😎) and his team at the University of Montreal in 2014 in their research paper, Generative Adversarial Nets.

A GAN's architecture consists of two neural network models:

  1. Generator: The Generator in a GAN tries to create fake samples from a specific domain we want to train our model on (for example, to generate pictures of cars).

  2. Discriminator: The Discriminator model, on the other hand, tries to spot fake samples from the same domain.
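To make the two components concrete, here is a minimal sketch in NumPy. The layer sizes (4-dim noise, 8-dim samples, 16 hidden units) and the tiny multilayer-perceptron shapes are hypothetical choices for illustration, not the architecture from the paper: the Generator maps random noise to a fake sample, and the Discriminator maps any sample to a probability that it is real.

```python
import numpy as np

rng = np.random.default_rng(42)

def generator(z, params):
    """Map random noise z to a fake 'image' (here just an 8-dim vector)."""
    W1, W2 = params
    h = np.tanh(z @ W1)     # hidden layer
    return np.tanh(h @ W2)  # fake sample scaled to [-1, 1], like a normalized image

def discriminator(x, params):
    """Return the probability that sample x is real."""
    W1, w2 = params
    h = np.tanh(x @ W1)
    return 1.0 / (1.0 + np.exp(-(h @ w2)))  # sigmoid -> probability in (0, 1)

# Hypothetical sizes: 4-dim noise, 8-dim samples, 16 hidden units
g_params = (rng.normal(size=(4, 16)), rng.normal(size=(16, 8)))
d_params = (rng.normal(size=(8, 16)), rng.normal(size=16))

z = rng.normal(size=(1, 4))         # random noise input
fake = generator(z, g_params)       # a fake sample of shape (1, 8)
p = discriminator(fake, d_params)   # how "real" the fake looks to the Discriminator
print(fake.shape, round(float(p[0]), 3))
```

The weights here are untrained, so the Discriminator's output is meaningless for now; training, covered next, is what gives both models their skills.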

Let us illustrate how GANs work with an example.

Let's say we want to train a GAN model to create new images of cats. We'll take the following steps:

  1. Train the Discriminator: The Discriminator is trained to recognize real samples and then to spot fake ones. Samples here can be images of cats or of handwritten digits.

    Neural networks such as the Convolutional Neural Network (CNN) or the Recurrent Neural Network (RNN), which we learned about in the previous two chapters, are most often used to build the Discriminator.

  2. Train the Generator: The Generator model is trained (mostly with a CNN or RNN) to create fake samples, e.g. images of handwritten digits.

  3. Zero-sum game training: Here the Generator creates a fake sample and sends it to the Discriminator, which decides whether the sample is real or fake. If the Discriminator correctly predicts that the sample is fake, it wins; if not, the Generator wins. It's a zero-sum game, which means one party has to lose for the other to win, and there's always a winner and a loser.

    In this type of training the loser updates its model. If the loser is the Generator, it updates its model to try to make better fakes that pass the Discriminator's tests. On the other hand, if the loser is the Discriminator, it updates its model to try to spot fake samples better. This is an iterative process where both models are set against each other, each making the other better by comparing outputs.

  4. Deploy the model: The zero-sum game training ends when the Generator consistently produces fakes so good that the Discriminator can no longer spot them as fake. The Generator model is then used to create new samples (cats in our case), which you most probably wouldn't be able to spot as fakes either 😺.
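The four steps above can be sketched end to end in code. The toy GAN below is a sketch under heavy simplifying assumptions I'm introducing for illustration: the "real" data is a 1-D Gaussian (standing in for cat images), the Generator is a single linear function, the Discriminator is a logistic-regression classifier, and the gradients are derived by hand. Even so, the loop structure is the real thing: update the Discriminator to tell real from fake, then update the Generator to fool it, and repeat until the fakes resemble the real data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# "Real" data: a 1-D Gaussian standing in for images of cats
target_mean, target_std = 4.0, 1.25

# Generator: x_fake = w*z + b, with noise z ~ N(0, 1)
w, b = 1.0, 0.0
# Discriminator: D(x) = sigmoid(a*x + c), the probability that x is real
a, c = 0.0, 0.0
lr = 0.05

for step in range(2000):
    # --- Train the Discriminator on real and fake minibatches ---
    x_real = rng.normal(target_mean, target_std, size=32)
    x_fake = w * rng.normal(size=32) + b
    d_real = sigmoid(a * x_real + c)
    d_fake = sigmoid(a * x_fake + c)
    # Gradients of -log D(x_real) - log(1 - D(x_fake)) w.r.t. a and c
    a -= lr * np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    c -= lr * np.mean(-(1 - d_real) + d_fake)

    # --- Train the Generator to fool the freshly updated Discriminator ---
    z = rng.normal(size=32)
    d_fake = sigmoid(a * (w * z + b) + c)
    # Gradients of the non-saturating loss -log D(G(z)) w.r.t. w and b
    w -= lr * np.mean(-(1 - d_fake) * a * z)
    b -= lr * np.mean(-(1 - d_fake) * a)

# Deploy: use the trained Generator on its own
samples = w * rng.normal(size=1000) + b
print(round(float(np.mean(samples)), 2))  # should land near the real mean of 4.0
```

Note that neither network ever sees the other's parameters; each only reacts to the other's outputs, which is exactly the adversarial setup described above.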

Observe the group of images below: do they look real?

What was your guess?

All the images in the group of pictures above are fake 😎, generated by the NVIDIA StyleGAN research team in a conference paper. I don't think we should be that surprised, right? I mean, we've heard of Deepfakes for a few years now. GANs are used to generate Deepfake images and videos 🤖.

Applications of GANs

I am pretty sure you have seen an image or video generated by a GAN. The technique is versatile, and its applications keep expanding into new fields. Some of the applications are listed below. Can you think of one yourself, or a field you think GANs could be applied to? Leave a comment, please 👽

  1. Image & Video Generation

    • Image/Video enhancement: This can be seen with low-resolution or damaged images. The model can create a new image at a higher resolution. Low-resolution videos, too, are increasingly being upscaled to higher resolutions. You might have seen a movie made in the 1990s that looks very clear on Netflix; it has been enhanced using models like GANs.

    • Image Inpainting: This is the process of filling in missing parts of an image. It is used to restore damaged images: the GAN fills in the damaged regions to reconstruct the picture.

    • Deepfakes: This involves advanced video manipulation to create new, realistic videos. Examples include using face swapping and lip-syncing to make a video of Person A look and sound like it was made by Person B.

  2. Natural Language Processing (NLP)

    • Text-to-image: Creating new images from text prompts, like Openart's text-to-image generation tool.
  3. Gaming & Animation

    • Game Assets: Create realistic textures, characters, and environments
  4. Medical & Scientific Applications

    • Organ Image Reconstruction – Rebuilding damaged organ scans for analysis, as in damaged-brain analysis.
  5. Fashion & E-Commerce

    • Product Photography – Creating photorealistic product images without a photo shoot.

    • AI-Generated Fashion Designs – Designing new clothing styles.

  6. Robotics & Simulation

    • Simulated Training Environments – Creating synthetic worlds to train robots and autonomous vehicles.

The Generative Adversarial Network (GAN) is a big breakthrough in neural networks and machine learning. I believe we now understand how GANs work at a high level, and when we come to the hands-on chapters, we'll build one ourselves.

Our next series chapter will be on one of the biggest inventions in the field of Machine Learning and Artificial Intelligence: Transformers. See ya 🕶️

ā¬…ļø Previous Chapter

Ā