Deep learning algorithms work with almost any kind of data, and they require large amounts of computing power and data to solve complicated problems. Now, let us dive deep into one of the most famous deep learning approaches: generative adversarial networks (GANs).
Generative adversarial networks (GANs) are an exciting and relatively recent innovation in machine learning and deep learning. GANs are generative deep learning algorithms that create new data instances resembling their training data. A GAN has two components: a generator, which learns to produce fake data, and a discriminator, which learns to tell that fake data apart from real data.
Generative models, the family to which GANs belong, are also the engine behind DALL-E, a recent breakthrough from OpenAI that can generate images from any text description.
If you are interested in learning how to use Python for deep learning with generative adversarial networks (GANs), a powerful technique for creating realistic and diverse synthetic data, then this article is for you.
Before we begin, let’s see the remarkable ability of DALL-E to generate a seemingly scientific photo of an atom:
What is Deep Learning?
Deep learning, a branch of machine learning, addresses intricate problems through the utilization of artificial neural networks. These networks consist of interconnected nodes organized in multiple layers, extracting features from input data. Extensive datasets are employed to train these models, enabling them to identify patterns and correlations that might be challenging or impossible for humans to perceive.
The impact of deep learning on artificial intelligence has been substantial. It has paved the way for the development of intelligent systems capable of independent learning, adaptation, and decision-making. Deep learning has led to remarkable advancements in various domains, encompassing image and speech recognition, natural language processing, machine translation, text generation, image generation (as will be reviewed in this article), autonomous driving, and numerous others.
Why Python for Deep Learning?
Python has gained widespread popularity as a programming language due to its versatility and ease of use in diverse domains of computer science, especially in the field of deep learning. Thanks to its extensive range of libraries and frameworks specially tailored for deep learning, Python has emerged as a top choice among many machine learning professionals.
Python has emerged as the language of choice for deep learning, and here are some of the reasons why:
1. Simple to learn and use:
Python is a high-level programming language that is easy to learn and use, even for those who are new to programming. Its concise and uncomplicated syntax makes it easy to write and understand. This allows developers to concentrate on solving problems without worrying about the details of the language.
2. Abundant libraries and frameworks:
Python has a vast ecosystem of libraries and frameworks that cater specifically to deep learning. Some of these libraries include TensorFlow, PyTorch, Keras, and Theano. These libraries provide pre-built functions and modules that simplify the development process, reducing the need to write complex code from scratch (a short sketch after this list shows just how little code a model definition takes).
3. Strong community support:
Python has a large and active community of developers contributing to its development, maintenance, and improvement. This community offers support and guidance to beginners, making it easier to learn and use Python for deep learning.
4. Platform independence:
Python is platform-independent, which means that code written on one platform can be easily executed on another platform without any modification. This makes it easier to deploy deep learning models on different platforms and devices.
5. Easy integration with other languages:
Python can be easily integrated with other programming languages, such as Delphi, C++, and Java, making it ideal for building complex systems that require integrating different technologies.
Overall, Python’s ease of use, abundance of libraries and frameworks, strong community support, platform independence, and easy integration with other languages make it an indispensable tool for machine learning practitioners, and its popularity continues to soar as a result.
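To make point 2 concrete, here is a minimal sketch of how little code a deep learning model takes in Python, using the standard Keras Sequential API (the layer sizes are arbitrary, illustrative choices, not part of any example in this article):

```python
# A minimal sketch: a complete 10-class classifier in a few lines of Keras.
# The layer sizes here are arbitrary, illustrative choices.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(784,)),  # hidden layer
    keras.layers.Dense(10, activation="softmax"),                    # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # prints the layer-by-layer architecture and parameter counts
```

An equivalent model in a lower-level language would typically take far more code, which is exactly the productivity gain described above.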
What is DALL-E? A revolutionary image generation model
DALL-E is a remarkable generative model that can create images from text descriptions (even intricate ones), such as “a cat wearing a bow tie” or “a painting of a landscape in the style of Van Gogh”. It is trained on a large-scale dataset of text-image pairs and uses a transformer architecture that can encode both text and image modalities.
DALL-E can create plausible and diverse images for a wide range of concepts, such as animals, objects, scenes, and transformations, and control their attributes, viewpoints, and perspectives. DALL-E can also combine multiple concepts, such as “an armchair in the shape of an avocado” or “a snail made of a harp”, and generate novel and creative images that do not exist in the real world. DALL-E demonstrates the power and potential of deep generative models for image synthesis and multimodal understanding.
Applications beyond DALL-E: GANs in various domains
However, DALL-E is not the only showcase of generative modeling. GANs have been successfully applied to various domains and tasks, such as computer vision, natural language generation, audio synthesis, video prediction, and more. Some examples of GAN applications are:
1. Image generation
GANs can generate realistic and diverse images of objects, scenes, faces, animals, and more, from random noise or text descriptions.
2. Image-to-image translation
GANs can transform images from one domain to another, such as changing the style, season, or content of the images.
3. Image enhancement
GANs can improve the quality and resolution of images, such as super-resolution, deblurring, denoising, inpainting, and colorization.
4. Text generation
GANs can generate realistic and diverse texts, such as stories, poems, reviews, captions, and more, from random noise or keywords.
5. Text-to-speech
GANs can synthesize natural and expressive speech from text, such as voice cloning, style transfer, and emotion modulation.
6. Speech enhancement
GANs can improve the quality and intelligibility of speech, such as noise reduction, dereverberation, and bandwidth extension.
7. Video generation
GANs can generate realistic and diverse videos, such as animations, simulations, and future predictions, from random noise or text descriptions.
8. Video-to-video translation
GANs can transform videos from one domain to another, such as changing the style, content, or viewpoint of the videos.
What are Generative Adversarial Networks (GANs)?
Generative Adversarial Networks (GANs) are a breakthrough innovation in deep learning that can generate realistic and diverse data from random noise or text descriptions. GANs have many applications in various domains, such as computer vision, natural language generation, audio synthesis, and more. GANs can also enable creativity, accessibility, and fairness by generating novel and inclusive data that do not exist in the real world.
GANs consist of two neural networks that compete with each other in a game-like scenario:
1. Discriminator
A discriminator that tries to distinguish between real and fake data.
More formally, given a set of data instances X and a set of labels Y, discriminative models capture the conditional probability p(Y | X).
Illustration of the discriminative model in the handwritten digit generation use case (we will explore this hands-on in a later section):
2. Generator
A generator that tries to create fake data.
More formally, given a set of data instances X and a set of labels Y, generative models capture the joint probability p(X, Y), or just p(X) if there are no labels.
Illustration of the generative model in the handwritten digit generation use case (we will explore this hands-on in a later section):
The discriminator and the generator are trained simultaneously, in an adversarial manner, until they reach an equilibrium, where the generator can fool the discriminator about half the time. GANs can learn to produce high-quality and diverse data, such as images, text, audio, and video, by leveraging large-scale datasets and advanced network architectures.
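Formally, this two-player game is captured by the minimax objective introduced in the original GAN paper (Goodfellow et al. [7]), which the discriminator D maximizes and the generator G minimizes:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Here D(x) is the discriminator’s estimated probability that a sample x is real, and G(z) is the generator’s output for a noise vector z. The equilibrium described above corresponds to D outputting 1/2 everywhere, i.e. the discriminator can no longer do better than guessing.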
Here’s a picture of the whole GAN system:
What Python tools and libraries are needed for GAN development?
Python is one of the most popular and widely used programming languages for machine learning and artificial intelligence, especially for developing generative adversarial networks (GANs). Python offers a rich set of tools and libraries that can help you implement and train GAN models with ease and efficiency.
Some of the most useful and popular Python tools and libraries for GAN development are:
1. PyTorch
PyTorch is an open-source deep learning framework that provides a flexible and dynamic way of building and running GAN models. PyTorch supports automatic differentiation, GPU acceleration, distributed training, and various GAN architectures and loss functions. PyTorch also has a large and active community that contributes to the development and improvement of the framework[10].
2. TensorFlow
TensorFlow is another open-source deep learning framework that offers a comprehensive and scalable platform for building and deploying GAN models. TensorFlow supports eager execution, graph optimization, tensor operations, and various GAN architectures and loss functions. TensorFlow also has a high-level API called Keras, which simplifies the process of creating and training GAN models[10].
3. PyGAN
PyGAN is a Python library that implements GANs and their variants, such as conditional GANs, adversarial autoencoders, and energy-based GANs. PyGAN allows you to design generative models based on statistical machine learning problems and optimize them using various algorithms and metrics[10].
4. TorchGAN
TorchGAN is a Python library that provides a collection of GAN models, loss functions, and evaluation metrics, built on top of PyTorch. TorchGAN enables you to easily create and customize your own GAN models, as well as reproduce the results of existing GAN papers[10].
5. VeGANs
VeGANs is another Python library that provides a variety of GAN models, loss functions, and evaluation metrics, built on top of PyTorch. VeGANs aims to make GAN development accessible and user-friendly, by offering a simple and consistent interface, as well as tutorials and examples[10].
Without further ado, let’s get our hands dirty with hands-on GANs in Python, across three different use cases: numerical mathematics (approximating the plot of a sine function), generating handwritten digits, and generating realistic human faces.
Hands-On GAN 1: Generate random numbers using a GAN to approximate a sine plot
In this section, we will explore how GANs can be used to generate data that follows a simple sine function on the interval from 0 to 2π. We will implement a GAN using PyTorch and show how the generator and the discriminator networks interact and improve over time. We will also demonstrate the results of our GAN by comparing the generated data with the original sine function data.
The following is the complete Python code to generate random numbers with a GAN that approximates a sine plot:
```python
import torch
from torch import nn

import math
import matplotlib.pyplot as plt

torch.manual_seed(111)

train_data_length = 1024
train_data = torch.zeros((train_data_length, 2))
train_data[:, 0] = 2 * math.pi * torch.rand(train_data_length)
train_data[:, 1] = torch.sin(train_data[:, 0])
train_labels = torch.zeros(train_data_length)
train_set = [
    (train_data[i], train_labels[i]) for i in range(train_data_length)
]

# Plot training data
plt.plot(train_data[:, 0], train_data[:, 1], ".")
plt.show()

# Create a PyTorch data loader
batch_size = 32
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size, shuffle=True
)

# Implementing the Discriminator
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        output = self.model(x)
        return output

# Instantiate a Discriminator object
discriminator = Discriminator()

# Implementing the Generator
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 2),
        )

    def forward(self, x):
        output = self.model(x)
        return output

generator = Generator()

# Training the models
lr = 0.001
num_epochs = 300
loss_function = nn.BCELoss()

# Create the optimizers using the Adam optimizer
optimizer_discriminator = torch.optim.Adam(discriminator.parameters(), lr=lr)
optimizer_generator = torch.optim.Adam(generator.parameters(), lr=lr)

# Implement a training loop in which training samples are fed to the models,
# and their weights are updated to minimize the loss function:
for epoch in range(num_epochs):
    for n, (real_samples, _) in enumerate(train_loader):
        # Data for training the discriminator
        real_samples_labels = torch.ones((batch_size, 1))
        latent_space_samples = torch.randn((batch_size, 2))
        generated_samples = generator(latent_space_samples)
        generated_samples_labels = torch.zeros((batch_size, 1))
        all_samples = torch.cat((real_samples, generated_samples))
        all_samples_labels = torch.cat(
            (real_samples_labels, generated_samples_labels)
        )

        # Training the discriminator
        discriminator.zero_grad()
        output_discriminator = discriminator(all_samples)
        loss_discriminator = loss_function(
            output_discriminator, all_samples_labels)
        loss_discriminator.backward()
        optimizer_discriminator.step()

        # Data for training the generator
        latent_space_samples = torch.randn((batch_size, 2))

        # Training the generator
        generator.zero_grad()
        generated_samples = generator(latent_space_samples)
        output_discriminator_generated = discriminator(generated_samples)
        loss_generator = loss_function(
            output_discriminator_generated, real_samples_labels
        )
        loss_generator.backward()
        optimizer_generator.step()

        # Show loss
        if epoch % 10 == 0 and n == batch_size - 1:
            print(f"Epoch: {epoch} Loss D.: {loss_discriminator}")
            print(f"Epoch: {epoch} Loss G.: {loss_generator}")

# Checking the samples generated by the GAN
latent_space_samples = torch.randn(100, 2)
generated_samples = generator(latent_space_samples)

generated_samples = generated_samples.detach()
plt.plot(generated_samples[:, 0], generated_samples[:, 1], ".")
plt.show()
```
To execute the code above seamlessly without any errors, we can utilize the PyScripter IDE.
What did the code above do?
Let’s break down the important parts of the code above:
1. Data generation:
- train_data_length specifies the number of data points to be generated.
- train_data is a tensor of shape (train_data_length, 2), where the first column represents random values between 0 and 2π, and the second column is the sine of the first column.
- train_labels is a tensor of zeros.
- train_set is a list of tuples, each containing a data point and its corresponding label.
2. Data visualization:
- The generated training data is plotted using matplotlib.
3. Data loader:
- batch_size is set to 32.
- train_loader is a PyTorch data loader that shuffles and batches the training data.
4. Discriminator model:
- The Discriminator class is defined as a subclass of nn.Module.
- It consists of a feedforward neural network with layers of sizes 2 (input) → 256 → 128 → 64 → 1, followed by a Sigmoid activation.
- Dropout layers with a dropout probability of 0.3 are added for regularization.
5. Generator model:
- The Generator class is defined, similar to the Discriminator.
- It is a neural network with layers of sizes 2 (input) → 16 → 32 → 2.
6. Model initialization:
- Instances of the Discriminator and Generator classes are created.
7. Training configuration:
- The learning rate (lr) is set to 0.001.
- num_epochs is set to 300.
- Binary Cross Entropy Loss (nn.BCELoss()) is used as the loss function.
8. Optimizer initialization:
- Adam optimizers are created for both the discriminator and the generator.
9. Training loop:
- The code runs a training loop for the specified number of epochs.
- For each epoch, it iterates through batches of data from the train_loader.
- For the discriminator:
- Real and generated samples are combined.
- The discriminator is trained to distinguish between real and generated samples.
- For the generator:
- The generator is trained to generate samples that the discriminator classifies as real.
- Losses for the discriminator and generator are printed every 10 epochs.
10. Generated samples visualization:
- After training, 100 samples are generated using the trained generator, and they are plotted.
In summary, the code above implements a simple Generative Adversarial Network (GAN) where the generator and discriminator are trained adversarially to generate realistic samples. The generator generates fake samples to try and fool the discriminator, while the discriminator learns to distinguish between real and fake samples.
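As a quick sanity check of the two architectures, you can pass dummy batches through both networks and confirm the tensor shapes before training. This is a small sketch that assumes the Discriminator and Generator classes from the listing above:

```python
import torch

# Assumes the Discriminator and Generator classes from the listing above.
discriminator = Discriminator()
generator = Generator()

latent_space_samples = torch.randn((32, 2))    # a batch of 32 two-dimensional latent points
fake_points = generator(latent_space_samples)  # generator maps latent points to (x, y) pairs
scores = discriminator(fake_points)            # discriminator scores each pair as real/fake

print(fake_points.shape)  # torch.Size([32, 2])
print(scores.shape)       # torch.Size([32, 1]), each value in (0, 1) from the Sigmoid
```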
Here are a few selected outputs from the process above:
Selected outputs:
Examine the training data by plotting each point (x₁, x₂):
Plot the generated samples. Here is a screenshot of the plotting results at epoch 300, which almost perfectly resemble the sine plot:
To see the progression across epochs (from 0 to 300) more clearly, please watch the following video:
Hands-On GAN 2: Generate handwritten digits using GAN
In this section, we will explore how GANs can be used to generate realistic images of handwritten digits. For that, we will train the models using the MNIST dataset of handwritten digits, which is included in the torchvision package. We will implement a GAN using PyTorch and show how the generator produces fake images while the discriminator tries to tell them apart.
The following is the complete Python code to automatically generate handwritten digits using GAN:
```python
import torch
from torch import nn, optim
import torchvision
import torchvision.transforms as transforms
import math
import matplotlib.pyplot as plt
import os

torch.manual_seed(111)

device = ''
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

train_set = torchvision.datasets.MNIST(root='.', train=True, download=True,
                                       transform=transform)

batch_size = 32
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                           shuffle=True)

plt.figure(dpi=150)
real_samples, mnist_labels = next(iter(train_loader))
for i in range(16):
    ax = plt.subplot(4, 4, i+1)
    plt.imshow(real_samples[i].reshape(28, 28), cmap='gray_r')
    plt.xticks([])
    plt.yticks([])
    plt.tight_layout()

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 1024),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = x.view(x.size(0), 784)
        output = self.model(x)
        return output

discriminator = Discriminator().to(device=device)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh()
        )

    def forward(self, x):
        output = self.model(x)
        output = output.view(x.size(0), 1, 28, 28)
        return output

generator = Generator().to(device=device)

lr = 0.0001
num_epochs = 50
loss_function = nn.BCELoss()

optimizer_discriminator = torch.optim.Adam(discriminator.parameters(), lr=lr)
optimizer_generator = torch.optim.Adam(generator.parameters(), lr=lr)

latent_space_samples_plot = torch.randn((16, 100)).to(device=device)

# Load the trained networks when they exist, or train new ones
if os.path.isfile('discriminator.pt') and os.path.isfile('generator.pt'):
    discriminator.load_state_dict(torch.load('./discriminator.pt'))
    generator.load_state_dict(torch.load('./generator.pt'))
else:
    for epoch in range(num_epochs):
        for n, (real_samples, mnist_labels) in enumerate(train_loader):
            # Data for training the discriminator
            real_samples = real_samples.to(device=device)
            real_samples_labels = torch.ones((batch_size, 1)).to(device=device)
            latent_space_samples = torch.randn((batch_size, 100)).to(device=device)
            generated_samples = generator(latent_space_samples)
            generated_samples_labels = torch.zeros(
                (batch_size, 1)).to(device=device)
            all_samples = torch.cat((real_samples, generated_samples))
            all_samples_labels = torch.cat(
                (real_samples_labels, generated_samples_labels))

            # Training the discriminator
            discriminator.zero_grad()
            output_discriminator = discriminator(all_samples)
            loss_discriminator = loss_function(
                output_discriminator, all_samples_labels)
            loss_discriminator.backward()
            optimizer_discriminator.step()

            # Data for training the generator
            latent_space_samples = torch.randn((batch_size, 100)).to(device=device)

            # Training the generator
            generator.zero_grad()
            generated_samples = generator(latent_space_samples)
            output_discriminator_generated = discriminator(generated_samples)
            loss_generator = loss_function(
                output_discriminator_generated, real_samples_labels)
            loss_generator.backward()
            optimizer_generator.step()

            # Show loss
            if n == batch_size - 1:
                print(f"Epoch: {epoch} Loss D.: {loss_discriminator}")
                print(f"Epoch: {epoch} Loss G.: {loss_generator}")

latent_space_samples = torch.randn(batch_size, 100).to(device=device)
generated_samples = generator(latent_space_samples)

generated_samples = generated_samples.cpu().detach()
plt.figure(dpi=150)
for i in range(16):
    ax = plt.subplot(4, 4, i+1)
    plt.imshow(generated_samples[i].reshape(28, 28), cmap='gray_r')
    plt.xticks([])
    plt.yticks([])
    plt.tight_layout()

# Save the trained network parameters
torch.save(generator.state_dict(), 'generator.pt')
torch.save(discriminator.state_dict(), 'discriminator.pt')
```
To execute the code provided seamlessly, without any errors, we can utilize the PyScripter IDE.
What did the code above do?
Let’s break down the important parts of the code above:
1. Importing necessary libraries
- torch for PyTorch,
- nn for neural network modules,
- optim for optimizers,
- torchvision for handling datasets like MNIST,
- transforms for data transformations,
- math for mathematical functions,
- matplotlib.pyplot for plotting, and
- os for operating-system-related functions.
2. Checking if a CUDA-enabled GPU is available and setting the device accordingly.
3. Defining a data transformation pipeline using transforms.Compose,
which converts the images to PyTorch tensors and normalizes them (a short sketch after this breakdown shows why this normalization range matters).
4. Loading the MNIST dataset for training,
specifying the root directory, setting it for training, downloading it if not available, and applying the defined transformation.
5. Creating a PyTorch data loader to handle batching and shuffling of the training data.
6. Plotting 16 real samples from the MNIST dataset using matplotlib.
7. Defining the Discriminator class,
which is a neural network with several fully connected layers, ReLU activations, and Dropout layers. The final layer has a Sigmoid activation.
8. Implementing the forward method for the Discriminator class
and creating an instance of the Discriminator class, moving it to the specified device.
9. Defining the Generator class,
which is another neural network with fully connected layers, ReLU activations, and a hyperbolic tangent (Tanh) activation at the output.
10. Implementing the forward method for the Generator class
and creating an instance of the Generator class, moving it to the specified device.
11. Setting hyperparameters:
- the learning rate (lr),
- the number of epochs (num_epochs),
- Binary Cross Entropy Loss (nn.BCELoss()) as the loss function, and
- initializing Adam optimizers for both the discriminator and the generator.
12. Generating random samples in the latent space for visualization.
13. Loading pre-trained models if available,
otherwise training the models for a specified number of epochs.
14. Generating and plotting 16 samples from the generator.
15. Saving the trained model parameters for future use.
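One detail worth spelling out, as promised in step 3 above: the Normalize((0.5,), (0.5,)) transform rescales MNIST pixels into the same value range that the generator's Tanh output produces, so real and fake samples are directly comparable. A small illustrative sketch:

```python
import torch

# ToTensor() yields pixels in [0, 1]; Normalize((0.5,), (0.5,)) then computes
# (p - 0.5) / 0.5, mapping them into [-1, 1] -- the range of the generator's Tanh.
pixels = torch.rand(28, 28)          # stand-in for a raw MNIST image
normalized = (pixels - 0.5) / 0.5    # what Normalize((0.5,), (0.5,)) computes per pixel
print(normalized.min().item(), normalized.max().item())  # both within [-1.0, 1.0]
```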
Here are a few selected outputs from the process above:
Selected outputs:
Download and extract the dataset:
Train the model with seed=111:
The following is a visualization of an excerpt of the MNIST dataset:
versus the results of the handwriting generated by the GAN at epoch 50:
To see the progression across epochs (from 0 to 50) more clearly, please see the following video:
Hands-On GAN 3: Generate realistic human faces using GAN
In this section, we will learn how to generate realistic human faces using GANs. The GAN consists of two competing networks: A generator that creates fake images from random noise, and a discriminator that distinguishes real images from fake ones. We will use a large dataset of celebrity images to train our GAN and produce high-quality and diverse faces. However, as this task consumes large computational power, we will perform it using Kaggle’s GPU, while we will also show you the limitations and challenges of using a regular laptop.
Introduction to Kaggle’s GPU Options: P100 vs. T4
Kaggle offers its users a 30-hour weekly time cap for GPU access, allowing them to choose between NVIDIA T4 and P100 GPUs. However, many Kaggle users may lack clarity on which GPU is best suited for their specific needs.
In general, the T4 GPU is an optimal choice for inference workloads that demand high throughput and low power consumption. On the other hand, the P100 GPU excels at training workloads, thanks to its superior performance and larger memory capacity [11].
It’s important to note that TPUs (Tensor Processing Units) are not part of this comparison, as they represent a distinct type of hardware accelerator designed by Google. Among the GPUs, the P100 is recommended for training tasks, while both the P100 and the T4 can be used for inference. Selecting the appropriate GPU depends on the specific requirements of the given machine learning task [11].
The complete code for generating realistic human faces using a GAN, and what it does
The following is the complete Python code to automatically generate realistic human faces using GAN:
```python
import numpy as np  # Linear algebra
import pandas as pd  # Data processing, CSV file I/O (e.g. pd.read_csv)
import os
from matplotlib import pyplot as plt
from tqdm import tqdm
from PIL import Image as Img

from keras import Input
from keras.layers import Dense, Reshape, LeakyReLU, Conv2D, Conv2DTranspose, Flatten, Dropout
from keras.models import Model
from keras.optimizers import RMSprop

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Load the data and resize the images
PIC_DIR = '../input/celeba-dataset/img_align_celeba/img_align_celeba/'

IMAGES_COUNT = 10000

ORIG_WIDTH = 178
ORIG_HEIGHT = 208
diff = (ORIG_HEIGHT - ORIG_WIDTH) // 2

WIDTH = 128
HEIGHT = 128

crop_rect = (0, diff, ORIG_WIDTH, ORIG_HEIGHT - diff)

images = []
for pic_file in tqdm(os.listdir(PIC_DIR)[:IMAGES_COUNT]):
    # Use the PIL alias imported above (on newer Pillow versions,
    # use Img.LANCZOS instead of the deprecated Img.ANTIALIAS)
    pic = Img.open(PIC_DIR + pic_file).crop(crop_rect)
    pic.thumbnail((WIDTH, HEIGHT), Img.ANTIALIAS)
    images.append(np.uint8(pic))

# Image shape
images = np.array(images) / 255
print(images.shape)

# Display first 25 images
plt.figure(1, figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.imshow(images[i])
    plt.axis('off')
plt.show()

# Create Generator
LATENT_DIM = 32
CHANNELS = 3

def create_generator():
    gen_input = Input(shape=(LATENT_DIM, ))

    x = Dense(128 * 16 * 16)(gen_input)
    x = LeakyReLU()(x)
    x = Reshape((16, 16, 128))(x)

    x = Conv2D(256, 5, padding='same')(x)
    x = LeakyReLU()(x)

    x = Conv2DTranspose(256, 4, strides=2, padding='same')(x)
    x = LeakyReLU()(x)

    x = Conv2DTranspose(256, 4, strides=2, padding='same')(x)
    x = LeakyReLU()(x)

    x = Conv2DTranspose(256, 4, strides=2, padding='same')(x)
    x = LeakyReLU()(x)

    x = Conv2D(512, 5, padding='same')(x)
    x = LeakyReLU()(x)
    x = Conv2D(512, 5, padding='same')(x)
    x = LeakyReLU()(x)
    x = Conv2D(CHANNELS, 7, activation='tanh', padding='same')(x)

    generator = Model(gen_input, x)
    return generator

# Create Discriminator
def create_discriminator():
    disc_input = Input(shape=(HEIGHT, WIDTH, CHANNELS))

    x = Conv2D(256, 3)(disc_input)
    x = LeakyReLU()(x)

    x = Conv2D(256, 4, strides=2)(x)
    x = LeakyReLU()(x)

    x = Conv2D(256, 4, strides=2)(x)
    x = LeakyReLU()(x)

    x = Conv2D(256, 4, strides=2)(x)
    x = LeakyReLU()(x)

    x = Conv2D(256, 4, strides=2)(x)
    x = LeakyReLU()(x)

    x = Flatten()(x)
    x = Dropout(0.4)(x)

    x = Dense(1, activation='sigmoid')(x)
    discriminator = Model(disc_input, x)

    optimizer = RMSprop(
        lr=.0001,
        clipvalue=1.0,
        decay=1e-8
    )

    discriminator.compile(
        optimizer=optimizer,
        loss='binary_crossentropy'
    )

    return discriminator

# Define a GAN Model
from IPython.display import Image
from keras.utils.vis_utils import model_to_dot

generator = create_generator()
generator.summary()
Image(model_to_dot(generator, show_shapes=True).create_png())

discriminator = create_discriminator()
discriminator.trainable = False
discriminator.summary()
Image(model_to_dot(discriminator, show_shapes=True).create_png())

gan_input = Input(shape=(LATENT_DIM, ))
gan_output = discriminator(generator(gan_input))
gan = Model(gan_input, gan_output)

optimizer = RMSprop(lr=.0001, clipvalue=1.0, decay=1e-8)
gan.compile(optimizer=optimizer, loss='binary_crossentropy')
gan.summary()

# Training the GAN model
import time

iters = 15000
batch_size = 16

RES_DIR = 'res2'
FILE_PATH = '%s/generated_%d.png'
if not os.path.isdir(RES_DIR):
    os.mkdir(RES_DIR)

CONTROL_SIZE_SQRT = 6
control_vectors = np.random.normal(size=(CONTROL_SIZE_SQRT**2, LATENT_DIM)) / 2

start = 0
d_losses = []
a_losses = []
images_saved = 0
for step in range(iters):
    start_time = time.time()
    latent_vectors = np.random.normal(size=(batch_size, LATENT_DIM))
    generated = generator.predict(latent_vectors)

    real = images[start:start + batch_size]
    combined_images = np.concatenate([generated, real])

    # Generated images are labeled 1, real images 0; small label noise regularizes
    labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
    labels += .05 * np.random.random(labels.shape)

    d_loss = discriminator.train_on_batch(combined_images, labels)
    d_losses.append(d_loss)

    latent_vectors = np.random.normal(size=(batch_size, LATENT_DIM))
    misleading_targets = np.zeros((batch_size, 1))

    a_loss = gan.train_on_batch(latent_vectors, misleading_targets)
    a_losses.append(a_loss)

    start += batch_size
    if start > images.shape[0] - batch_size:
        start = 0

    if step % 50 == 49:
        gan.save_weights('gan.h5')  # save into the working directory

        print('%d/%d: d_loss: %.4f,  a_loss: %.4f.  (%.1f sec)'
              % (step + 1, iters, d_loss, a_loss, time.time() - start_time))

        control_image = np.zeros((WIDTH * CONTROL_SIZE_SQRT, HEIGHT * CONTROL_SIZE_SQRT, CHANNELS))
        control_generated = generator.predict(control_vectors)

        for i in range(CONTROL_SIZE_SQRT ** 2):
            x_off = i % CONTROL_SIZE_SQRT
            y_off = i // CONTROL_SIZE_SQRT
            control_image[x_off * WIDTH:(x_off + 1) * WIDTH,
                          y_off * HEIGHT:(y_off + 1) * HEIGHT, :] = control_generated[i, :, :, :]
        im = Img.fromarray(np.uint8(control_image * 255))
        im.save(FILE_PATH % (RES_DIR, images_saved))
        images_saved += 1

plt.figure(1, figsize=(12, 8))
plt.subplot(121)
plt.plot(d_losses, color='red')
plt.xlabel('epochs')
plt.ylabel('discriminant losses')
plt.subplot(122)
plt.plot(a_losses)
plt.xlabel('epochs')
plt.ylabel('adversary losses')
plt.show()
```
To execute the code provided seamlessly, without any errors, we can utilize the PyScripter IDE.
Let’s break down the important parts of the code above:
1. Importing necessary libraries for:
- numerical operations (numpy),
- data processing (pandas),
- plotting (matplotlib),
- tqdm for progress bars, and
- components from keras for building a Generative Adversarial Network (GAN).
2. Walking through the Kaggle input directory and printing the filenames.
3. Setting up parameters for loading and resizing images from the CelebA dataset.
Cropping and resizing images to a specified size.
4. Converting the list of images to a NumPy array
and normalizing pixel values to the range [0, 1].
5. Displaying the first 25 images from the dataset.
6. Defining the generator model architecture using Keras.
7. Defining the discriminator model architecture using Keras.
8. Creating an instance of the generator and displaying its summary.
9. Creating an instance of the discriminator and displaying its summary.
Setting trainable to False to prevent it from being trained during the GAN training phase.
10. Creating the GAN model by connecting the generator and the discriminator.
Compiling the GAN model with a binary cross-entropy loss.
11. Setting up parameters for training the GAN model and creating directories for saving generated images.
12. Training the GAN model,
- saving weights periodically, and plotting the losses during training.
- Images are also generated and saved for visualization.
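Once training finishes, producing a brand-new face is a single predict call on a fresh latent vector. Here is a minimal sketch, assuming the generator and LATENT_DIM from the listing above:

```python
import numpy as np
from matplotlib import pyplot as plt

# Assumes `generator` and LATENT_DIM (32) from the listing above.
latent_vector = np.random.normal(size=(1, LATENT_DIM))  # one random point in latent space
face = generator.predict(latent_vector)[0]              # -> (128, 128, 3) array

# The listing saves images with np.uint8(image * 255), so we follow the same
# convention here and clip to [0, 1] before displaying.
plt.imshow(np.clip(face, 0, 1))
plt.axis('off')
plt.show()
```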
Selected outputs:
Load and resize CelebA dataset:
Show image shape:
Display first 25 images:
Generator summary:
Generator scheme (as generated automatically using the model_to_dot function from keras.utils.vis_utils):
Discriminator summary:
Discriminator scheme (also generated automatically using the model_to_dot function from keras.utils.vis_utils):
GAN summary:
```
_________________________________________________________________
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         [(None, 32)]              0
_________________________________________________________________
model (Functional)           (None, 128, 128, 3)       14953987
_________________________________________________________________
model_1 (Functional)         (None, 1)                 4211713
=================================================================
Total params: 19,165,700
Trainable params: 14,953,987
Non-trainable params: 4,211,713
_________________________________________________________________
```
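Note the split in the summary above: the generator’s 14,953,987 parameters are listed as trainable, while the discriminator’s 4,211,713 appear as non-trainable, because discriminator.trainable was set to False before the combined gan model was compiled. Within the GAN, only the generator’s weights are updated.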
After the step above, further training and output production require a sufficient GPU, so for the next steps I moved on to Kaggle (with a GPU P100 as the accelerator).
Here is a screenshot of the last step that can be done without a GPU using the PyScripter IDE (if you have your own GPU, you can continue running the code in PyScripter seamlessly):
Plot of Discriminant and Adversary losses:
Quoting Reference [3] to help interpret the plot of Discriminant and Adversary losses: “GAN convergence is hard to identify. As the generator improves with training, the discriminator performance gets worse because the discriminator can’t easily tell the difference between real and fake. If the generator succeeds perfectly, then the discriminator has a 50% accuracy. In effect, the discriminator flips a coin to make its prediction.
This progression poses a problem for convergence of the GAN as a whole: the discriminator feedback gets less meaningful over time. If the GAN continues training past the point when the discriminator is giving completely random feedback, then the generator starts to train on junk feedback, and its quality may collapse.”
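Because the per-step losses in the plot above are quite noisy, a smoothed view can make these convergence trends easier to read. Here is an optional sketch, assuming the d_losses and a_losses lists collected in the training loop above:

```python
import numpy as np
from matplotlib import pyplot as plt

# Assumes the d_losses and a_losses lists from the training loop above.
def moving_average(values, window=100):
    """Smooth a noisy per-step loss curve with a simple moving average."""
    return np.convolve(values, np.ones(window) / window, mode='valid')

plt.plot(moving_average(d_losses), color='red', label='discriminator (smoothed)')
plt.plot(moving_average(a_losses), label='adversarial (smoothed)')
plt.xlabel('training steps')
plt.ylabel('loss')
plt.legend()
plt.show()
```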
After 300 epochs, the code produces the following realistic human faces:
- On Kaggle /output:
- On my computer:
Image output for the first epoch (0) vs the 300th epoch (299):
To see the progression across epochs (from 0 to 299) more clearly, please see the following video:
Out of curiosity, I retrained the model up to epoch 599; you can see the results (from epoch 300 to 599) in the following video:
Conclusion
GANs are a powerful and versatile class of deep generative models that can produce realistic and diverse data, such as images, text, audio, and video, from random noise. You also saw how generative models can create realistic and diverse images from text descriptions, as demonstrated by DALL-E.
This article has highlighted and demonstrated the potential of deep learning, specifically the GAN architecture, in the domains of numerical mathematics (approximating the plot of a function), generating handwritten digits, and generating realistic human faces, all implemented with hands-on Python examples.
I hope this article has given you a comprehensive and accessible introduction to GANs, along with a solid understanding of the workflow for applying GANs to your own domains and project goals, and that it inspires you to learn more and experiment with GANs yourself.
Check out the full repository here:
github.com/Embarcadero/DL_Python05_GAN
Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.
Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.
Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.
Also, look into DelphiVCL, which makes it simple to create Windows GUIs with Python.
References & further readings
[1] Biswal, A. (2023).
Top 10 Deep Learning Algorithms You Should Know in 2023. Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm.
[2] Candido, R. (2021).
Generative Adversarial Networks: Build Your First Models. Real Python. realpython.com/generative-adversarial-networks.
[3] Chauhan, N. S. (2021).
Generate Realistic Human Face using GAN. Kaggle. kaggle.com/code/nageshsingh/generate-realistic-human-face-using-gan.
[4] Google for Developers. (2024).
Generative Adversarial Networks. Advanced courses, machine learning, Google for Developers. developers.google.com/machine-learning/gan.
[5] LeCun, Y. (1998).
The MNIST database of handwritten digits. yann.lecun.com/exdb/mnist.
[6] Liu, Z., Luo, P., Wang, X., & Tang, X. (2018).
Large-scale celebfaces attributes (celeba) dataset. Retrieved August, 15(2018), 11. mmlab.ie.cuhk.edu.hk/projects/CelebA.html.
[7] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014).
Generative adversarial nets. Advances in neural information processing systems, 27.
[8] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2020).
Generative adversarial networks. Communications of the ACM, 63(11), 139-144.
[9] Ramesh, A., Pavlov, M., Goh, G., Gray, S., Chen, M., Child, R., Misra, V., Mishkin, P., Krueger, G., Agarwal, S., & Sutskever, I. (2015-2023).
DALL·E: Creating images from text. DALL-E, OpenAI research. openai.com/research/dall-e.
[10] Sagar, R. (2020).
Top Libraries For Quick Implementation Of GANs. Analytics India Magazine. analyticsindiamag.com/generative-adversarial-networks-python-libraries.
[11] Siddhartha. (2023).
GPU T4 vs GPU P100 | Kaggle | GPU. Medium. siddhartha01writes.medium.com/gpu-t4-vs-gpu-p100-kaggle-gpu-cd852d56022c.