Unlock the Power of Python for Deep Learning with Generative Adversarial Networks (GANs) – The Engine behind DALL-E

Deep learning algorithms work with almost any kind of data and require large amounts of computing power and information to solve complicated issues. Now, let us deep-dive into one of the most famous deep learning algorithms: generative adversarial networks (GANs).

Generative adversarial networks (GANs) are an exciting and relatively recent innovation in machine learning and deep learning. GANs are generative deep learning algorithms that create new data instances that resemble the training data. A GAN has two components: a generator, which learns to generate fake data, and a discriminator, which learns to tell that fake data apart from real data.

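GANs belong to the same family of deep generative models as DALL-E, a recent breakthrough from OpenAI that generates images from text descriptions, although DALL-E itself is built on a transformer architecture rather than on a generator/discriminator pair.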

If you are interested in learning how to use Python for deep learning with generative adversarial networks (GANs), a powerful technique for creating realistic and diverse synthetic data, this article is perfect for you.

Before we begin, let’s see the remarkable ability of DALL-E to generate a seemingly scientific photo of an atom:

[Figure: Prompt: “First ever photo of an atom electron microscope photo”. Created by Niko × DALL-E]

What is Deep Learning?

Deep learning, a branch of machine learning, addresses intricate problems through the utilization of artificial neural networks. These networks consist of interconnected nodes organized in multiple layers, extracting features from input data. Extensive datasets are employed to train these models, enabling them to identify patterns and correlations that might be challenging or impossible for humans to perceive.

The impact of deep learning on artificial intelligence has been substantial. It has paved the way for the development of intelligent systems capable of independent learning, adaptation, and decision-making. Deep learning has led to remarkable advancements in various domains, including image and speech recognition, natural language processing, machine translation, text generation, image generation (the focus of this article), autonomous driving, and many others.

[Figure: Examples of realistic AI-generated human faces, produced by a deep learning GAN via thispersondoesnotexist.com]

Why Python for Deep Learning?

Python has gained widespread popularity as a programming language due to its versatility and ease of use in diverse domains of computer science, especially in the field of deep learning. Thanks to its extensive range of libraries and frameworks specially tailored for deep learning, Python has emerged as a top choice among many machine learning professionals.

Here are some of the reasons why:

1. Simple to learn and use:

Python is a high-level programming language that is easy to learn and use, even for those who are new to programming. Its concise and uncomplicated syntax makes it easy to write and understand. This allows developers to concentrate on solving problems without worrying about the details of the language.

2. Abundant libraries and frameworks:

Python has a vast ecosystem of libraries and frameworks that cater specifically to deep learning. Some of these libraries include TensorFlow, PyTorch, Keras, and Theano. These libraries provide pre-built functions and modules that simplify the development process, reducing the need to write complex code from scratch.

3. Strong community support:

Python has a large and active community of developers contributing to its development, maintenance, and improvement. This community offers support and guidance to beginners, making it easier to learn and use Python for deep learning.

4. Platform independence:

Python is platform-independent, which means that code written on one platform can usually be executed on another platform with little or no modification. This makes it easier to deploy deep learning models on different platforms and devices.

5. Easy integration with other languages:

Python can be easily integrated with other programming languages, such as Delphi, C++, and Java, making it ideal for building complex systems that require integrating different technologies.

Overall, Python’s ease of use, abundance of libraries and frameworks, strong community support, platform independence, and ease of integration with other languages make it an indispensable tool for machine learning practitioners. Its popularity continues to soar as a result.

What is DALL-E? A revolutionary image generation model

DALL-E is a remarkable generative model that can generate images from text descriptions, even intricate ones, such as “a cat wearing a bow tie” or “a painting of a landscape in the style of Van Gogh”. It is trained on a large-scale dataset of text-image pairs and uses a transformer architecture that can encode both text and image modalities. Strictly speaking it is therefore not a GAN, but it pursues the same goal of synthesizing realistic new data.

[Figure: Example of a DALL-E generated image. Prompt: “a cat wearing a bow tie”]
[Figure: Example of a Van Gogh style AI-generated image. Prompt: “a painting of a landscape in the style of Van Gogh”]

DALL-E can create plausible and diverse images for a wide range of concepts, such as animals, objects, scenes, and transformations, and control their attributes, viewpoints, and perspectives. DALL-E can also combine multiple concepts, such as “an armchair in the shape of an avocado” or “a snail made of a harp”, and generate novel and creative images that do not exist in the real world. DALL-E demonstrates the power and potential of deep generative models for image synthesis and multimodal understanding.

[Figure: Example of an AI-generated image. Prompt: “an armchair in the shape of an avocado”]
[Figure: Example of an AI-generated image. Prompt: “a snail made of a harp”]

Applications beyond DALL-E: GANs in various domains

Text-to-image generation is only one application of generative models. GANs have been successfully applied to various domains and tasks, such as computer vision, natural language generation, audio synthesis, video prediction, and more. Some examples of GAN applications are:

1. Image generation

GANs can generate realistic and diverse images of objects, scenes, faces, animals, and more, from random noise or text descriptions.

2. Image-to-image translation

GANs can transform images from one domain to another, such as changing the style, season, or content of the images.

3. Image enhancement

GANs can improve the quality and resolution of images, such as super-resolution, deblurring, denoising, inpainting, and colorization.

4. Text generation

GANs can generate realistic and diverse texts, such as stories, poems, reviews, captions, and more, from random noise or keywords.

5. Text-to-speech

GANs can synthesize natural and expressive speech from text, such as voice cloning, style transfer, and emotion modulation.

6. Speech enhancement

GANs can improve the quality and intelligibility of speech, such as noise reduction, dereverberation, and bandwidth extension.

7. Video generation

GANs can generate realistic and diverse videos, such as animations, simulations, and future predictions, from random noise or text descriptions.

8. Video-to-video translation

GANs can transform videos from one domain to another, such as changing the style, content, or viewpoint of the videos.

What are Generative Adversarial Networks (GANs)?

Generative Adversarial Networks (GANs) are a breakthrough innovation in deep learning that can generate realistic and diverse data from random noise or text descriptions. GANs have many applications in various domains, such as computer vision, natural language generation, audio synthesis, and more. GANs can also enable creativity, accessibility, and fairness by generating novel and inclusive data that do not exist in the real world.

GANs consist of two neural networks that compete with each other in a game-like scenario: 

1. Discriminator

A discriminator that tries to distinguish between real and fake data.

More formally, given a set of data instances X and a set of labels Y: Discriminative models capture the conditional probability p(Y | X).

[Figure: Backpropagation in discriminator training. Image source: Reference [4]]

Illustration of the discriminative model in the handwritten digit generation use case (explored hands-on in a later section):

[Figure: Training samples. Image source: Reference [2]]

2. Generator

A generator that tries to create fake data.

More formally, given a set of data instances X and a set of labels Y: Generative models capture the joint probability p(X, Y), or just p(X) if there are no labels.
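To make the two definitions concrete, here is a minimal sketch on a made-up toy dataset. The `samples` list and the helper names `joint` and `conditional` are hypothetical and exist only for illustration. The sketch estimates the joint probability p(X, Y) and the conditional probability p(Y | X) by simple counting.

```python
from collections import Counter

# Hypothetical toy dataset of (x, y) pairs: x is a feature, y is a label.
samples = [
    ("round", "apple"),
    ("round", "apple"),
    ("round", "orange"),
    ("long", "banana"),
]

pair_counts = Counter(samples)             # how often each (x, y) pair occurs
x_counts = Counter(x for x, _ in samples)  # how often each x occurs
total = len(samples)


def joint(x, y):
    """Generative view: estimate p(X=x, Y=y)."""
    return pair_counts[(x, y)] / total


def conditional(y, x):
    """Discriminative view: estimate p(Y=y | X=x)."""
    return pair_counts[(x, y)] / x_counts[x]


print(joint("round", "apple"))        # 2 of 4 samples -> 0.5
print(conditional("apple", "round"))  # 2 of 3 "round" samples -> 2/3
```

A generative model has to account for how likely each data point itself is, while a discriminative model only needs to separate the labels once the data point is given.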

[Figure: Backpropagation in generator training. Image source: Reference [4]]

Illustration of the generative model in the handwritten digit generation use case (explored hands-on in a later section):

[Figure: Illustration of the generative model. Image source: Reference [2]]

The discriminator and the generator are trained simultaneously, in an adversarial manner, until they reach an equilibrium, where the generator can fool the discriminator about half the time. GANs can learn to produce high-quality and diverse data, such as images, text, audio, and video, by leveraging large-scale datasets and advanced network architectures.
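The adversarial game described above is usually written as the minimax objective from the original GAN paper (Reference [7]). In the notation used here, D(x) is the discriminator’s estimated probability that x is real, G(z) is the generator’s output for a noise vector z, p_data is the data distribution, p_z is the noise prior, and p_g is the distribution of generated samples:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

For a fixed generator, the best possible discriminator is D*. When the generator matches the data distribution exactly (p_g = p_data), D*(x) = 1/2 everywhere, which is the equilibrium where the discriminator is fooled about half the time.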

Here’s a picture of the whole GAN system:

[Figure: Diagram of the whole GAN system. Image source: Reference [4]]

What Python tools and libraries are needed for GAN development?

Python is one of the most popular and widely used programming languages for machine learning and artificial intelligence, especially for developing generative adversarial networks (GANs). Python offers a rich set of tools and libraries that can help you implement and train GAN models with ease and efficiency. 

Some of the most useful and popular Python tools and libraries for GAN development are:

1. PyTorch

PyTorch is an open-source deep learning framework that provides a flexible and dynamic way of building and running GAN models. PyTorch supports automatic differentiation, GPU acceleration, distributed training, and various GAN architectures and loss functions. PyTorch also has a large and active community that contributes to the development and improvement of the framework[10].

2. TensorFlow

TensorFlow is another open-source deep learning framework that offers a comprehensive and scalable platform for building and deploying GAN models. TensorFlow supports eager execution, graph optimization, tensor operations, and various GAN architectures and loss functions. TensorFlow also has a high-level API called Keras, which simplifies the process of creating and training GAN models[10].

3. PyGAN

PyGAN is a Python library that implements GANs and their variants, such as conditional GANs, adversarial auto-encoders, and energy-based GANs. PyGAN allows you to design generative models based on statistical machine learning problems and optimize them using various algorithms and metrics[10].

4. TorchGAN

TorchGAN is a Python library that provides a collection of GAN models, loss functions, and evaluation metrics, built on top of PyTorch. TorchGAN enables you to easily create and customize your own GAN models, as well as reproduce the results of existing GAN papers[10].

5. VeGANs

VeGANs is another Python library that provides a variety of GAN models, loss functions, and evaluation metrics, built on top of PyTorch. VeGANs aims to make GAN development accessible and user-friendly, by offering a simple and consistent interface, as well as tutorials and examples[10].

Without further ado, let’s get our hands dirty with GANs in Python through three use cases: numerical mathematics (approximating the plot of a sine function), generating handwritten digits, and generating realistic human faces.

Hands-On GAN 1: Generate random numbers using GAN, to approximate sine plot

In this section, we will explore how GANs can be used to generate data that follows a simple sine function over the interval from 0 to 2π. We will implement a GAN using PyTorch and show how the generator and the discriminator networks interact and improve over time. We will also demonstrate the results of our GAN by comparing the generated data with the original sine function data.

The following is the complete Python code to automatically generate random numbers using GAN, to approximate sine plot:

To run the code, we can use the PyScripter IDE.

What did the code above do?

Let’s break down the important parts of the code above:

1. Data generation:

  • train_data_length specifies the number of data points to be generated.
  • train_data is a tensor of shape (train_data_length, 2) where the first column represents random values between 0 and 2π, and the second column is the sine of the first column.
  • train_labels is a tensor of zeros.
  • train_set is a list of tuples, each containing a data point and its corresponding label.
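The data generation step above could be sketched as follows. This is a minimal illustration, not the article’s exact code: the seed, the value 1024 for train_data_length, and the upper bound 2π for the x values are assumptions.

```python
import math

import torch

torch.manual_seed(111)  # assumed seed, used only to make the sketch repeatable

train_data_length = 1024  # assumed number of data points
train_data = torch.zeros((train_data_length, 2))

# First column: random x values, assumed here to lie between 0 and 2*pi.
train_data[:, 0] = 2 * math.pi * torch.rand(train_data_length)
# Second column: the sine of the first column.
train_data[:, 1] = torch.sin(train_data[:, 0])

# The labels are all zeros; they only exist so each sample has a label slot.
train_labels = torch.zeros(train_data_length)

# A list of (data point, label) tuples, one per sample.
train_set = [(train_data[i], train_labels[i]) for i in range(train_data_length)]

print(train_data.shape)  # torch.Size([1024, 2])
print(len(train_set))    # 1024
```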

2. Data visualization:

  • The generated training data is plotted using matplotlib.

3. Data loader:

  • batch_size is set to 32.
  • train_loader is a PyTorch data loader that shuffles and batches the training data.
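A data loader like the one described above might look like this. The stand-in train_set (1024 sine points on an assumed 0 to 2π range) is built inline only so the sketch runs on its own; batch_size = 32 comes from the walkthrough.

```python
import math

import torch
from torch.utils.data import DataLoader

# Stand-in training set: a list of (sample, label) tuples, assumed 1024 points.
x = 2 * math.pi * torch.rand(1024)
train_set = [(torch.stack((xi, torch.sin(xi))), torch.tensor(0.0)) for xi in x]

batch_size = 32
# shuffle=True reorders the samples at the start of every epoch.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

real_samples, labels = next(iter(train_loader))
print(real_samples.shape)  # torch.Size([32, 2])
print(len(train_loader))   # 1024 / 32 = 32 batches per epoch
```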

4. Discriminator model:

  • The Discriminator class is defined as a subclass of nn.Module.
  • It consists of a feedforward neural network with layers of sizes 2 (input) → 256 → 128 → 64 → 1, followed by a Sigmoid activation.
  • Dropout layers with a dropout probability of 0.3 are added for regularization.
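A discriminator with the layer sizes and dropout rate listed above could be written roughly as follows. The ReLU activations between the layers are an assumption, since the walkthrough does not name the hidden activation.

```python
import torch
from torch import nn


class Discriminator(nn.Module):
    """Maps a 2-D point to the estimated probability that it is real."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 256), nn.ReLU(), nn.Dropout(0.3),  # ReLU is assumed
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, 1), nn.Sigmoid(),  # output in (0, 1)
        )

    def forward(self, x):
        return self.model(x)


discriminator = Discriminator()
scores = discriminator(torch.rand(5, 2))
print(scores.shape)  # torch.Size([5, 1])
```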

5. Generator model:

  • The Generator class is defined, similar to the Discriminator.
  • It is a neural network with layers of sizes 2 (input) → 16 → 32 → 2.
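The matching generator could be sketched like this. The ReLU activations and the absence of a final activation are assumptions; only the layer sizes come from the walkthrough.

```python
import torch
from torch import nn


class Generator(nn.Module):
    """Maps a 2-D latent vector to a 2-D point (x, sin(x))-style sample."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 16), nn.ReLU(),   # ReLU is assumed
            nn.Linear(16, 32), nn.ReLU(),
            nn.Linear(32, 2),              # no output activation (assumed)
        )

    def forward(self, z):
        return self.model(z)


generator = Generator()
latent = torch.randn(5, 2)  # random noise fed to the generator
samples = generator(latent)
print(samples.shape)  # torch.Size([5, 2])
```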

6. Model initialization:

  • Instances of the Discriminator and Generator classes are created.

7. Training configuration:

  • Learning rate (lr) is set to 0.001.
  • num_epochs is set to 300.
  • Binary Cross Entropy Loss (nn.BCELoss()) is used as the loss function.

8. Optimizer initialization:

  • Adam optimizers are created for both the discriminator and generator.

9. Training loop:

  • The code runs a training loop for the specified number of epochs.
  • For each epoch, it iterates through batches of data from the train_loader.
  • For the discriminator:
    • Real and generated samples are combined.
    • The discriminator is trained to distinguish between real and generated samples.
  • For the generator:
    • The generator is trained to generate samples that the discriminator classifies as real.
  • Losses for the discriminator and generator are printed every 10 epochs.
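The training loop described above can be condensed into the following sketch. The learning rate, batch size, loss function, and optimizers follow the walkthrough; the ReLU activations, the 2-D normal latent noise, the 0 to 2π data range, and the helper names `make_loader` and `train_gan` are assumptions. Only 2 epochs are run here so the sketch finishes quickly, whereas the walkthrough uses 300.

```python
import math

import torch
from torch import nn
from torch.utils.data import DataLoader


def make_loader(n=1024, batch_size=32):
    """Build a shuffled loader over points (x, sin(x)) with dummy zero labels."""
    x = 2 * math.pi * torch.rand(n)
    data = torch.stack((x, torch.sin(x)), dim=1)
    return DataLoader(list(zip(data, torch.zeros(n))),
                      batch_size=batch_size, shuffle=True)


discriminator = nn.Sequential(
    nn.Linear(2, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 1), nn.Sigmoid(),
)
generator = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2),
)

lr = 0.001
loss_function = nn.BCELoss()
optimizer_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
optimizer_g = torch.optim.Adam(generator.parameters(), lr=lr)


def train_gan(num_epochs, loader):
    for epoch in range(num_epochs):
        for real_samples, _ in loader:
            n = real_samples.size(0)

            # Discriminator step: real samples get label 1, generated ones 0.
            fake_samples = generator(torch.randn(n, 2)).detach()
            all_samples = torch.cat((real_samples, fake_samples))
            all_labels = torch.cat((torch.ones(n, 1), torch.zeros(n, 1)))
            optimizer_d.zero_grad()
            loss_d = loss_function(discriminator(all_samples), all_labels)
            loss_d.backward()
            optimizer_d.step()

            # Generator step: try to make the discriminator output "real".
            optimizer_g.zero_grad()
            generated = generator(torch.randn(n, 2))
            loss_g = loss_function(discriminator(generated), torch.ones(n, 1))
            loss_g.backward()
            optimizer_g.step()

        if epoch % 10 == 0:
            print(f"Epoch {epoch}: loss D = {loss_d.item():.4f}, "
                  f"loss G = {loss_g.item():.4f}")
    return loss_d.item(), loss_g.item()


final_loss_d, final_loss_g = train_gan(num_epochs=2, loader=make_loader())
```

Detaching the generated samples in the discriminator step keeps that update from flowing back into the generator, so each network is only changed by its own optimizer.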

10. Generated samples visualization:

  • After training, 100 samples are generated using the trained generator, and they are plotted.

In summary, the code above implements a simple Generative Adversarial Network (GAN) where the generator and discriminator are trained adversarially to generate realistic samples. The generator generates fake samples to try and fool the discriminator, while the discriminator learns to distinguish between real and fake samples.

Here are a few selected outputs from the process above.

Examine the training data by plotting each point (x₁, x₂):

[Figure: scatter plot of the training data]

Plot the generated samples. The screenshot below shows the plotting result at epoch 300, which almost perfectly resembles the sine plot:

[Figure: scatter plot of the generated samples at epoch 300]

To see the progression across epochs (from 0 to 300) more clearly, please watch the following video:

Hands-On GAN 2: Generate handwritten digits using GAN

In this section, we will explore how GANs can be used to generate realistic images of handwritten digits. For that, we will train the models using the MNIST dataset of handwritten digits, which is included in the torchvision package. We will implement a GAN using PyTorch and show how the generator produces fake images while the discriminator tries to tell them apart from real ones.

The following is the complete Python code to automatically generate handwritten digits using GAN:

To run the code, we can use the PyScripter IDE.

What did the code above do?

Let’s break down the important parts of the code above:

1. Importing necessary libraries

  • torch for PyTorch, 
  • nn for neural network modules, 
  • optim for optimizers, 
  • torchvision for handling datasets like MNIST, 
  • transforms for data transformations, 
  • math for mathematical functions, 
  • matplotlib.pyplot for plotting, and 
  • os for operating system related functions.

2. Checking if a CUDA-enabled GPU is available and setting the device accordingly.
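The device check in step 2 is typically a one-liner in PyTorch. A minimal sketch:

```python
import torch

# Use the GPU when a CUDA-enabled one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and models) are moved to the chosen device with .to(device).
x = torch.zeros(2, 3).to(device)
print(device.type, x.device.type)
```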

3. Defining a data transformation pipeline using transforms.Compose, which converts the images to PyTorch tensors and normalizes them.

4. Loading the MNIST dataset for training: specifying the root directory, selecting the training split, downloading it if not available, and applying the defined transformation.

5. Creating a PyTorch data loader to handle batching and shuffling of the training data.

6. Plotting 16 real samples from the MNIST dataset using matplotlib.

7. Defining the Discriminator class, which is a neural network with several fully connected layers, ReLU activations, and Dropout layers. The final layer has a Sigmoid activation.
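A discriminator of this kind for 28×28 MNIST images might look like the sketch below. The layer widths (784 → 1024 → 512 → 256 → 1) and the dropout probability of 0.3 are assumptions, since the walkthrough only says “several fully connected layers”.

```python
import torch
from torch import nn


class Discriminator(nn.Module):
    """Scores a batch of 1x28x28 images with the probability of being real."""

    def __init__(self):
        super().__init__()
        # Layer widths and dropout rate are assumed for illustration.
        self.model = nn.Sequential(
            nn.Linear(784, 1024), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        # Flatten each 1x28x28 image into a 784-element vector.
        x = x.view(x.size(0), 784)
        return self.model(x)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
discriminator = Discriminator().to(device)

batch = torch.randn(4, 1, 28, 28, device=device)  # stand-in for real images
scores = discriminator(batch)
print(scores.shape)  # torch.Size([4, 1])
```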

8. Implementing the forward method for the Discriminator class, creating an instance of the class, and moving it to the specified device.

9. Defining the Generator class, which is another neural network with fully connected layers, ReLU activations, and a hyperbolic tangent (Tanh) activation.
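The generator could be sketched as follows. The latent size of 100 and the layer widths are assumptions. The final Tanh keeps every output pixel in [-1, 1], which suits images that have been normalized to that range.

```python
import torch
from torch import nn


class Generator(nn.Module):
    """Turns a latent noise vector into a 1x28x28 image."""

    def __init__(self, latent_dim=100):  # latent size is assumed
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 784), nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, z):
        # Reshape the 784 outputs back into image form.
        return self.model(z).view(z.size(0), 1, 28, 28)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator().to(device)

latent = torch.randn(4, 100, device=device)
images = generator(latent)
print(images.shape)  # torch.Size([4, 1, 28, 28])
```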

10. Implementing the forward method for the Generator class, creating an instance of the class, and moving it to the specified device.

11. Setting hyperparameters and training components:

  • learning rate (lr),
  • number of epochs (num_epochs),
  • Binary Cross Entropy Loss (nn.BCELoss()) as the loss function, and
  • Adam optimizers for both the discriminator and the generator.

12. Generating random samples in the latent space for visualization.

13. Loading pre-trained models if available; otherwise, training the models for the specified number of epochs.
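The “load if available, otherwise train” pattern, together with saving the parameters afterwards, can be sketched like this. The helper name `load_or_train`, the file name, and the placeholder training function are hypothetical; a temporary directory is used so the sketch leaves no files behind.

```python
import os
import tempfile

import torch
from torch import nn


def load_or_train(model, path, train_fn):
    """Load weights from `path` if it exists; otherwise train and save them."""
    if os.path.exists(path):
        model.load_state_dict(torch.load(path, map_location="cpu"))
        return "loaded"
    train_fn(model)
    torch.save(model.state_dict(), path)
    return "trained"


def placeholder_training(model):
    """Stands in for the real training loop in this sketch."""


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "generator.pt")  # hypothetical file name
    first = load_or_train(nn.Linear(2, 2), path, placeholder_training)
    second = load_or_train(nn.Linear(2, 2), path, placeholder_training)

print(first, second)  # trained loaded
```

Saving only the state_dict (the parameter tensors) rather than the whole model object keeps the checkpoint independent of how the class is defined in the script.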

14. Generating and plotting 16 samples from the generator.

15. Saving the trained model parameters for future use.

Here are a few selected outputs from the process above.

Download and extract the dataset:

[Figure: console output of the MNIST download and extraction]

Train the model with seed=111:

[Figure: console output of the training run]

The following is a visualization of an excerpt of the MNIST dataset:

[Figure: sample of real MNIST digits]

versus the handwritten digits generated by the GAN at epoch 50:

[Figure: digits generated by the GAN at epoch 50]

To see the progression across epochs (from 0 to 50) more clearly, please see the following video:

Hands-On GAN 3: Generate realistic human faces using GAN

In this section, we will learn how to generate realistic human faces using GANs. The GAN consists of two competing networks: a generator that creates fake images from random noise, and a discriminator that distinguishes real images from fake ones. We will use a large dataset of celebrity images to train our GAN and produce high-quality and diverse faces. Because this task requires substantial computational power, we will run it on Kaggle’s GPU, and we will also show the limitations and challenges of using a regular laptop.

Introduction to Kaggle’s GPU Options: P100 vs. T4

Kaggle offers its users a 30-hour weekly time cap for GPU access, allowing them to choose between NVIDIA T4 and P100 GPUs. However, many Kaggle users may lack clarity on which GPU is best suited for their specific needs.

In general, the T4 GPU is an optimal choice for inference workloads that demand high throughput and low power consumption. On the other hand, the P100 GPU excels in handling training workloads, thanks to its superior performance and increased memory capacity [11].

It’s important to note that TPUs (Tensor Processing Units) are not part of this comparison, as they represent a distinct type of hardware accelerator designed by Google. When considering GPUs, the P100 is recommended for training tasks, while both the P100 and the T4 can be used for inference. Selecting the appropriate GPU depends on the specific requirements of the given machine learning task [11].

The complete code for generating realistic human faces using GAN, and what it does

The following is the complete Python code to automatically generate realistic human faces using GAN:

To run the code, we can use the PyScripter IDE.

Let’s break down the important parts of the code above:

1. Importing necessary libraries for:

  • numerical operations (numpy), 
  • data processing (pandas), 
  • plotting (matplotlib), 
  • progress bars (tqdm), and
  • building a Generative Adversarial Network (components from keras).

2. Walking through the Kaggle input directory and printing the filenames.

3. Setting up parameters for loading and resizing images from the CelebA dataset, then cropping and resizing the images to a specified size.

4. Converting the list of images to a NumPy array and normalizing pixel values to the range [0, 1].
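The conversion and normalization in step 4 amount to a single division by 255 for 8-bit images. A minimal sketch, using randomly generated stand-in images (4 images of 64×64 pixels, both numbers assumed) in place of the CelebA files:

```python
import numpy as np

# Stand-in for the list of loaded, cropped and resized RGB images.
rng = np.random.default_rng(0)
image_list = [
    rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(4)
]

# Stack the list into one array and scale 8-bit values (0..255) to [0, 1].
images = np.array(image_list, dtype=np.float32) / 255.0

print(images.shape)  # (4, 64, 64, 3)
print(images.min() >= 0.0, images.max() <= 1.0)  # True True
```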

5. Displaying the first 25 images from the dataset.

6. Defining the generator model architecture using Keras.

7. Defining the discriminator model architecture using Keras.

8. Creating an instance of the generator and displaying its summary.

9. Creating an instance of the discriminator and displaying its summary, then setting trainable to False to prevent it from being trained during the GAN training phase.

10. Creating the GAN model by connecting the generator and discriminator, and compiling the GAN model with a binary cross-entropy loss.

11. Setting up parameters for training the GAN model and creating directories for saving generated images.

12. Training the GAN model, saving weights periodically, and plotting the losses during training. Images are also generated and saved for visualization.

Selected outputs:

Load and resize CelebA dataset:

[Figure: console output while loading and resizing the CelebA images]

Show image shape:

[Figure: printed image shape]

Display first 25 images:

[Figure: grid of the first 25 CelebA images]

Generator summary:

[Figure: generator model summary]

Generator scheme (generated automatically using the model_to_dot function from keras.utils.vis_utils):

[Figure: generator architecture diagram]

Discriminator summary:

[Figure: discriminator model summary]

Discriminator scheme (also generated automatically using the model_to_dot function from keras.utils.vis_utils):

[Figure: discriminator architecture diagram]

GAN summary:

After the step above, further training and output generation require a sufficiently powerful GPU, so for the next steps I moved to Kaggle (with a P100 GPU as the accelerator).

Here is a screenshot of the last step that can be done without a GPU using the PyScripter IDE (if you have your own GPU, you can continue running the code in PyScripter):

[Figure: screenshot of the last step run in the PyScripter IDE]

Plot of Discriminant and Adversary losses:

[Figure: plot of the Discriminant and Adversary losses during training]

Quoting Reference [3] to help in interpreting the plot of Discriminant and Adversary losses: “GAN convergence is hard to identify. As the generator improves with training, the discriminator performance gets worse because the discriminator can’t easily tell the difference between real and fake. If the generator succeeds perfectly, then the discriminator has a 50% accuracy. In effect, the discriminator flips a coin to make its prediction.

This progression poses a problem for convergence of the GAN as a whole: the discriminator feedback gets less meaningful over time. If the GAN continues training past the point when the discriminator is giving completely random feedback, then the generator starts to train on junk feedback, and its quality may collapse.”
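The coin-flip situation in the quote has a recognizable numerical signature when the loss is binary cross-entropy, as in the walkthrough above. The following sketch, with a hypothetical helper `bce`, computes the loss of a discriminator that outputs 0.5 for every input:

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    return -(target * math.log(prediction)
             + (1 - target) * math.log(1 - prediction))


# A discriminator that cannot tell real from fake outputs 0.5 for everything.
loss_on_real = bce(0.5, 1)  # real sample, target label 1
loss_on_fake = bce(0.5, 0)  # generated sample, target label 0

print(round(loss_on_real, 4), round(loss_on_fake, 4))  # 0.6931 0.6931
print(round(loss_on_real + loss_on_fake, 4))           # 1.3863
```

Each term equals ln 2 ≈ 0.693, so a discriminator loss that hovers around that value per sample suggests the discriminator is close to guessing, which is the point past which continued training may stop helping the generator.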

After 300 epochs, the code produces the following realistic human faces:

  • On Kaggle /output: [Figure: generated face images]
  • On my computer: [Figure: generated face images]

Image output for the first epoch (0) vs the 300th epoch (299):

[Figure: generated faces at epoch 0]
[Figure: generated faces at epoch 299]

To see the progression across epochs (from 0 to 299) more clearly, please see the following video:

Out of curiosity, I retrained the model until epoch 599; you can see the results (from epoch 300 to 599) in the following video:

Conclusion

GANs are a powerful and versatile class of deep generative models that can produce realistic and diverse data, such as images, text, audio, and video, from random noise. You also saw how generative models can produce realistic and diverse data from text descriptions, as demonstrated by DALL-E.

This article has highlighted and demonstrated the potential of deep learning, specifically the GAN architecture, in numerical mathematics (approximating the plot of a function), generating handwritten digits, and generating realistic human faces. All are implemented with hands-on Python examples.

I hope this article has given you a comprehensive and accessible introduction to GANs, along with a solid understanding of the workflow for applying GANs to your own domains and project goals, and that it inspires you to learn more and experiment with GANs yourself.

Check out the full repository here:

github.com/Embarcadero/DL_Python05_GAN



References & further readings

[1] Biswal, A. (2023). Top 10 Deep Learning Algorithms You Should Know in 2023. Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm.

[2] Candido, R. (2021). Generative Adversarial Networks: Build Your First Models. Real Python. realpython.com/generative-adversarial-networks.

[3] Chauhan, N. S. (2021). Generate Realistic Human Face using GAN. Kaggle. kaggle.com/code/nageshsingh/generate-realistic-human-face-using-gan.

[4] Google for Developers. (2024). Generative Adversarial Networks. Advanced courses, machine learning, Google for Developers. developers.google.com/machine-learning/gan.

[5] LeCun, Y. (1998). The MNIST database of handwritten digits. yann.lecun.com/exdb/mnist.

[6] Liu, Z., Luo, P., Wang, X., & Tang, X. (2018). Large-scale CelebFaces Attributes (CelebA) dataset. Retrieved August, 15(2018), 11. mmlab.ie.cuhk.edu.hk/projects/CelebA.html.

[7] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.

[8] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139-144.

[9] Ramesh, A., Pavlov, M., Goh, G., Gray, S., Chen, M., Child, R., Misra, V., Mishkin, P., Krueger, G., Agarwal, S., & Sutskever, I. (2015-2023). DALL·E: Creating images from text. DALL-E, OpenAI research. openai.com/research/dall-e.

[10] Sagar, R. (2020). Top Libraries For Quick Implementation Of GANs. Analytics India Magazine. analyticsindiamag.com/generative-adversarial-networks-python-libraries.

[11] Siddhartha. (2023). GPU T4 vs GPU P100 | Kaggle | GPU. Medium. siddhartha01writes.medium.com/gpu-t4-vs-gpu-p100-kaggle-gpu-cd852d56022c.
