
Unlock the Power of Python for Deep Learning with Generative Adversarial Networks (GANs) – The Engine behind DALL-E


Deep learning algorithms work with almost any kind of data, but they require large amounts of computing power and data to solve complicated problems. Now, let us dive deep into one of the most famous deep learning algorithms: generative adversarial networks (GANs).

Generative adversarial networks (GANs) are an exciting and relatively recent innovation in machine learning and deep learning. GANs are generative deep learning algorithms that create new data instances resembling the training data. A GAN has two components: a generator, which learns to produce plausible fake data, and a discriminator, which learns to distinguish that fake data from real data.

GANs are also the engine behind DALL-E, a recent breakthrough from OpenAI that can generate images from any text description.

If you are interested in learning how to use Python for deep learning with generative adversarial networks (GANs), a powerful technique for creating realistic and diverse synthetic data, this article is perfect for you.

Before we begin, let’s see the remarkable ability of DALL-E (GAN) to generate a seemingly scientific photo of an atom:

Prompt: "First ever photo of an atom, electron microscope photo". Created by Niko × DALL·E.


What is Deep Learning?

Deep learning, a branch of machine learning, addresses intricate problems through the utilization of artificial neural networks. These networks consist of interconnected nodes organized in multiple layers, extracting features from input data. Extensive datasets are employed to train these models, enabling them to identify patterns and correlations that might be challenging or impossible for humans to perceive.

The impact of deep learning on artificial intelligence has been substantial. It has paved the way for the development of intelligent systems capable of independent learning, adaptation, and decision-making. Deep learning has led to remarkable advancements in various domains, including image and speech recognition, natural language processing, machine translation, text generation, image generation (as we will review in this article), autonomous driving, and many others.

Examples of AI-generated realistic human faces created using a deep learning GAN (via thispersondoesnotexist.com).

Why Python for Deep Learning?

Python has gained widespread popularity as a programming language due to its versatility and ease of use in diverse domains of computer science, especially in the field of deep learning. Thanks to its extensive range of libraries and frameworks specially tailored for deep learning, Python has emerged as a top choice among many machine learning professionals.

Python has emerged as the language of choice for deep learning, and here are some of the reasons why:

1. Simple to learn and use:

Python is a high-level programming language that is easy to learn and use, even for those who are new to programming. Its concise and uncomplicated syntax makes it easy to write and understand. This allows developers to concentrate on solving problems without worrying about the details of the language.

2. Abundant libraries and frameworks:

Python has a vast ecosystem of libraries and frameworks that cater specifically to deep learning. Some of these libraries include TensorFlow, PyTorch, Keras, and Theano. These libraries provide pre-built functions and modules that simplify the development process, reducing the need to write complex code from scratch.

3. Strong community support:

Python has a large and active community of developers contributing to its development, maintenance, and improvement. This community offers support and guidance to beginners, making it easier to learn and use Python for deep learning.

4. Platform independence:

Python is platform-independent, which means that code written on one platform can be easily executed on another platform without any modification. This makes it easier to deploy deep learning models on different platforms and devices.

5. Easy integration with other languages:

Python can be easily integrated with other programming languages, such as Delphi, C++, and Java, making it ideal for building complex systems that require integrating different technologies.

Overall, Python’s ease of use, abundance of libraries and frameworks, strong community support, platform independence, and easy integration with other languages make it an indispensable tool for machine learning practitioners. Its popularity continues to soar as a result.

What is DALL-E? A revolutionary image generation model

DALL-E is a remarkable GAN model that can generate images from text descriptions (even intricate ones), such as “a cat wearing a bow tie” or “a painting of a landscape in the style of Van Gogh”. It is based on a large-scale dataset of text-image pairs and a transformer architecture that can encode both text and image modalities.

Prompt: "a cat wearing a bow tie"
Prompt: "a painting of a landscape in the style of Van Gogh"

DALL-E can create plausible and diverse images for a wide range of concepts, such as animals, objects, scenes, and transformations, and control their attributes, viewpoints, and perspectives. DALL-E can also combine multiple concepts, such as “an armchair in the shape of an avocado” or “a snail made of a harp”, and generate novel and creative images that do not exist in the real world. DALL-E demonstrates the power and potential of GANs for image synthesis and multimodal understanding.

Prompt: "an armchair in the shape of an avocado"
Prompt: "a snail made of a harp"

Applications beyond DALL-E: GANs in various domains

DALL-E demonstrates the power and potential of GANs for image synthesis and multimodal understanding. However, it is not the only application of GANs. GANs have been successfully applied to various domains and tasks, such as computer vision, natural language generation, audio synthesis, video prediction, and more. Some examples of GAN applications are:

1. Image generation

GANs can generate realistic and diverse images of objects, scenes, faces, animals, and more, from random noise or text descriptions.

2. Image-to-image translation

GANs can transform images from one domain to another, such as changing the style, season, or content of the images.

3. Image enhancement

GANs can improve the quality and resolution of images, such as super-resolution, deblurring, denoising, inpainting, and colorization.

4. Text generation

GANs can generate realistic and diverse texts, such as stories, poems, reviews, captions, and more, from random noise or keywords.

5. Text-to-speech

GANs can synthesize natural and expressive speech from text, such as voice cloning, style transfer, and emotion modulation.

6. Speech enhancement

GANs can improve the quality and intelligibility of speech, such as noise reduction, dereverberation, and bandwidth extension.

7. Video generation

GANs can generate realistic and diverse videos, such as animations, simulations, and future predictions, from random noise or text descriptions.

8. Video-to-video translation

GANs can transform videos from one domain to another, such as changing the style, content, or viewpoint of the videos.

What are Generative Adversarial Networks (GANs)?

Generative Adversarial Networks (GANs) are a breakthrough innovation in deep learning that can generate realistic and diverse data from random noise or text descriptions. GANs have many applications in various domains, such as computer vision, natural language generation, audio synthesis, and more. GANs can also enable creativity, accessibility, and fairness by generating novel and inclusive data that do not exist in the real world.

GANs consist of two neural networks that compete with each other in a game-like scenario: 

1. Discriminator

A discriminator that tries to distinguish between real and fake data.

More formally, given a set of data instances X and a set of labels Y: Discriminative models capture the conditional probability p(Y | X).

Backpropagation in discriminator training (image source: Reference).

Illustration of the discriminative model in the handwritten-digit generation use case (we will get hands-on with it in the next sections):

(Image source: Reference)

2. Generator

A generator that tries to create fake data.

More formally, given a set of data instances X and a set of labels Y: Generative models capture the joint probability p(X, Y), or just p(X) if there are no labels.

Backpropagation in generator training (image source: Reference).

Illustration of the generative model in the handwritten-digit generation use case (we will get hands-on with it in the next sections):

(Image source: Reference)

The discriminator and the generator are trained simultaneously, in an adversarial manner, until they reach an equilibrium, where the generator can fool the discriminator about half the time. GANs can learn to produce high-quality and diverse data, such as images, text, audio, and video, by leveraging large-scale datasets and advanced network architectures.
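In the original GAN paper by Goodfellow et al. [7], this two-player game is formalized as a minimax objective over a value function V(D, G), which the discriminator D tries to maximize and the generator G tries to minimize:

min_G max_D V(D, G) = E_(x ~ p_data)[log D(x)] + E_(z ~ p_z)[log(1 − D(G(z)))]

Here p_data is the distribution of the real training data, p_z is the prior over the latent noise z, D(x) is the discriminator’s estimated probability that x is real, and G(z) is a sample produced by the generator. The equilibrium mentioned above corresponds to the point where the generated distribution matches p_data and D outputs 1/2 everywhere.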

Here’s a picture of the whole GAN system:

(Image source: Reference)

What Python tools and libraries are needed for GAN development?

Python is one of the most popular and widely used programming languages for machine learning and artificial intelligence, especially for developing generative adversarial networks (GANs). Python offers a rich set of tools and libraries that can help you implement and train GAN models with ease and efficiency. 

Some of the most useful and popular Python tools and libraries for GAN development are:

1. PyTorch

PyTorch is an open-source deep learning framework that provides a flexible and dynamic way of building and running GAN models. PyTorch supports automatic differentiation, GPU acceleration, distributed training, and various GAN architectures and loss functions. PyTorch also has a large and active community that contributes to the development and improvement of the framework[10].

2. TensorFlow

TensorFlow is another open-source deep learning framework that offers a comprehensive and scalable platform for building and deploying GAN models. TensorFlow supports eager execution, graph optimization, tensor operations, and various GAN architectures and loss functions. TensorFlow also has a high-level API called Keras, which simplifies the process of creating and training GAN models[10].

3. PyGAN

PyGAN is a Python library that implements GANs and their variants, such as conditional GANs, adversarial auto-encoders, and energy-based GANs. PyGAN allows you to design generative models based on statistical machine learning problems and optimize them using various algorithms and metrics [10].

4. TorchGAN

TorchGAN is a Python library that provides a collection of GAN models, loss functions, and evaluation metrics, built on top of PyTorch. TorchGAN enables you to easily create and customize your own GAN models, as well as reproduce the results of existing GAN papers[10].

5. VeGANs

VeGANs is another Python library that provides a variety of GAN models, loss functions, and evaluation metrics, built on top of PyTorch. VeGANs aims to make GAN development accessible and user-friendly, by offering a simple and consistent interface, as well as tutorials and examples[10].

Without further ado, let’s get our hands dirty with GANs in Python through three different use cases: numerical mathematics (approximating the plot of a sine function), generating handwritten digits, and generating realistic human faces.

Hands-On GAN 1: Generate random numbers using a GAN to approximate a sine plot

In this section, we will explore how GANs can be used to generate data that follows a simple sine function over the interval from 0 to 2π. We will implement a GAN using PyTorch and show how the generator and the discriminator networks interact and improve over time. We will also demonstrate the results of our GAN by comparing the generated data with the original sine function data.

The following is the complete Python code for generating random numbers with a GAN to approximate a sine plot:

[Full code listing omitted here; it is available in the companion repository: github.com/Embarcadero/DL_Python05_GAN.]

To execute the code above seamlessly without any errors, we can utilize the PyScripter IDE.

What did the code above do?

Let’s break down the important parts of the code above:

1. Data generation
2. Data visualization
3. Data loader
4. Discriminator model
5. Generator model
6. Model initialization
7. Training configuration
8. Optimizer initialization
9. Training loop
10. Generated samples visualization

In summary, the code above implements a simple Generative Adversarial Network (GAN) where the generator and discriminator are trained adversarially to generate realistic samples. The generator generates fake samples to try and fool the discriminator, while the discriminator learns to distinguish between real and fake samples.
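The full listing is available in the repository linked at the end of this article. To make the workflow concrete, here is a minimal, self-contained PyTorch sketch of the same idea; the hyperparameters (1,024 training points, batch size 32, learning rate 0.001, 300 epochs) and layer sizes are illustrative assumptions and may differ from the actual code:

import math
import torch
from torch import nn
from torch.utils.data import DataLoader

torch.manual_seed(111)

# 1. Data generation: pairs (x1, sin(x1)) with x1 drawn uniformly from [0, 2*pi]
train_data_length = 1024
train_data = torch.zeros((train_data_length, 2))
train_data[:, 0] = 2 * math.pi * torch.rand(train_data_length)
train_data[:, 1] = torch.sin(train_data[:, 0])
train_labels = torch.zeros(train_data_length)  # placeholder labels (unused)
train_set = [(train_data[i], train_labels[i]) for i in range(train_data_length)]

# 3. Data loader
batch_size = 32
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

# 4. Discriminator: maps a 2D point to the probability that it is real
discriminator = nn.Sequential(
    nn.Linear(2, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 1), nn.Sigmoid(),
)

# 5. Generator: maps 2D latent noise to a 2D point that should look like (x1, sin(x1))
generator = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 2),
)

# 7-8. Training configuration and optimizers
lr, num_epochs = 0.001, 300
loss_function = nn.BCELoss()
optimizer_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
optimizer_g = torch.optim.Adam(generator.parameters(), lr=lr)

# 9. Adversarial training loop
for epoch in range(num_epochs):
    for real_samples, _ in train_loader:
        n = real_samples.size(0)
        # Train the discriminator on real samples (label 1) and generated samples (label 0)
        latent = torch.randn((n, 2))
        generated = generator(latent).detach()
        all_samples = torch.cat((real_samples, generated))
        all_labels = torch.cat((torch.ones((n, 1)), torch.zeros((n, 1))))
        discriminator.zero_grad()
        loss_d = loss_function(discriminator(all_samples), all_labels)
        loss_d.backward()
        optimizer_d.step()
        # Train the generator to make the discriminator output 1 for its samples
        latent = torch.randn((n, 2))
        generator.zero_grad()
        loss_g = loss_function(discriminator(generator(latent)), torch.ones((n, 1)))
        loss_g.backward()
        optimizer_g.step()
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: D loss {loss_d.item():.4f}, G loss {loss_g.item():.4f}")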

Here are a few selected outputs from the process above:

Selected outputs:

Examine the training data by plotting each point (x₁, x₂):

Plot the generated samples. Here is a screenshot of the plotting results at epoch 300, which almost perfectly resembles the sine plot:

To see the progression across epochs (from 0 to 300) more clearly, please watch the following video:

Hands-On GAN 2: Generate handwritten digits using a GAN

In this section, we will explore how GANs can be used to generate realistic images of handwritten digits. We will train the models using the MNIST dataset of handwritten digits, which is included in the torchvision package. We will implement the GAN using PyTorch and show how the generator produces fake images while the discriminator tries to tell them apart.

The following is the complete Python code for generating handwritten digits using a GAN:

[Full code listing omitted here; it is available in the companion repository: github.com/Embarcadero/DL_Python05_GAN.]

To execute the code provided seamlessly, without any errors, we can utilize the PyScripter IDE.

What did the code above do?

Let’s break down the important parts of the code above:

1. Importing the necessary libraries.

2. Checking if a CUDA-enabled GPU is available and setting the device accordingly.

3. Defining a data transformation pipeline using transforms.Compose, which converts the images to PyTorch tensors and normalizes them.

4. Loading the MNIST dataset for training: specifying the root directory, setting it for training, downloading it if not available, and applying the defined transformation.

5. Creating a PyTorch data loader to handle batching and shuffling of the training data.

6. Plotting 16 real samples from the MNIST dataset using matplotlib.

7. Defining the Discriminator class, which is a neural network with several fully connected layers, ReLU activations, and Dropout layers. The final layer has a Sigmoid activation (a minimal sketch of this class follows the list).

8. Implementing the forward method for the Discriminator class and creating an instance of it, moving it to the specified device.

9. Defining the Generator class, which is another neural network with fully connected layers, ReLU activations, and a hyperbolic tangent (Tanh) activation.

10. Implementing the forward method for the Generator class and creating an instance of it, moving it to the specified device.

11. Setting the hyperparameters.

12. Generating random samples in the latent space for visualization.

13. Loading pre-trained models if available; otherwise, training the models for the specified number of epochs.

14. Generating and plotting 16 samples from the generator.

15. Saving the trained model parameters for future use.
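For orientation, here is a minimal sketch of how the Discriminator and Generator classes described in steps 7-10 might look in PyTorch. The exact layer sizes and the 100-dimensional latent space are assumptions based on a typical MNIST GAN setup, not necessarily the ones used in the full listing:

import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 1024), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, 1), nn.Sigmoid(),  # probability that the input image is real
        )

    def forward(self, x):
        x = x.view(x.size(0), 784)  # flatten the 28x28 images
        return self.model(x)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 784), nn.Tanh(),  # outputs in [-1, 1], matching a typical normalization
        )

    def forward(self, z):
        return self.model(z).view(z.size(0), 1, 28, 28)

discriminator = Discriminator().to(device)
generator = Generator().to(device)

# The adversarial training loop mirrors the sine example: sample a batch of real digits,
# draw latent vectors z ~ N(0, 1), and alternate discriminator and generator updates.
z = torch.randn((16, 100), device=device)
fake_digits = generator(z)   # 16 generated 28x28 images (untrained at this point)
print(fake_digits.shape)     # torch.Size([16, 1, 28, 28])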

Here are a few selected outputs from the process above:

Selected outputs:

Download and extract the dataset:

Train the model with seed=111:

The following is a visualization of an excerpt from the MNIST dataset:

versus the handwritten digits generated by the GAN at epoch 50:

To see the progression across epochs (from 0 to 50) more clearly, please watch the following video:

Hands-On GAN 3: Generate realistic human faces using a GAN

In this section, we will learn how to generate realistic human faces using GANs. The GAN consists of two competing networks: a generator that creates fake images from random noise, and a discriminator that distinguishes real images from fake ones. We will use a large dataset of celebrity images to train our GAN and produce high-quality and diverse faces. However, as this task requires substantial computational power, we will perform it using Kaggle’s GPU, and we will also show you the limitations and challenges of using a regular laptop.

Introduction to Kaggle’s GPU Options: P100 vs. T4

Kaggle offers its users a 30-hour weekly time cap for GPU access, allowing them to choose between NVIDIA T4 and P100 GPUs. However, many Kaggle users may lack clarity on which GPU is best suited for their specific needs.

In general, the T4 GPU is an optimal choice for inference workloads that demand high throughput and low power consumption. On the other hand, the P100 GPU excels at training workloads, thanks to its superior performance and larger memory capacity [11].

It’s important to note that TPUs (Tensor Processing Units) are not part of this comparison, as they represent a distinct type of hardware accelerator designed by Google. When considering GPUs, the P100 is recommended for training tasks, while both the P100 and the T4 can be utilized for inference. Selecting the appropriate GPU depends on the specific requirements of the given machine learning task [11].

The complete code for generating realistic human faces using a GAN, and what it does

The following is the complete Python code for generating realistic human faces using a GAN:

[Full code listing omitted here; it is available in the companion repository: github.com/Embarcadero/DL_Python05_GAN.]

To execute the code provided seamlessly, without any errors, we can utilize the PyScripter IDE.

Let’s break down the important parts of the code above:

1. Importing the necessary libraries.

2. Walking through the Kaggle input directory and printing the filenames.

3. Setting up parameters for loading and resizing images from the CelebA dataset, then cropping and resizing the images to a specified size.

4. Converting the list of images to a NumPy array and normalizing pixel values to the range [0, 1].

5. Displaying the first 25 images from the dataset.

6. Defining the generator model architecture using Keras.

7. Defining the discriminator model architecture using Keras.

8. Creating an instance of the generator and displaying its summary.

9. Creating an instance of the discriminator and displaying its summary, then setting trainable to False to prevent it from being trained during the GAN training phase.

10. Creating the GAN model by connecting the generator and discriminator, and compiling it with a binary cross-entropy loss (a minimal sketch of these steps follows the list).

11. Setting up parameters for training the GAN model and creating directories for saving generated images.

12. Training the GAN model and saving generated images as training progresses.
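To make steps 6-12 concrete, here is a minimal Keras sketch of a generator, a discriminator, the combined GAN compiled with binary cross-entropy, and one training step. The 64×64×3 image size, the 100-dimensional latent space, and the layer choices are illustrative assumptions; the actual architectures are in the companion repository:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 100
img_shape = (64, 64, 3)

# Hypothetical generator: upsample latent noise into a 64x64 RGB image
generator = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(8 * 8 * 128),
    layers.LeakyReLU(0.2),
    layers.Reshape((8, 8, 128)),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same"),  # 16x16
    layers.LeakyReLU(0.2),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same"),  # 32x32
    layers.LeakyReLU(0.2),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same"),  # 64x64
    layers.LeakyReLU(0.2),
    layers.Conv2D(3, 5, padding="same", activation="tanh"),
], name="generator")

# Hypothetical discriminator: downsample an image into a real/fake probability
discriminator = keras.Sequential([
    keras.Input(shape=img_shape),
    layers.Conv2D(64, 4, strides=2, padding="same"),
    layers.LeakyReLU(0.2),
    layers.Conv2D(128, 4, strides=2, padding="same"),
    layers.LeakyReLU(0.2),
    layers.Flatten(),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
], name="discriminator")
discriminator.compile(optimizer=keras.optimizers.Adam(0.0002, 0.5), loss="binary_crossentropy")

# Combined GAN: freeze the discriminator so only the generator is updated through it
discriminator.trainable = False
gan_input = keras.Input(shape=(latent_dim,))
gan_output = discriminator(generator(gan_input))
gan = keras.Model(gan_input, gan_output, name="gan")
gan.compile(optimizer=keras.optimizers.Adam(0.0002, 0.5), loss="binary_crossentropy")

# One training step (sketch): train D on real vs. fake, then train G through the frozen D
def train_step(real_images, batch_size=32):
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    fake_images = generator.predict(noise, verbose=0)
    d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))
    return d_loss_real, d_loss_fake, g_loss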

Selected outputs:

Load and resize CelebA dataset:

Show image shape:

Display first 25 images:

Generator summary:

Generator scheme (as generated automatically using model_to_dot function from keras.utils.vis_utils):

Discriminator summary:

Discriminator scheme (also generated automatically using model_to_dot function from keras.utils.vis_utils):

GAN summary:


After the step above, further training and output production require a sufficient GPU, so for the next steps I moved on to Kaggle (with the GPU P100 as the accelerator).

Here is a screenshot of the last step that can be done without a GPU using the PyScripter IDE (if you have your own GPU, you can continue running the code in PyScripter seamlessly):

Plot of the discriminator and adversarial losses:

Quoting Reference [3] to help interpret the plot of the discriminator and adversarial losses: “GAN convergence is hard to identify. As the generator improves with training, the discriminator performance gets worse because the discriminator can’t easily tell the difference between real and fake. If the generator succeeds perfectly, then the discriminator has a 50% accuracy. In effect, the discriminator flips a coin to make its prediction.

This progression poses a problem for convergence of the GAN as a whole: the discriminator feedback gets less meaningful over time. If the GAN continues training past the point when the discriminator is giving completely random feedback, then the generator starts to train on junk feedback, and its quality may collapse.”
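For reference, a loss plot like the one discussed above can be produced with matplotlib. This is a minimal sketch that assumes the training loop recorded the per-epoch losses in two lists (the names d_losses and g_losses are hypothetical):

import matplotlib.pyplot as plt

def plot_losses(d_losses, g_losses):
    # d_losses and g_losses are assumed to hold one loss value per epoch
    plt.figure(figsize=(8, 4))
    plt.plot(d_losses, label="Discriminator loss")
    plt.plot(g_losses, label="Adversarial (generator) loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()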

After 300 epochs, the code produces the following realistic human faces:

Image output for the first epoch (0) vs the 300th epoch (299):

To see the progression across epochs (from 0 to 299) more clearly, please watch the following video:

Out of curiosity, I retrained the model up to epoch 599; you can see the results (from epoch 300 to 599) in the following video:

Conclusion

GANs are a powerful and versatile class of deep generative models that can produce realistic and diverse data, such as images, text, audio, and video, from random noise. You also learned how GANs can generate realistic and diverse data from text descriptions, as demonstrated by DALL-E.

This article has highlighted and demonstrated the potential of deep learning, specifically the GAN architecture, in the domains of numerical mathematics (approximating the plot of a function), generating handwritten digits, and generating realistic human faces, all implemented with hands-on Python examples.

I hope this article has given you a comprehensive and accessible introduction to GANs, along with a solid understanding of the workflow for applying GANs to your own domains and project goals, and that it inspires you to learn more and experiment with GANs yourself.

Check out the full repository here:

github.com/Embarcadero/DL_Python05_GAN


Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.

Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.

Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.

Also, look into DelphiVCL, which makes it simple to create Windows GUIs with Python.


References & further readings

[1] Biswal, A. (2023). Top 10 Deep Learning Algorithms You Should Know in 2023. Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm.

[2] Candido, R. (2021). Generative Adversarial Networks: Build Your First Models. Real Python. realpython.com/generative-adversarial-networks.

[3] Chauhan, N. S. (2021). Generate Realistic Human Face using GAN. Kaggle. kaggle.com/code/nageshsingh/generate-realistic-human-face-using-gan.

[4] Google for Developers. (2024). Generative Adversarial Networks. Advanced courses, machine learning, Google for Developers. developers.google.com/machine-learning/gan.

[5] LeCun, Y. (1998). The MNIST database of handwritten digits. yann.lecun.com/exdb/mnist.

[6] Liu, Z., Luo, P., Wang, X., & Tang, X. (2018). Large-scale CelebFaces Attributes (CelebA) dataset. mmlab.ie.cuhk.edu.hk/projects/CelebA.html.

[7] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.

[8] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139-144.

[9] Ramesh, A., Pavlov, M., Goh, G., Gray, S., Chen, M., Child, R., Misra, V., Mishkin, P., Krueger, G., Agarwal, S., & Sutskever, I. (2021). DALL·E: Creating images from text. OpenAI Research. openai.com/research/dall-e.

[10] Sagar, R. (2020). Top Libraries For Quick Implementation Of GANs. Analytics India Magazine. analyticsindiamag.com/generative-adversarial-networks-python-libraries.

[11] Siddhartha. (2023). GPU T4 vs GPU P100 | Kaggle | GPU. Medium. siddhartha01writes.medium.com/gpu-t4-vs-gpu-p100-kaggle-gpu-cd852d56022c.
