Deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence (AI). It is the technology behind many of the most exciting capabilities in robotics, natural language processing, image and video recognition, large language models (LLMs), and generative AI.
Deep learning algorithms need large amounts of data and substantial computational power to tackle intricate problems, and they are versatile in handling many different types of data.
This article is a comprehensive exploration of the diffusion model, a prominent member of the deep learning family and the driving force behind Stable Diffusion, one of the most popular and widely used models in generative AI today.
Stable Diffusion has been praised for making AI image generation accessible and flexible, becoming one of the key tools for creative professionals and hobbyists working with generative AI.
Before we begin, let’s look at an overview of the Latent Diffusion Model architecture:
What is Deep Learning?
Deep learning is a subfield of machine learning that solves complex problems using artificial neural networks. These neural networks are made up of interconnected nodes arranged in multiple layers that extract features from input data. Large datasets are used to train these models, allowing them to detect patterns and correlations that humans would find difficult or impossible to detect.
The impact of deep learning on artificial intelligence has been substantial. It has paved the way for intelligent systems capable of independent learning, adaptation, and decision-making. Deep learning has led to remarkable advancements across many domains, including image and speech recognition, natural language processing, machine translation, text generation, image generation (the focus of this article), autonomous driving, and numerous others.
Why Python for Deep Learning?
Python has gained widespread popularity as a programming language due to its versatility and ease of use in diverse domains of computer science, especially in the field of deep learning. Thanks to its extensive range of libraries and frameworks specially tailored for deep learning, Python has emerged as a top choice among many machine learning professionals.
Python has emerged as the language of choice for deep learning, and here are some of the reasons why:
1. Simple to learn and use:
Python is a high-level programming language that is easy to learn and use, even for those who are new to programming. Its concise and uncomplicated syntax makes it easy to write and understand. This allows developers to concentrate on solving problems without worrying about the details of the language.
2. Abundant libraries and frameworks:
Python has a vast ecosystem of libraries and frameworks that cater specifically to deep learning. Some of these libraries include TensorFlow, PyTorch, Keras, and Theano. These libraries provide pre-built functions and modules that simplify the development process, reducing the need to write complex code from scratch.
3. Strong community support:
Python has a large and active community of developers contributing to its development, maintenance, and improvement. This community offers support and guidance to beginners, making it easier to learn and use Python for deep learning.
4. Platform independence:
Python is platform-independent, which means that code written on one platform can be easily executed on another platform without any modification. This makes it easier to deploy deep learning models on different platforms and devices.
5. Easy integration with other languages:
Python can be easily integrated with other programming languages, such as Delphi, C++, and Java, making it ideal for building complex systems that require integrating different technologies.
Overall, Python’s ease of use, abundance of libraries and frameworks, strong community support, platform independence, and easy integration with other languages make it an indispensable tool for machine learning practitioners, and its popularity continues to soar as a result.
What are Diffusion and Latent Diffusion Models?
A diffusion model is a type of generative model in machine learning designed to create data by reversing a noise-adding process. It models the way data can evolve from randomness (pure noise) to meaningful structures, such as images, audio, or other complex data distributions.
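To make the "noise-adding process" concrete before we go further, here is a minimal, hedged sketch in PyTorch of how a clean image tensor can be progressively corrupted with Gaussian noise using a simple linear schedule; the variable names and schedule values are illustrative assumptions, not those of any particular released model.

```python
import torch

# Illustrative linear noise schedule (values are assumptions for this sketch)
num_steps = 1000
betas = torch.linspace(1e-4, 0.02, num_steps)       # per-step noise variances
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative fraction of signal kept

def add_noise(x0, t):
    """Sample x_t from q(x_t | x_0): sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

x0 = torch.rand(1, 3, 64, 64) * 2.0 - 1.0   # a stand-in "image" scaled to [-1, 1]
x_mid = add_noise(x0, t=num_steps // 2)     # partially noised
x_end = add_noise(x0, t=num_steps - 1)      # almost pure noise
print(x_mid.shape, x_end.std())
```

A diffusion model is trained to undo this corruption one small step at a time, which is exactly the reverse process the rest of this article works with.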
The following table compares diffusion models with other generative models[8]:
| Aspect | Diffusion Models | GANs (Generative Adversarial Networks) |
|---|---|---|
| Training Stability | More stable | Prone to mode collapse |
| Output Quality | High detail, fewer artifacts | Sometimes sharper, but less reliable |
| Speed | Slower to generate images | Faster at inference |
| Mode Coverage | Better at covering the data’s full distribution | May miss some modes |
To dive deeper into GANs, read our previous article below:
On the other hand, a Latent Diffusion Model (LDM) is an advanced type of diffusion model that operates in a compressed (latent) space rather than directly on pixel data, making it more computationally efficient. LDMs, such as Stable Diffusion, enable faster image generation without compromising quality, which is especially useful for large-scale generative tasks like text-to-image synthesis.
The following table shows how LDMs improve over traditional diffusion models[8]:
| Aspect | Traditional Diffusion Models | Latent Diffusion Models |
|---|---|---|
| Data Space | Operates directly on pixels | Works in a compressed latent space |
| Speed | Slower due to pixel-level steps | Faster due to reduced dimensionality |
| Resource Usage | Higher GPU/CPU requirements | More efficient for large-scale models |
| Quality | High, but with higher cost | High quality with lower overhead |
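To see what "working in a compressed latent space" looks like in code, here is a hedged sketch that encodes an image into Stable Diffusion’s latent space using the `AutoencoderKL` VAE from 🤗 Diffusers and compares tensor shapes; the checkpoint name, the `vae` subfolder, and the 8x downsampling factor reflect Stable Diffusion v1.5 as I understand it, so treat them as assumptions.

```python
import torch
from diffusers import AutoencoderKL

# Assumption: the VAE stored in the "vae" subfolder of the Stable Diffusion v1.5 repository
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

# A dummy RGB image batch scaled to [-1, 1], 512 x 512 pixels
image = torch.rand(1, 3, 512, 512) * 2.0 - 1.0

with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()

print(image.shape)    # torch.Size([1, 3, 512, 512])  -> pixel space
print(latents.shape)  # torch.Size([1, 4, 64, 64])    -> the smaller space where diffusion runs
```

The denoising loop then operates on the small latent tensor, and the VAE decoder maps the result back to pixels, which is where most of the efficiency gain comes from.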
What is Stable Diffusion?
Stable Diffusion is a generative artificial intelligence (generative AI) model that allows us to produce unique, high-quality, or even photorealistic images from text and image prompts[1]. Stable Diffusion leverages the Latent Diffusion model[2][5][10], developed by researchers from the Machine Vision and Learning group at LMU Munich, a.k.a CompVis.
Model checkpoints were publicly released at the end of August 2022 by a collaboration of Stability AI, CompVis, and Runway with support from EleutherAI and LAION[7][9]. For more information, you can check out their official blog post[13][14].
At the time of writing, Stable Diffusion 3 Medium had already been released. Stability AI describes it as the latest and most advanced text-to-image model in the Stable Diffusion 3 series, comprising two billion parameters; it excels at photorealism, handles complex prompts, and generates legible text.
Try Stable Diffusion online with a no-code approach
Before we dive deeper into Stable Diffusion with Python, let’s first try it in the browser with the Stable Diffusion 2.1 demo:
For faster generation and API access, you can try: DreamStudio Beta.
Or you can try Playground AI, which offers dozens of different filters and presets to generate far better outputs:
How do you get started with Stable Diffusion in Python using Hugging Face’s Diffusers library?
The easiest way to get started with Stable Diffusion and other diffusion models with Python is by using Hugging Face’s Diffusers library.
What is 🤗 Diffusers library?
🤗 Diffusers is a leading library for state-of-the-art pre-trained diffusion models, enabling the generation of images, audio, and even 3D structures of molecules. It serves as a modular toolkit suitable for both simple inference tasks and training your own custom diffusion models.
The 🤗 Diffusers library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions. One of its goals is to make diffusion models accessible to a wide range of deep learning practitioners.
The underlying model of 🤗 Diffusers library, a neural network, is trained to predict a way to slightly denoise the image in each step. After a certain number of steps, a sample is obtained.
The following is the architecture of the neural network; it commonly follows the U-Net architecture originally proposed for biomedical image segmentation[12], as adopted and improved for diffusion models in the DDPM paper[4]:
Some of the highlights of the architecture are:
- the model predicts an output of the same size as its input;
- the input image passes through several blocks of ResNet layers, each of which halves the spatial resolution;
- it then passes through the same number of blocks that upsample it back to the original size;
- skip connections link features on the downsampling path to corresponding layers in the upsampling path.
A short sketch below inspects these blocks on a real checkpoint.
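To connect these highlights to an actual model, here is a small, hedged sketch that loads the `google/ddpm-church-256` UNet (the same checkpoint used later in this article) and prints the configuration entries describing its down- and upsampling blocks; the exact names in the output depend on the checkpoint’s configuration.

```python
from diffusers import UNet2DModel

# Load only the UNet of the DDPM church checkpoint used later in this article
unet = UNet2DModel.from_pretrained("google/ddpm-church-256", use_safetensors=False)

print(unet.config.sample_size)         # input/output resolution of the model
print(unet.config.down_block_types)    # blocks that progressively halve the resolution
print(unet.config.up_block_types)      # blocks that upsample it back again
print(unet.config.block_out_channels)  # channel widths along the downsampling path
```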
How to install Diffusers on your local machine?
Move to your chosen or preferred working directory, then create a new virtual environment called `diffusers` and install Python 3.10:
```
conda create --name diffusers python=3.10
```
To activate this environment, use:
```
conda activate diffusers
```
To deactivate an active environment, use this command:
```
conda deactivate
```
Before we go any further, make sure all the necessary libraries are installed by running the following `pip` command:
```
pip install --upgrade diffusers accelerate transformers
```
In addition to Diffusers itself, this installs the following two libraries:
- Accelerate: To speed up model loading for inference and training.
- Transformers: This is required to run the most popular diffusion models, such as Stable Diffusion.
There are three main components of the library to know about:
1. DiffusionPipeline
The DiffusionPipeline is a high-level end-to-end class designed to rapidly generate samples from popular pre-trained diffusion models for inference, in a user-friendly fashion.
We’ll begin by importing a pipeline. We’ll use the `google/ddpm-celebahq-256` model developed by Google and U.C. Berkeley, a checkpoint that implements the Denoising Diffusion Probabilistic Models (DDPM) algorithm and is trained on a dataset of celebrity faces.
Hands-on and selected outputs:
The following is a code snippet for the basic use of DiffusionPipeline, along with explanations of selected outputs:
```python
from diffusers import DDPMPipeline

image_pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")
image_pipe.to("cpu")

images = image_pipe().images

# Show an example of the generated image from the Hugging Face Hub (DDPM CelebA-HQ)
images[0].show()

# Browse the building blocks of the pipeline:
image_pipe
```
To generate an image, we simply run the pipeline; we don’t even need to give it any input. It will generate a random initial noise sample and then iterate the diffusion process.

The pipeline returns as output a dictionary with the generated sample of interest.

Let’s take a look at the image by running `images[0].show()` in the PyScripter IDE:

Run `image_pipe` in PyScripter to see what the pipeline is made of, so we can better understand what is going on under the hood:
Now we can see what’s inside the pipeline: A scheduler and a UNet model. Let’s look closely at them and what this pipeline just did under the hood.
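As a quick, hedged check of those two components, you can pull them out of the pipeline object and print their classes; this assumes the `google/ddpm-celebahq-256` pipeline is loaded as in the snippet above.

```python
from diffusers import DDPMPipeline

image_pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")

print(type(image_pipe.unet).__name__)       # the UNet model, e.g. UNet2DModel
print(type(image_pipe.scheduler).__name__)  # the scheduler, e.g. DDPMScheduler

# The scheduler's timestep count drives the denoising loop shown later in this article
print(image_pipe.scheduler.config.num_train_timesteps)
```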
2. Pretrained models
Popular pre-trained model architectures and modules can be used as building blocks for creating diffusion systems.
Instances of the model class are neural networks that take a noisy sample and a timestep as inputs and predict a less noisy output sample. In this subsection, we’ll load a pre-trained model and play around with it to understand the model API. We’ll load a simple unconditional image-generation model of type `UNet2DModel`, released with the DDPM paper[4], and take a look at a checkpoint trained on church images: `google/ddpm-church-256`.
Hands-on and selected outputs
The following is a code snippet for the basic use of the models, along with explanations of selected outputs:
```python
import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'

from diffusers import UNet2DModel

repo_id = "google/ddpm-church-256"
model = UNet2DModel.from_pretrained(repo_id, use_safetensors=False)
model
model.config

model_random = UNet2DModel(**model.config)
model_random.save_pretrained("my_model")

# Add random gaussian sample
import torch

torch.manual_seed(0)

noisy_sample = torch.randn(
    1, model.config.in_channels, model.config.sample_size, model.config.sample_size
)
noisy_sample.shape

# Inference
with torch.no_grad():
    noisy_residual = model(sample=noisy_sample, timestep=2).sample
```
Now let’s take a look at the model’s configuration. By accessing the `config` attribute via `model.config` in the PyScripter IDE, we can browse all the parameters needed to define the model architecture:

You can find the complete output of `model` and `model.config` in the article’s repository[3].
A couple of important config parameters are:

- `sample_size`: defines the height and width dimension of the input sample.
- `in_channels`: defines the number of input channels of the input sample.
- `down_block_types` and `up_block_types`: define the types of down- and upsampling blocks used to build the UNet architecture, as seen in the figure at the beginning of this article.
- `block_out_channels`: defines the number of output channels of the downsampling blocks, also used in reversed order for the number of input channels of the upsampling blocks.
- `layers_per_block`: defines how many ResNet blocks are present in each UNet block.

A small sketch below shows how these parameters alone are enough to define a UNet.
Coming back to the trained model, let’s now see how you can use the model for inference. First, you need a random Gaussian sample in the shape of an image (`batch_size` × `in_channels` × `sample_size` × `sample_size`). We have a batch axis because a model can receive multiple random noise samples, a channel axis because each image consists of multiple channels (such as red, green, and blue), and `sample_size` corresponds to the height and width. Let’s confirm the shape by inspecting `noisy_sample.shape`:
The predicted `noisy_residual` has the exact same shape as the input, and we use it to compute a slightly less noisy image. Let’s confirm the output shapes match using `noisy_residual.shape`:
3. Schedulers
Schedulers are algorithms, wrapped in a Python class, that define the noise schedule used to add noise to the samples during training, and that also define the algorithm for computing the slightly less noisy sample given the model output (`noisy_residual`). This article focuses only on how to use scheduler classes for inference.
We will use `DDPMScheduler`, the denoising algorithm proposed in the DDPM paper[4].
Hands-on and selected outputs
The following is a code snippet for the basic use of the schedulers, along with explanations of selected outputs:
```python
import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'

from diffusers import UNet2DModel
from diffusers import DDPMScheduler

repo_id = "google/ddpm-ema-church-256"
model = UNet2DModel.from_pretrained(repo_id, use_safetensors=False)

scheduler = DDPMScheduler.from_pretrained(repo_id)
scheduler.config

scheduler.save_config("my_scheduler")
new_scheduler = DDPMScheduler.from_pretrained("my_scheduler")

# Add random gaussian sample
import torch

torch.manual_seed(0)

noisy_sample = torch.randn(
    1, model.config.in_channels, model.config.sample_size, model.config.sample_size
)
noisy_sample.shape

# Inference
with torch.no_grad():
    noisy_residual = model(sample=noisy_sample, timestep=2).sample

less_noisy_sample = scheduler.step(
    model_output=noisy_residual, timestep=2, sample=noisy_sample
).prev_sample
less_noisy_sample.shape

# Define the denoising loop
import PIL.Image
import numpy as np

def display_sample(sample, i):
    image_processed = sample.cpu().permute(0, 2, 3, 1)
    image_processed = (image_processed + 1.0) * 127.5
    image_processed = image_processed.numpy().astype(np.uint8)

    image_pil = PIL.Image.fromarray(image_processed[0])
    print(f"Image at step {i}")
    image_pil.show()

# Display the progress, at every 50th step
import tqdm

sample = noisy_sample

for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
    # 1. predict noise residual
    with torch.no_grad():
        residual = model(sample, t).sample

    # 2. compute less noisy image and set x_t -> x_t-1
    sample = scheduler.step(residual, t, sample).prev_sample

    # 3. optionally look at image
    if (i + 1) % 50 == 0:
        display_sample(sample, i + 1)
```
Let’s take a look at the scheduler configuration by running `scheduler.config` in the PyScripter IDE:
Different schedulers are usually defined by different parameters. The following are the most important ones to know (a short numeric sketch follows the list):

- `num_train_timesteps`: defines the length of the denoising process, e.g. how many timesteps are needed to turn random Gaussian noise into a data sample.
- `beta_schedule`: defines the type of noise schedule to be used for inference and training.
- `beta_start` and `beta_end`: define the smallest and highest noise values of the schedule.
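As a rough, hedged illustration of what `num_train_timesteps`, `beta_start`, and `beta_end` control, the sketch below builds a plain linear beta schedule and prints how much of the original signal survives at a few timesteps; real schedulers in 🤗 Diffusers may use different formulas (for example scaled-linear or squared-cosine schedules), so the exact numbers are illustrative only.

```python
import torch

# Illustrative values similar to common DDPM configurations (assumptions, not read from a checkpoint)
num_train_timesteps = 1000
beta_start, beta_end = 0.0001, 0.02

betas = torch.linspace(beta_start, beta_end, num_train_timesteps)  # a "linear" schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# Fraction of the original signal remaining at a few timesteps
for t in (0, 250, 500, 999):
    print(f"t={t:4d}  signal kept ~ {alphas_cumprod[t].sqrt().item():.3f}")
```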
We’ll try to use the model output from the previous section. We can see that the computed sample has the exact same shape as the model input, which means that we are ready to pass it to the model again in the next step.
The last step is to bring it all together and define the denoising loop. The loop displays the (less and less) noisy samples along the way for better visualization.
In the code above, we already defined a display function that takes care of post-processing the denoised sample, converting it to a `PIL.Image`, and displaying it. Here is the output (displayed using `PIL.Image`) at the 50th step:
It takes quite some time before a meaningful shape appears; one becomes visible after about 800 steps. And here is the final result, at the 1000th step:
By saving the image at every 50th step up to the 1000th step and aggregating the results, we can see the following denoising progress:
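One simple, hedged way to do that aggregation is to collect the post-processed `PIL.Image` frames in a list inside the denoising loop (instead of only showing them) and write them out as an animated GIF; the helper below is my own addition, not part of the original tutorial.

```python
import PIL.Image

def save_progress_gif(frames, path="denoising_progress.gif", ms_per_frame=300):
    """Save a list of PIL.Image frames (e.g. one every 50 denoising steps) as an animated GIF."""
    frames[0].save(
        path,
        save_all=True,
        append_images=frames[1:],
        duration=ms_per_frame,
        loop=0,
    )

# Usage sketch: inside the denoising loop, append each post-processed frame to a list,
# e.g. frames.append(image_pil), and call save_progress_gif(frames) once the loop finishes.
```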
Isn’t it amazing? We should now have a solid foundational understanding of the schedulers and all other components of the 🤗 Diffusers library. The key points to keep in mind are:
1. Schedulers have no trainable weights (parameter-free).
2. During inference, schedulers specify the algorithm that computes the slightly less noisy sample.
To end the subsection about models and schedulers, note that the library deliberately keeps models and schedulers as independent from each other as possible: a scheduler should never accept a model as an input, and vice versa. The model predicts the noise residual or a slightly less noisy image with its trained weights, while the scheduler computes the previous sample given the model’s output.
How to perform text-to-image using Stable Diffusion on 🤗 Diffusers?
In this section, we will try text-to-image (image generation from text) using the Diffusers library. We will try it using two different classes: `DiffusionPipeline` and `StableDiffusionPipeline`.
Using DiffusionPipeline
Load the model with the `from_pretrained()` method:
```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)
```
The `DiffusionPipeline` downloads and caches all modeling, tokenization, and scheduling components. You’ll see that the Stable Diffusion pipeline is composed of the `UNet2DConditionModel` and `PNDMScheduler`, among other things:
The following is the complete code example to generate an image from a text prompt using Diffusers:
```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)

image = pipeline("A painting of a cat playing bass guitar").images[0]
image
image.save("painting_of_cat_playing_bass2.png")
```
Run it in the PyScripter IDE; the prompt “A painting of a cat playing bass guitar” generates the following output:
Using StableDiffusionPipeline
In this second example, we will generate an image from a text prompt directly using `StableDiffusionPipeline` from the `diffusers` library.
Run the following code with your prompt in the PyScripter IDE:
```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id)

prompt = "A man coding on his laptop"

pipe = pipe.to("cpu")
generator = torch.Generator("cpu").manual_seed(0)

image = pipe(prompt, generator=generator).images[0]
image

image.save("a_man_coding_on_his_laptop.png")
```
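Beyond the defaults, the pipeline call accepts a few knobs worth experimenting with; the hedged sketch below shows `num_inference_steps`, `guidance_scale`, and `negative_prompt`, with values chosen for illustration rather than taken from the original example.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cpu")

generator = torch.Generator("cpu").manual_seed(0)

image = pipe(
    "A man coding on his laptop",
    negative_prompt="blurry, low quality, extra fingers",  # things the image should avoid
    num_inference_steps=30,  # fewer steps run faster; more steps usually add detail
    guidance_scale=7.5,      # how strongly the output follows the text prompt
    generator=generator,     # fixed seed for reproducible results
).images[0]

image.save("a_man_coding_on_his_laptop_tuned.png")
```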
Other interesting implementations
Advances in generative AI, particularly Stable Diffusion, let us take on creativity-demanding tasks such as creating video art with just a few clicks. Below are videos I generated using combinations of text-to-image, image-to-image, frame interpolation, and text/image-to-video in Playground AI and Runway ML, all of which are rooted in Stable Diffusion.
Text/image to video
Text/image-to-video models are multimodal AI systems that can generate novel videos from text, images, or video clips.
Here is a collection of 11 short videos (4 seconds each) that I generated or animated from existing images, with additional guidance from text prompts, using Runway ML:
The following are the prompts I used to guide the image-to-video generation process:
1. A cat playing bass guitar
2. An astronaut working on ISS
3. Human and robot handshake
4. Jackson Pollock's No. 1 (Lavender Mist) live drip painting
5. Typing on keyboard
6. A woman working with her laptop
7. Two software developers brainstorming
8. The startup founder gives a presentation
9. A PhD student teaches us about her research
10. A young mathematician struggling to solve equations on a blackboard
11. An alien gray staring at us
Text-to-image + frame interpolation
Frame interpolation is a technique for turning a sequence of images into an animated video by filling in smooth transitions between consecutive images.
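Runway ML uses learned motion models for this, but the basic idea can be sketched, in a deliberately naive and hedged way, as a linear cross-fade between two consecutive frames using Pillow; the file names below are placeholders.

```python
import PIL.Image

def crossfade(img_a, img_b, num_inbetween=10):
    """Naive interpolation: blend two frames to create in-between frames."""
    frames = [img_a]
    for k in range(1, num_inbetween + 1):
        alpha = k / (num_inbetween + 1)
        frames.append(PIL.Image.blend(img_a, img_b, alpha))
    frames.append(img_b)
    return frames

# Placeholder file names; both frames must share the same size and mode for blending
a = PIL.Image.open("frame_001.png").convert("RGB")
b = PIL.Image.open("frame_002.png").convert("RGB").resize(a.size)

frames = crossfade(a, b, num_inbetween=12)
frames[0].save("interpolated.gif", save_all=True, append_images=frames[1:], duration=80, loop=0)
```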
First, I generated 120 images from unusual, obscure, and complex text prompts using Playground AI. The following are the prompts I used:
1. Photographs of Space-Time Anomalies
2. Illustration of a future Artificial General Intelligence (AGI)
3. Illustration of deep learning and AI community
4. Very advanced alien civilizations that live inside black hole
5. Vibrating membrane from brane theory and m-theory
6. First ever photo of an atom, First ever photo of a proton taken using electron microscope
7. Deep sea, Deep sea with deep sea creatures, Deep sea monsters
8. A battalion of military robots
9. Dream of the Future of Humanity: Interplanetary, Interstellar, and Intergalactic Colony
10. Surface of exoplanet
11. Reimagine newton's apple and universal law of gravitation
12. Draw Feynman diagram in artistic but scientifically formal way
Then, I create a video by automatically generating smooth transitions between those images using frame interpolation from Runway ML:
Conclusion
In conclusion, leveraging Python for deep learning with diffusion models unlocks immense potential for generative AI, with Stable Diffusion as a prime example. These models, particularly Latent Diffusion Models (LDMs), have revolutionized the field by combining computational efficiency with high-quality outputs, enabling accessible and versatile applications such as text-to-image synthesis.
We’ve also worked through the hands-on parts of diffusion models by using Hugging Face’s 🤗 Diffusers library with Python to generate pictures from random noise (exploring pre-trained models, customizable pipelines, and efficient inference methods to render realistic images of churches) and to perform text-to-image synthesis.
As we delve deeper into these cutting-edge advancements, diffusion models continue to shape the future of AI-driven innovation, empowering diverse domains ranging from art and design to scientific discovery.
I hope this article has given you a comprehensive and accessible introduction to diffusion models, along with a solid understanding of how to apply them to your own domains and project goals, and that it inspires you to learn more and experiment with diffusion models yourself.
Check out the full repository here: github.com/Embarcadero/DL_Python07_DiffusionModels
Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.
Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.
Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.
Also, look into DelphiVCL4Python, which makes it simple to create Windows GUIs with Python.
References & further readings
[1] Amazon Web Services, Inc. (2024). What is Stable Diffusion? AWS What is. aws.amazon.com/what-is/stable-diffusion
[2] Hakim, M. A. (2023). How to bypass the ChatGPT information cutoff? A busy (or lazy) man guide to read more recent ML/DL papers. Paper-001: “Rombach et al., 2022”. hkaLabs blog. hkalabs.com/blog/how-to-read-deep-learning-papers-using-bing-chat-ai-001
[3] Hakim, M. A. (2024). Article47 – Deep Learning Python 07 – Diffusion Models. embarcaderoBlog-repo. GitHub repository. github.com/MuhammadAzizulHakim/embarcaderoBlog-repo/tree/main/Article47%20-%20Deep%20Learning%20Python%2007%20-%20Diffusion%20Models
[4] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in neural information processing systems, 33, 6840-6851.
[5] Hugging Face. (2024). Diffusers. Hugging Face docs. huggingface.co/docs/diffusers/index
[6] Hugging Face. (2023). diffusers_intro.ipynb: Introducing Hugging Face’s new library for diffusion models. Hugging Face GitHub repository. colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb
[7] Hugging Face. (2023). The Stable Diffusion Guide. Hugging Face docs. huggingface.co/docs/diffusers/v0.13.0/en/stable_diffusion
[8] OpenAI. (2024). ChatGPT (Nov version) [Large language model]. chat.openai.com/chat
[9] Patil, S., Cuenca, P., Lambert, N., and von Platen, P. (2024). Stable Diffusion with 🧨 Diffusers. Hugging Face blog. huggingface.co/blog/stable_diffusion
[10] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684-10695).
[11] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CompVis – Computer Vision and Learning LMU Munich. GitHub repository. github.com/CompVis/latent-diffusion
[12] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18 (pp. 234-241). Springer International Publishing.
[13] Stability AI. (2024). Stability AI: Activating humanity’s potential through generative AI. Stability AI official website. stability.ai
[14] Stability AI. (2024). Stable Diffusion Public Release. Stability AI news. stability.ai/news/stable-diffusion-public-release
[15] Towards AI Editorial Team. (2023). Diffusion Models vs. GANs vs. VAEs: Comparison of Deep Generative Models. Towards AI blog. towardsai.net/p/generative-ai/diffusion-models-vs-gans-vs-vaes-comparison-of-deep-generative-models