# Stable Diffusion v1-1
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL-M |
| Authors | Robin Rombach, Patrick Esser |
| Training Data | LAION-2B (en) |
| Architecture | Latent Diffusion Model |
## What is stable-diffusion-v1-1?
Stable Diffusion v1-1 is a latent text-to-image diffusion model capable of generating photorealistic images from text descriptions. The model was trained for 237,000 steps at 256x256 resolution on laion2B-en, followed by 194,000 steps at 512x512 resolution on the laion-high-resolution dataset.
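As a rough sanity check on the training scale, the two phases together come to 431,000 steps; combined with the batch size of 2048 listed below, that works out to roughly 880 million image-text samples seen. This is a back-of-the-envelope estimate, not a figure from the model card:

```python
# Back-of-the-envelope training scale for Stable Diffusion v1-1.
# Step counts are from the model card; "samples seen" is a derived
# estimate assuming a constant global batch size of 2048.
steps_256 = 237_000   # phase 1: 256x256 on laion2B-en
steps_512 = 194_000   # phase 2: 512x512 on laion-high-resolution
batch_size = 2048

total_steps = steps_256 + steps_512
samples_seen = total_steps * batch_size

print(total_steps)    # 431000
print(samples_seen)   # 882688000
```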
## Implementation Details
The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. It uses a CLIP ViT-L/14 text encoder and a downsampling factor f = 8, converting images of shape H x W x 3 into latents of shape H/f x W/f x 4.
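The shape arithmetic above can be written out directly. The helper below is purely illustrative (not part of any library) and assumes the stated factor f = 8 and 4 latent channels:

```python
# Illustrative latent-shape calculation for a latent diffusion model
# with downsampling factor f = 8 and 4 latent channels.
def latent_shape(height, width, f=8, channels=4):
    """Map an H x W x 3 image to its latent shape H/f x W/f x channels."""
    assert height % f == 0 and width % f == 0, "dims must be divisible by f"
    return (height // f, width // f, channels)

print(latent_shape(512, 512))  # (64, 64, 4)
print(latent_shape(256, 256))  # (32, 32, 4)
```

Working in this 8x-downsampled space is what makes the diffusion process far cheaper than operating on full-resolution pixels.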
- Training Infrastructure: 32 x 8 x A100 GPUs
- Batch Size: 2048
- Optimizer: AdamW with a learning rate of 0.0001
- Training Dataset: LAION-2B with high-resolution subset
## Core Capabilities
- High-quality image generation from text descriptions
- Support for various noise schedulers including PNDM and LMS
- Efficient latent space operations
- Float16 precision support for memory-constrained environments
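To see why float16 matters in memory-constrained settings: halving the per-parameter width roughly halves the weight footprint. The parameter count below (~860M for the denoising U-Net) is an approximate, commonly cited figure used here only for illustration:

```python
# Rough weight-memory estimate for loading in float32 vs float16.
# 860M U-Net parameters is an approximate figure for illustration only.
unet_params = 860_000_000

fp32_gb = unet_params * 4 / 1024**3  # 4 bytes per float32 weight
fp16_gb = unet_params * 2 / 1024**3  # 2 bytes per float16 weight

print(f"float32: {fp32_gb:.2f} GiB")  # ~3.2 GiB
print(f"float16: {fp16_gb:.2f} GiB")  # ~1.6 GiB
```

The same halving applies to activations kept in half precision, which is why float16 inference is the usual choice on consumer GPUs.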
## Frequently Asked Questions
Q: What makes this model unique?
This model represents one of the first publicly available versions of Stable Diffusion, offering a balanced approach between quality and computational efficiency through its latent diffusion architecture.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, including safe deployment studies, artistic processes, educational tools, and research on generative models. It should not be used for creating harmful or offensive content.