stable-diffusion-v1-1

Maintained By
CompVis

Stable Diffusion v1-1

Property        Value
License         CreativeML OpenRAIL-M
Authors         Robin Rombach, Patrick Esser
Training Data   LAION-2B (en)
Architecture    Latent Diffusion Model

What is stable-diffusion-v1-1?

Stable Diffusion v1-1 is a state-of-the-art latent text-to-image diffusion model capable of generating photorealistic images from text descriptions. The model was trained for 237,000 steps at 256x256 resolution on LAION-2B-en, followed by 194,000 steps at 512x512 resolution on the LAION-high-resolution dataset.

Implementation Details

The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. It uses a CLIP ViT-L/14 text encoder and a relative downsampling factor f = 8, converting images of shape H x W x 3 to latents of shape H/f x W/f x 4.
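The downsampling arithmetic above can be sketched in a few lines of Python (the function name is illustrative, not part of any library):

```python
# Sketch of the latent downsampling described above: with f = 8, an
# H x W x 3 RGB image maps to an (H/f) x (W/f) x 4 latent tensor.
def latent_shape(height, width, f=8, latent_channels=4):
    assert height % f == 0 and width % f == 0, "dims must be divisible by f"
    return (height // f, width // f, latent_channels)

# A 512x512 training image becomes a 64x64x4 latent
print(latent_shape(512, 512))  # → (64, 64, 4)
```

This 8x spatial reduction per axis is what makes diffusion in latent space far cheaper than diffusion over raw pixels.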

  • Training Infrastructure: 32 x 8 x A100 GPUs
  • Batch Size: 2048
  • Optimizer: AdamW with 0.0001 learning rate
  • Training Dataset: LAION-2B with high-resolution subset
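As a back-of-envelope check on the figures above (an inference from the card, not an official number), the global batch of 2048 spread across 32 nodes of 8 A100s each implies 8 samples per GPU, assuming no gradient accumulation:

```python
# Hypothetical per-GPU batch size implied by the training setup above,
# assuming the global batch is split evenly with no gradient accumulation.
nodes, gpus_per_node = 32, 8
global_batch = 2048
per_gpu_batch = global_batch // (nodes * gpus_per_node)
print(per_gpu_batch)  # → 8
```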

Core Capabilities

  • High-quality image generation from text descriptions
  • Support for various noise schedulers including PNDM and LMS
  • Efficient latent space operations
  • Float16 precision support for memory-constrained environments
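A minimal sketch of using the capabilities above with the Hugging Face diffusers library (not the official sample code; class names follow diffusers and exact APIs may differ between versions):

```python
# Hedged sketch: loading this checkpoint with diffusers, optionally in
# float16 for memory-constrained GPUs, and swapping the default PNDM
# scheduler for LMS.
def load_pipeline(low_memory: bool = True):
    # Imported lazily so the sketch stays importable without GPU dependencies.
    import torch
    from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

    # float16 roughly halves weight/activation memory at a small quality cost.
    dtype = torch.float16 if low_memory else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-1", torch_dtype=dtype
    )
    # Replace the default PNDM scheduler with LMS, reusing its config.
    pipe.scheduler = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
    return pipe

if __name__ == "__main__":
    pipe = load_pipeline().to("cuda")
    image = pipe("an astronaut riding a horse on the moon").images[0]
    image.save("astronaut.png")
```

Swapping schedulers this way changes the sampling trajectory without retraining; reusing the existing scheduler config keeps the noise schedule consistent with the checkpoint.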

Frequently Asked Questions

Q: What makes this model unique?

This model represents one of the first publicly available versions of Stable Diffusion, offering a balanced approach between quality and computational efficiency through its latent diffusion architecture.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, including safe deployment studies, artistic processes, educational tools, and research on generative models. It should not be used for creating harmful or offensive content.
