# Stable Diffusion v1-1
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL-M |
| Authors | Robin Rombach, Patrick Esser |
| Training Data | LAION-2B (en) |
| Architecture | Latent Diffusion Model |
## What is stable-diffusion-v1-1?
Stable Diffusion v1-1 is a latent text-to-image diffusion model capable of generating photorealistic images from text descriptions. The model was trained for 237,000 steps at 256x256 resolution on laion2B-en, followed by 194,000 steps at 512x512 resolution on the laion-high-resolution dataset.
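As a rough sanity check on the training scale, the two phases together come to 431,000 steps; combined with the batch size of 2048 listed below, that works out to roughly 880 million image-text samples seen. This is a back-of-the-envelope estimate, not a figure from the model card:

```python
# Back-of-the-envelope training scale for Stable Diffusion v1-1.
# Step counts are from the model card; "samples seen" is a derived
# estimate assuming a constant global batch size of 2048.
steps_256 = 237_000   # phase 1: 256x256 on laion2B-en
steps_512 = 194_000   # phase 2: 512x512 on laion-high-resolution
batch_size = 2048

total_steps = steps_256 + steps_512
samples_seen = total_steps * batch_size

print(total_steps)    # 431000
print(samples_seen)   # 882688000
```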
## Implementation Details
The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. It uses a CLIP ViT-L/14 text encoder and a downsampling factor f = 8, converting images of shape H x W x 3 into latents of shape H/f x W/f x 4.
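The shape arithmetic above can be written out directly. The helper below is purely illustrative (not part of any library) and assumes the stated factor f = 8 and 4 latent channels:

```python
# Illustrative latent-shape calculation for a latent diffusion model
# with downsampling factor f = 8 and 4 latent channels.
def latent_shape(height, width, f=8, channels=4):
    """Map an H x W x 3 image to its latent shape H/f x W/f x channels."""
    assert height % f == 0 and width % f == 0, "dims must be divisible by f"
    return (height // f, width // f, channels)

print(latent_shape(512, 512))  # (64, 64, 4)
print(latent_shape(256, 256))  # (32, 32, 4)
```

Working in this 8x-downsampled space is what makes the diffusion process far cheaper than operating on full-resolution pixels.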
- Training Infrastructure: 32 x 8 x A100 GPUs
- Batch Size: 2048
- Optimizer: AdamW with a learning rate of 0.0001
- Training Dataset: LAION-2B with high-resolution subset
## Core Capabilities
- High-quality image generation from text descriptions
- Support for various noise schedulers including PNDM and LMS
- Efficient latent space operations
- Float16 precision support for memory-constrained environments
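To see why float16 matters in memory-constrained settings: halving the per-parameter width roughly halves the weight footprint. The parameter count below (~860M for the denoising U-Net) is an approximate, commonly cited figure used here only for illustration:

```python
# Rough weight-memory estimate for loading in float32 vs float16.
# 860M U-Net parameters is an approximate figure for illustration only.
unet_params = 860_000_000

fp32_gb = unet_params * 4 / 1024**3  # 4 bytes per float32 weight
fp16_gb = unet_params * 2 / 1024**3  # 2 bytes per float16 weight

print(f"float32: {fp32_gb:.2f} GiB")  # ~3.2 GiB
print(f"float16: {fp16_gb:.2f} GiB")  # ~1.6 GiB
```

The same halving applies to activations kept in half precision, which is why float16 inference is the usual choice on consumer GPUs.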
## Frequently Asked Questions
Q: What makes this model unique?
This model represents one of the first publicly available versions of Stable Diffusion, offering a balanced approach between quality and computational efficiency through its latent diffusion architecture.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, including safe deployment studies, artistic processes, educational tools, and research on generative models. It should not be used for creating harmful or offensive content.