# Stable Diffusion 2.1 Base
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL++ |
| Authors | Robin Rombach, Patrick Esser |
| Paper | High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022) |
| Training Data | LAION-5B subset |
## What is stable-diffusion-2-1-base?
Stable Diffusion 2.1 Base is a text-to-image generation model fine-tuned from stable-diffusion-2-base for an additional 220,000 training steps. It is a latent diffusion model with an OpenCLIP-ViT/H text encoder and generates images at 512x512 resolution from text prompts.
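A minimal text-to-image sketch using the diffusers library; the model ID matches the public Hugging Face repo, a CUDA GPU is assumed, and the prompt and filename are illustrative:

```python
# Minimal sketch: load the pipeline and generate one image.
# Assumes diffusers, transformers, and torch are installed and a CUDA GPU is available.
import torch
from diffusers import StableDiffusionPipeline

model_id = "stabilityai/stable-diffusion-2-1-base"

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Prompt is illustrative; any text description works.
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```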
## Implementation Details

The model pairs an autoencoder with a diffusion model trained in the autoencoder's latent space. The autoencoder uses a downsampling factor of f = 8, mapping images of shape H x W x 3 to latents of shape H/f x W/f x 4. Training used substantial compute (32 x 8 x A100 GPUs) with the AdamW optimizer and a batch size of 2048.
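The 8x downsampling is easy to check directly with the model's VAE. Below is a small sketch using the diffusers AutoencoderKL class; the random tensor stands in for a real image batch (which would normally be normalized to [-1, 1]):

```python
# Sketch: verify the 8x spatial downsampling of the VAE (f = 8).
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", subfolder="vae"
)

x = torch.randn(1, 3, 512, 512)  # stand-in for an image batch: N x 3 x H x W
with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()

print(latents.shape)  # torch.Size([1, 4, 64, 64]) -> H/8 x W/8 with 4 channels
```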
- Fine-tuned with punsafe=0.98 NSFW filtering of the training data
- Uses the standard epsilon (noise-prediction) objective; the v-objective is used by the 768x768 variant, stable-diffusion-2-1
- EulerDiscreteScheduler is a recommended scheduler for inference (see the sketch after this list)
- Supports attention slicing for memory-efficient inference
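A minimal sketch of the inference setup described above: swapping in EulerDiscreteScheduler and enabling attention slicing. The prompt is illustrative:

```python
# Sketch: Euler scheduler + attention slicing on the same pipeline.
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2-1-base"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

# Rebuild the scheduler from the pipeline's own config so hyperparameters match.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

# Trade a little speed for a much smaller peak-memory footprint.
pipe.enable_attention_slicing()

pipe = pipe.to("cuda")
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
```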
## Core Capabilities
- High-quality text-to-image generation at 512x512 resolution
- Improved safety features through dataset filtering
- Efficient latent space processing
- Memory-efficient attention mechanisms
- Support for further inference optimizations such as half-precision weights and CPU offload (see the sketch after this list)
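As a sketch of those optimizations, the snippet below loads the weights in half precision and enables model CPU offload (which requires the accelerate package); the prompt and step count are illustrative:

```python
# Sketch: common memory optimizations for small-VRAM GPUs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,  # half precision roughly halves weight memory
)

# Move each submodule to the GPU only while it runs; do not also call .to("cuda").
pipe.enable_model_cpu_offload()

image = pipe(
    "an isometric voxel render of a tiny city", num_inference_steps=25
).images[0]
```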
## Frequently Asked Questions
**Q: What makes this model unique?**
This model adds 220k fine-tuning steps on top of stable-diffusion-2-base, trained with punsafe=0.98 NSFW filtering, while maintaining high-quality 512x512 generation.
**Q: What are the recommended use cases?**
The model is intended primarily for research purposes: safe-deployment studies, artistic applications, educational tools, and research on generative models. Generating harmful content and other forms of misuse are explicitly out of scope.