stable-diffusion-2

Maintained By
stabilityai

Stable Diffusion v2

  • License: CreativeML Open RAIL++-M
  • Authors: Robin Rombach, Patrick Esser
  • Training Data: LAION-5B filtered subset
  • Paper: Latent Diffusion Models

What is stable-diffusion-2?

Stable Diffusion v2 is an advanced text-to-image generation model that builds upon the success of its predecessor. It's a Latent Diffusion Model that combines an autoencoder with a diffusion model trained in latent space, utilizing OpenCLIP-ViT/H as its text encoder. The model supports high-resolution image generation up to 768x768 pixels and implements the v-objective training approach for improved quality.
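As a quick orientation, the snippet below sketches basic text-to-image generation with the Hugging Face diffusers library. The stabilityai/stable-diffusion-2 checkpoint ID is the public release, but the prompt, step count, and guidance scale are illustrative assumptions rather than recommendations from this card.

```python
# Minimal text-to-image sketch with diffusers (illustrative settings).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",  # 768x768 base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a photograph of an astronaut riding a horse",  # example prompt (assumption)
    height=768,
    width=768,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("astronaut.png")
```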

Implementation Details

The model architecture consists of three main components: an autoencoder that converts images into latent representations with a downsampling factor of 8, a text encoder based on OpenCLIP-ViT/H, and a UNet backbone that processes the combined information. Training was conducted on 32 × 8 A100 GPUs with a batch size of 2048, using the AdamW optimizer.
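The factor-8 downsampling can be seen by pushing a dummy image through the published autoencoder. This is a minimal sketch assuming the diffusers AutoencoderKL weights stored under the checkpoint's vae subfolder.

```python
# Sketch: inspect the latent downsampling factor of the SD2 autoencoder.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-2", subfolder="vae"
)

# A dummy 768x768 RGB batch standing in for an image scaled to [-1, 1].
image = torch.randn(1, 3, 768, 768)

with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()

print(latents.shape)  # torch.Size([1, 4, 96, 96]) -> 768 / 8 = 96
```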

  • Supports multiple specialized checkpoints: base model, inpainting, depth-aware generation, and upscaling
  • Implements efficient attention mechanisms through optional xformers integration
  • Provides flexibility in sampling with various schedulers, including DDIM and Euler Discrete (see the sketch after this list)
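A hedged sketch of both options, assuming diffusers with the optional xformers package installed:

```python
# Sketch: swap in the Euler Discrete sampler and enable memory-efficient attention.
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# Load the Euler Discrete scheduler from the checkpoint's own scheduler config.
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")

pipe = StableDiffusionPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# Optional: memory-efficient attention (requires xformers to be installed).
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a serene mountain lake at dawn").images[0]  # example prompt (assumption)
```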

Core Capabilities

  • High-quality image generation at 768x768 resolution
  • Improved photorealism compared to previous versions
  • Text-guided image generation and manipulation
  • Support for inpainting and depth-aware generation
  • 4x upscaling capabilities (illustrated in the sketch after this list)
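For instance, the 4x upscaling capability can be exercised through the dedicated upscaler checkpoint. The sketch below assumes the stabilityai/stable-diffusion-2-x4-upscaler weights and a hypothetical local low-resolution input file.

```python
# Sketch: 4x super-resolution with the SD2 upscaler checkpoint.
import torch
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

# "low_res_cat.png" is a placeholder path; a 128x128 input yields a 512x512 output.
low_res = load_image("low_res_cat.png").resize((128, 128))

upscaled = pipe(prompt="a white cat", image=low_res).images[0]
upscaled.save("upscaled_cat.png")
```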

Frequently Asked Questions

Q: What makes this model unique?

This model introduces significant improvements over its predecessor, including better photorealism, higher resolution support (768x768), and the implementation of the v-objective training approach. It also offers specialized versions for different tasks like inpainting and depth-aware generation.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, including safe deployment studies, artistic applications, educational tools, and research on generative models. It specifically excludes the generation of harmful content, disinformation, or non-consensual imagery.
