stable-diffusion-v1-4

Maintained By
Narsil

Stable Diffusion v1.4

Property: Value
License: CreativeML OpenRAIL-M
Authors: Robin Rombach, Patrick Esser
Training Infrastructure: 32 x 8 x A100 GPUs
Base Model: Stable Diffusion v1.2 (checkpoint used for initialization)

What is stable-diffusion-v1-4?

Stable Diffusion v1.4 is a latent text-to-image diffusion model that follows v1.3 in the Stable Diffusion v1 series. It pairs a latent diffusion architecture with a CLIP ViT-L/14 text encoder to generate images from text prompts.

Implementation Details

The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. The encoder maps each image to a latent representation using a downsampling factor of 8, so a 512x512 image becomes a 64x64 latent grid. Training used the AdamW optimizer with a learning rate of 0.0001 and a batch size of 2048, run across 32 x 8 A100 GPUs (256 in total).
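The arithmetic behind these figures can be checked with a short sketch. It is only an illustration, not code from the model's repository: the function names are hypothetical, and the latent channel count of 4 is an assumption that is not stated above.

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Return (channels, height, width) of the latent for one image.

    factor=8 is the downsampling factor quoted above;
    latent_channels=4 is an assumption, not stated in this page.
    """
    if height % factor or width % factor:
        raise ValueError("height and width must be divisible by the factor")
    return (latent_channels, height // factor, width // factor)


def compression_ratio(height, width, factor=8, latent_channels=4, image_channels=3):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width, factor, latent_channels)
    return (image_channels * height * width) / (c * h * w)


def samples_per_gpu(global_batch=2048, nodes=32, gpus_per_node=8):
    """Samples each GPU contributes to one optimizer step."""
    total_gpus = nodes * gpus_per_node
    if global_batch % total_gpus:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // total_gpus


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
print(samples_per_gpu())            # 8
```

Under these assumptions the diffusion model works on roughly 48 times fewer values than the pixel image, which is the main reason for running diffusion in latent space.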

  • Utilizes CLIP ViT-L/14 text encoder for processing prompts
  • Implements cross-attention in the UNet backbone
  • Supports multiple scheduling algorithms including PLMS and K-LMS
  • Operates at 512x512 resolution for optimal results
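The cross-attention mentioned in the list above can be sketched in plain Python. This is a minimal illustration of scaled dot-product attention between image-latent positions and prompt tokens, not the model's actual implementation. It leaves out the learned query, key, and value projections and the multiple heads that a real UNet layer would use, and all names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    queries: one vector per latent position (image side)
    keys, values: one vector per prompt token (text-encoder side)
    Returns one output vector per latent position.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this latent position to every prompt token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the token value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Two latent positions attending over three prompt tokens.
latents = [[1.0, 0.0], [0.0, 1.0]]
token_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token_values = [[10.0], [20.0], [30.0]]
print(cross_attention(latents, token_keys, token_values))
```

Each latent position receives a mixture of token values weighted by how well it matches each token, which is how the text prompt steers the denoising at every spatial location.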

Core Capabilities

  • High-quality text-to-image generation
  • Support for artistic and creative applications
  • Compositional understanding of prompts, with known limits on complex spatial arrangements
  • Classifier-free guidance sampling
  • Research and educational applications
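Classifier-free guidance, listed above, combines two noise predictions at each sampling step: one conditioned on the prompt and one made without it. The sketch below shows the usual combination rule on plain lists. The function name is hypothetical, and the default scale of 7.5 is a common choice assumed here, not a value given on this page.

```python
def classifier_free_guidance(noise_uncond, noise_text, guidance_scale=7.5):
    """Combine unconditional and text-conditioned noise predictions.

    Computes uncond + scale * (text - uncond) element by element.
    A scale of 1.0 returns the text-conditioned prediction unchanged;
    larger values push the result further toward the prompt.
    """
    if len(noise_uncond) != len(noise_text):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


print(classifier_free_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=2.0))
# [2.0, 1.0]
```

Where the two predictions agree, the scale has no effect. It only amplifies the directions in which the prompt changes the prediction.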

Frequently Asked Questions

Q: What makes this model unique?

This model keeps the architecture it shares with v1.3 and is tuned for improved aesthetics and better classifier-free guidance sampling. It is also noted for its balance between image quality and generation speed.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, including safe deployment studies, artistic applications, educational tools, and generative model research. Generating harmful, offensive, or misleading content is explicitly out of scope.
