stable-diffusion-1.5

Maintained By
Jiali

Stable Diffusion v1.5

Property        Value
Authors         Robin Rombach, Patrick Esser
License         CreativeML OpenRAIL M
Training Data   LAION-aesthetics v2 5+
Resolution      512x512
Paper           High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022)

What is stable-diffusion-1.5?

Stable Diffusion v1.5 is a latent text-to-image diffusion model. It builds on the v1.2 checkpoint, which was fine-tuned for a further 595k steps on the LAION-aesthetics v2 5+ dataset. During this fine-tuning the text conditioning was dropped 10% of the time, which lets the same network produce both conditional and unconditional predictions and so improves classifier-free guidance sampling.
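The two halves of classifier-free guidance mentioned above can be sketched in a few lines. This is an illustrative sketch, not the model's training or sampling code: the helper names `maybe_drop_prompt` and `guided_noise` are hypothetical, replacing a dropped prompt with an empty string is an assumption about how the conditioning is dropped, and the guidance scale of 7.5 is just an example value.

```python
import random

def maybe_drop_prompt(prompt, rng, drop_prob=0.10):
    """Training side: with probability drop_prob, train on an empty prompt.

    Assumption: "dropping" the text conditioning means substituting an
    empty prompt, so the model also learns an unconditional prediction.
    """
    return "" if rng.random() < drop_prob else prompt

def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Sampling side: move from the unconditional noise prediction toward
    (and, for scales above 1, past) the text-conditioned prediction."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Roughly 10% of training prompts end up empty.
rng = random.Random(0)
prompts = [maybe_drop_prompt("a photo of a cat", rng) for _ in range(10_000)]
drop_rate = prompts.count("") / len(prompts)

# Toy noise predictions standing in for two forward passes of the denoiser.
eps_uncond = [0.0, 1.0, -2.0]
eps_cond = [1.0, 3.0, -1.0]
guided = guided_noise(eps_uncond, eps_cond, guidance_scale=7.5)
```

A scale of 1 reproduces the conditional prediction and a scale of 0 the unconditional one; larger scales trade sample diversity for closer adherence to the prompt.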

Implementation Details

The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. Prompts are encoded with a CLIP ViT-L/14 text encoder. The autoencoder uses a relative downsampling factor of 8, so each spatial dimension of the latent representation is one eighth of the corresponding image dimension.
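To make the downsampling factor concrete, the sketch below computes the latent shape for a given image size. The function name `latent_shape` is hypothetical, and the 4 latent channels are an assumption taken from the original Stable Diffusion model card rather than from this page.

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Return the (H/f, W/f, C) latent shape for an H x W RGB image.

    factor=8 is the relative downsampling factor stated above;
    latent_channels=4 is an assumption from the original model card.
    """
    if height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of the factor")
    return (height // factor, width // factor, latent_channels)

h, w, c = latent_shape(512, 512)

# Ratio of pixel values (3 channels) to latent values.
compression = (512 * 512 * 3) / (h * w * c)
```

Under these assumptions a 512x512 image maps to a 64x64x4 latent, so the diffusion model operates on 48 times fewer values than it would in pixel space.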

  • Available as two checkpoints: EMA-only weights for inference (4.27GB) and full weights for fine-tuning (7.7GB)
  • Trained on 32 x 8 A100 GPUs with AdamW optimizer
  • Batch size of 2048 with gradient accumulation
  • Constant learning rate of 0.0001 after 10,000 warmup steps
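The training settings above can be written out as a small sketch. Several details here are assumptions rather than facts from this page: the warmup is taken to be linear, "32 x 8" is read as 32 nodes of 8 GPUs each, and the gradient-accumulation count of 2 and per-GPU batch of 4 are example values chosen to be consistent with the stated total of 2048. Both function names are hypothetical.

```python
def learning_rate(step, base_lr=1e-4, warmup_steps=10_000):
    """Warm up to base_lr over warmup_steps, then hold it constant.

    Assumption: the warmup is linear; the page only gives its length.
    """
    if step < warmup_steps:
        return base_lr * (step / warmup_steps)
    return base_lr

def effective_batch_size(nodes, gpus_per_node, grad_accum_steps, per_gpu_batch):
    """Samples contributing to a single optimizer update."""
    return nodes * gpus_per_node * grad_accum_steps * per_gpu_batch

# Assumed breakdown of the stated batch size of 2048.
batch = effective_batch_size(nodes=32, gpus_per_node=8,
                             grad_accum_steps=2, per_gpu_batch=4)

# Learning rate at a few points in training.
schedule = [learning_rate(s) for s in (0, 5_000, 10_000, 500_000)]
```

Gradient accumulation is what lets a small per-GPU batch add up to a large effective batch: gradients from several forward and backward passes are summed before each optimizer step.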

Core Capabilities

  • High-quality photorealistic image generation from text descriptions
  • Generation at 512x512, the resolution the model was trained at
  • Advanced composition handling through latent space operations
  • Support for creative and artistic applications
  • Research-focused features for model analysis and improvement

Frequently Asked Questions

Q: What makes this model unique?

The model balances quality and efficiency. Fine-tuning on the aesthetics-filtered LAION-aesthetics v2 5+ dataset improves the visual quality of its output, 10% text-conditioning dropout improves classifier-free guidance sampling, and running diffusion in latent space rather than pixel space keeps generation efficient.

Q: What are the recommended use cases?

The model is intended primarily for research: studying safe deployment, investigating bias, artistic creation, educational tools, and research on generative models. Creating harmful content, misrepresentation, and commercial use without proper safety mechanisms are out of scope.
