sd-image-variations-diffusers

Maintained By
lambdalabs

License: CreativeML OpenRAIL-M
Framework: Diffusers
Training Data: LAION Aesthetics 6plus
Base Model: Stable Diffusion v1.4

What is sd-image-variations-diffusers?

sd-image-variations-diffusers is a specialized version of Stable Diffusion that generates variations of an input image, similar to DALL·E 2's image-variations feature. The model replaces Stable Diffusion's text encoder with a CLIP image encoder, so it conditions directly on images rather than text prompts.

Implementation Details

The model was trained in two stages on 8 A100-40GB GPUs. The first stage fine-tuned only CrossAttention layer weights for 46,000 steps, while the second stage trained the entire UNet for 50,000 steps. It uses ViT-L/14 as the image encoder and includes the final projection layer to the CLIP shared embedding space.

  • Trained with the AdamW optimizer at learning rates up to 1e-5
  • Batch sizes of 128-160 across distributed training
  • Natively supported by the 🤗 Diffusers library
  • Requires input images to be resized without anti-aliasing, matching the training-time preprocessing

Core Capabilities

  • Generates creative variations of input images
  • Maintains the aesthetic quality of the original image
  • Supports batch generation of multiple variations
  • Integrates seamlessly with the Diffusers pipeline

Frequently Asked Questions

Q: What makes this model unique?

This model's unique feature is its ability to create image variations using CLIP image embeddings instead of text inputs, offering a different approach to image generation compared to standard Stable Diffusion models.

Q: What are the recommended use cases?

The model is recommended for research purposes, artistic applications, educational tools, and creative design processes. It should not be used to create harmful, offensive, or misleading content.
