Kohaku-XL-Zeta

Maintained by: KBlueLeaf

Property               Value
License                Fair-AI-public-1.0-sd
Framework              Diffusers
Training Dataset Size  8.46M images
Resolution             1024x1024

What is Kohaku-XL-Zeta?

Kohaku-XL-Zeta is a text-to-image diffusion model that builds on its predecessor, Kohaku-XL-Epsilon rev2. It targets more stable image generation and accepts both traditional tag-based prompts and natural language captions. The model was trained on a dataset of 8.46M images, including content from Danbooru, Pixiv, PVC figures, and Realbooru.

Implementation Details

The model was trained on four RTX 3090 GPUs with FP16 mixed precision, using the Lion8bit optimizer with a UNet learning rate of 1e-5. Training ran for 16,548 steps over 430 hours at an equivalent batch size of 512, with Min SNR Gamma set to 5 and IP Noise Gamma set to 0.05.
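As a rough illustration of what a Min SNR Gamma of 5 means, the sketch below computes the per-timestep loss weight from the Min-SNR weighting strategy. It assumes an epsilon-prediction objective (the usual SDXL setup, but not stated in this card), and the function name `min_snr_weight` is hypothetical rather than taken from the actual training code.

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Loss weight for one timestep under Min-SNR weighting.

    Assumes epsilon-prediction: weight = min(SNR, gamma) / SNR.
    Timesteps with SNR above gamma (low noise) are down-weighted,
    while noisier timesteps keep a weight of 1.
    """
    if snr <= 0:
        raise ValueError("SNR must be positive")
    return min(snr, gamma) / snr


# Example: gamma = 5 as quoted in the training configuration.
for snr in (0.5, 5.0, 10.0, 50.0):
    print(f"SNR={snr:>5}: weight={min_snr_weight(snr):.2f}")
```

With gamma set to 5, a timestep whose SNR is 10 contributes half as much to the loss as one whose SNR is 5 or lower, which keeps nearly clean timesteps from dominating training.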

Key features:

  • Context length limit extended to 300
  • Support for both tag-based and natural language prompts
  • CCIP scores surpassing Sanae XL anime, with over 2,200 characters scoring above 0.9
  • Improved stability, so prompts need less detail
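The training figures quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers from this page; treating the 430 hours as wall-clock time shared by all four GPUs is an assumption, as is reading the result as roughly one pass over the data.

```python
# Figures quoted on this page.
TOTAL_STEPS = 16_548
EQUIV_BATCH_SIZE = 512
TRAIN_HOURS = 430
DATASET_IMAGES = 8_460_000  # "8.46M images"
NUM_GPUS = 4                # four RTX 3090s

# Total samples processed during training.
samples_seen = TOTAL_STEPS * EQUIV_BATCH_SIZE

# How many times the dataset was (approximately) covered.
dataset_passes = samples_seen / DATASET_IMAGES

# Average time per optimizer step at the equivalent batch size.
seconds_per_step = TRAIN_HOURS * 3600 / TOTAL_STEPS

# Assumption: 430 hours is wall-clock time across all GPUs.
gpu_hours = TRAIN_HOURS * NUM_GPUS

print(f"samples seen:     {samples_seen:,}")
print(f"dataset passes:   {dataset_passes:.3f}")
print(f"seconds per step: {seconds_per_step:.1f}")
print(f"GPU-hours:        {gpu_hours}")
```

The step count and batch size multiply out to about 8.47M samples, which is consistent with approximately one pass over the 8.46M-image dataset.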

Core Capabilities

  • High-fidelity character and style reproduction
  • Flexible prompt formatting supporting tags and natural language
  • Resolution support from 256 to 4096
  • Advanced quality control through special tags
  • Multi-dataset training for improved concept understanding
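One way to use the 256 to 4096 range in practice is to pick a width and height for a given aspect ratio while keeping the pixel count near 1024x1024. The helper below is a hypothetical sketch, not part of the model or of any library; snapping to multiples of 64 is a common SDXL convention assumed here, not something this page specifies.

```python
import math


def pick_resolution(aspect_ratio: float,
                    target_pixels: int = 1024 * 1024,
                    step: int = 64,
                    min_side: int = 256,
                    max_side: int = 4096) -> tuple[int, int]:
    """Return (width, height) close to target_pixels for an aspect ratio.

    aspect_ratio is width / height. Each side is rounded to a multiple
    of `step` (an assumed convention) and clamped to [min_side, max_side].
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")

    def snap(value: float) -> int:
        snapped = round(value / step) * step
        return max(min_side, min(max_side, snapped))

    width = snap(math.sqrt(target_pixels * aspect_ratio))
    height = snap(math.sqrt(target_pixels / aspect_ratio))
    return width, height


print(pick_resolution(1.0))      # square
print(pick_resolution(16 / 9))   # landscape
print(pick_resolution(2 / 3))    # portrait
```

Keeping the total pixel count near the trained resolution is a design choice in this sketch; very small or very large sizes within the supported range may still behave differently from 1024x1024.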

Frequently Asked Questions

Q: What makes this model unique?

The model handles both traditional tags and natural language captions, and pairs this with a large training dataset and improved stability. Its character fidelity, as measured by CCIP, exceeds that of comparable models.

Q: What are the recommended use cases?

The model works best when generating images at 1024x1024 resolution with CFG scales of 3.5-6.5. It is particularly effective with the Euler (A) or DPM++ series samplers, and its diverse training dataset lets it cover a wide range of styles and concepts.
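The recommended settings above might be applied with the Diffusers library roughly as follows. This is a minimal sketch under several assumptions: the repo id `KBlueLeaf/Kohaku-XL-Zeta`, the step count of 28, and the helper names `build_generation_kwargs` and `generate` are illustrative and not taken from this page, and the model is assumed to load through `StableDiffusionXLPipeline` on a CUDA GPU.

```python
MODEL_ID = "KBlueLeaf/Kohaku-XL-Zeta"  # assumed Hugging Face repo id
CFG_RANGE = (3.5, 6.5)                 # recommended CFG scale range


def build_generation_kwargs(prompt: str,
                            cfg_scale: float = 5.0,
                            steps: int = 28,  # assumed, not from the page
                            negative_prompt: str = "") -> dict:
    """Collect pipeline arguments matching the recommended settings."""
    low, high = CFG_RANGE
    if not low <= cfg_scale <= high:
        raise ValueError(f"cfg_scale should be within {low}-{high}")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": 1024,
        "height": 1024,
        "guidance_scale": cfg_scale,
        "num_inference_steps": steps,
    }


def generate(prompt: str, cfg_scale: float = 5.0):
    """Load the model and generate one image (needs a CUDA GPU)."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from diffusers import (EulerAncestralDiscreteScheduler,
                           StableDiffusionXLPipeline)

    pipe = StableDiffusionXLPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )
    # Euler (A) corresponds to the Euler Ancestral scheduler in Diffusers.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
        pipe.scheduler.config
    )
    pipe.to("cuda")
    return pipe(**build_generation_kwargs(prompt, cfg_scale)).images[0]
```

Calling `generate("1girl, solo, looking at viewer").save("out.png")` would then write a single 1024x1024 image; the prompt here is only a placeholder in the tag-based style.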
