Kohaku-XL-Zeta
| Property | Value |
|---|---|
| License | Fair-AI-public-1.0-sd |
| Framework | Diffusers |
| Training Dataset Size | 8.46M images |
| Resolution | 1024x1024 |
What is Kohaku-XL-Zeta?
Kohaku-XL-Zeta is an advanced text-to-image diffusion model that builds upon its predecessor, Kohaku-XL-Epsilon rev2. It offers noticeably more stable image generation and supports both traditional tag-based prompts and natural language captions. The model was trained on a large dataset of 8.46M images, including content from Danbooru, Pixiv, PVC figures, and Realbooru.
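Since the card lists Diffusers as the framework, loading the model should follow the standard SDXL pipeline pattern. The snippet below is a minimal sketch only: the repository id `KBlueLeaf/Kohaku-XL-Zeta` is an assumption and may not match the actual published weights.

```python
# Minimal loading sketch (assumes a Diffusers-compatible SDXL checkpoint).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "KBlueLeaf/Kohaku-XL-Zeta",  # hypothetical repo id, not confirmed by the card
    torch_dtype=torch.float16,
)
pipe.to("cuda")
```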
Implementation Details
The model was trained on four RTX 3090 GPUs with FP16 mixed precision, using the Lion8bit optimizer with a learning rate of 1e-5 for the UNet. Training ran for 16,548 total steps over 430 hours at an equivalent batch size of 512, with a Min-SNR gamma of 5 and an IP noise gamma of 0.05 (a code sketch of these settings follows the list below). Key features include:
- Extended context length limit to 300
- Support for both tag-based and natural language prompts
- CCIP metrics surpassing Sanae XL anime, with over 2,200 characters scoring above 0.9
- Improved stability requiring less detailed prompts
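To make the reported hyperparameters concrete, the sketch below shows how they would typically map onto code, assuming the bitsandbytes implementation of Lion8bit, the standard Min-SNR-gamma weighting for epsilon prediction, and input-perturbation noise. It is illustrative rather than the actual training script.

```python
# Illustrative mapping of the reported training settings; not the actual script.
import torch
import bitsandbytes as bnb  # assumed source of the Lion8bit optimizer

def make_unet_optimizer(unet: torch.nn.Module):
    # Lion8bit with the UNet learning rate reported on the card (1e-5).
    return bnb.optim.Lion8bit(unet.parameters(), lr=1e-5)

def min_snr_loss_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # Min-SNR-gamma weighting for epsilon prediction:
    # weight = min(SNR, gamma) / SNR, which down-weights low-noise timesteps.
    return torch.clamp(snr, max=gamma) / snr

def perturb_noise(noise: torch.Tensor, ip_noise_gamma: float = 0.05) -> torch.Tensor:
    # Input-perturbation ("IP") noise: a small extra Gaussian term is added to the
    # noise used to corrupt the latents, while the original noise stays the target.
    return noise + ip_noise_gamma * torch.randn_like(noise)
```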
Core Capabilities
- High-fidelity character and style reproduction
- Flexible prompt formatting supporting tags and natural language (see the example prompts after this list)
- Resolution support from 256 to 4096
- Advanced quality control through special tags
- Multi-dataset training for improved concept understanding
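As a concrete illustration of the flexible prompt formatting, the hypothetical prompts below describe the same scene as a tag-style prompt and as a natural-language caption; the specific tags and wording are illustrative, not taken from the model card.

```python
# Hypothetical prompts illustrating the two supported styles.
tag_prompt = "1girl, solo, silver hair, school uniform, cherry blossoms, outdoors"
caption_prompt = (
    "A girl with silver hair in a school uniform standing outdoors "
    "under blooming cherry blossom trees."
)
```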
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to handle both traditional tags and natural language captions, combined with its extensive training dataset and improved stability, sets it apart from other text-to-image models. It achieves superior character fidelity with CCIP metrics exceeding comparable models.
Q: What are the recommended use cases?
The model excels at generating images at 1024x1024 resolution with CFG scales of 3.5-6.5. It works particularly well with Euler(A) or DPM++ series samplers and supports a wide range of styles and concepts thanks to its diverse training dataset.
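As a concrete illustration of these recommendations, the sketch below loads the pipeline with an Euler Ancestral scheduler and generates a 1024x1024 image at a CFG scale inside the suggested range; the repository id, prompt, and step count are assumptions.

```python
# Generation sketch using the recommended settings (1024x1024, CFG 3.5-6.5, Euler A).
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "KBlueLeaf/Kohaku-XL-Zeta",  # hypothetical repo id, not confirmed by the card
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="1girl, solo, silver hair, school uniform, cherry blossoms",  # illustrative
    negative_prompt="low quality, worst quality",
    width=1024,
    height=1024,
    guidance_scale=5.0,       # within the recommended 3.5-6.5 range
    num_inference_steps=28,   # step count not specified on the card
).images[0]
image.save("kohaku_xl_zeta_sample.png")
```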