# Taiyi-Stable-Diffusion-1B-Chinese-v0.1
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL-M |
| Research Paper | Fengshenbang 1.0 Paper |
| Training Data | 20M filtered Chinese image-text pairs |
| Training Infrastructure | 32 × A100 GPUs, 100 hours of training |
## What is Taiyi-Stable-Diffusion-1B-Chinese-v0.1?
Taiyi-Stable-Diffusion-1B-Chinese-v0.1 is the first open-source Chinese Stable Diffusion model, designed to generate images directly from Chinese text prompts. It is built on Stable Diffusion v1.4 and swaps in a specialized Chinese text encoder while preserving the original model's generation capabilities.
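The snippet below is a minimal usage sketch with the Hugging Face `diffusers` library. The repository ID `IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1` and the example prompt are assumptions based on the model name, not taken from this card.

```python
# Minimal inference sketch (assumed repo id, illustrative prompt).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1"  # assumed repository id
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "飞流直下三千尺，油画"  # "A waterfall plunging three thousand feet, oil painting"
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("taiyi_sample.png")
```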
## Implementation Details
The model was trained on a carefully curated dataset combining the Noah-Wukong (100M) and Zero (23M) datasets, filtered with CLIP scoring down to roughly 20M high-quality image-text pairs. Training fine-tuned only the text encoder and froze all other model components, aligning the model with Chinese input while leaving generation quality intact.
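As a rough illustration of this strategy, the sketch below freezes the UNet and VAE of a Stable Diffusion v1.4 pipeline and leaves only the text encoder trainable. The checkpoint ID, optimizer, and learning rate are placeholders, not the authors' actual training setup.

```python
# Illustrative sketch: train only the text encoder, freeze everything else.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

pipe.unet.requires_grad_(False)          # freeze the denoising UNet
pipe.vae.requires_grad_(False)           # freeze the image autoencoder
pipe.text_encoder.requires_grad_(True)   # only the text encoder is updated

# Placeholder optimizer over the text-encoder parameters only.
optimizer = torch.optim.AdamW(pipe.text_encoder.parameters(), lr=1e-5)
```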
- Uses Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese as text encoder
- Implements CLIP Score filtering (threshold > 0.2) for training data; see the filtering sketch after this list
- Preserves original Stable Diffusion architecture while enabling Chinese text understanding
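The following sketch shows what CLIP-score filtering of this kind can look like. It uses the English OpenAI CLIP checkpoint from Hugging Face `transformers` purely for illustration; the actual pipeline scored Chinese captions, presumably with a Chinese CLIP such as the text encoder named above.

```python
# Illustrative CLIP-score filter: keep pairs whose image-text similarity > 0.2.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")  # stand-in checkpoint
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def clip_score(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Cosine similarity of the projected image and text embeddings.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def keep_pair(image: Image.Image, caption: str, threshold: float = 0.2) -> bool:
    # The 0.2 threshold is taken from the card; everything else is illustrative.
    return clip_score(image, caption) > threshold
```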
## Core Capabilities
- Chinese text-to-image generation
- Support for both basic and advanced prompting
- Compatibility with both full and half-precision inference (see the half-precision sketch after this list)
- Integration with popular UI tools and frameworks
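For half-precision inference, a minimal `diffusers` sketch follows; again, the repository ID is an assumption based on the model name.

```python
# Half-precision (fp16) inference sketch; requires a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1",  # assumed repository id
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# "See how the Yellow River's waters descend from heaven" (illustrative prompt)
image = pipe("君不见黄河之水天上来").images[0]
```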
## Frequently Asked Questions
Q: What makes this model unique?
A: It's the first open-source Stable Diffusion model specifically trained for Chinese language input, offering native understanding of Chinese concepts and artistic expressions.
Q: What are the recommended use cases?
A: The model excels at generating images from Chinese poetry, descriptive texts, and artistic concepts, making it ideal for creative applications, digital art, and cultural content creation.