Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL-M |
| Base Model | Stable Diffusion v1.4 |
| Training Data | 20M filtered Chinese image-text pairs |
| Paper | Fengshenbang 1.0 |
What is Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1?
This is a bilingual Stable Diffusion model that enables text-to-image generation from both Chinese and English prompts. Developed by IDEA-CCNL, it was trained on carefully curated data from the Noah-Wukong and Zero datasets, filtered with CLIP scoring to retain high-quality image-text pairs.
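A minimal usage sketch with the diffusers library, assuming the checkpoint is published on the Hugging Face Hub under the repo id IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1 (the prompt and output filename are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the bilingual pipeline; the repo id is assumed from the model name.
pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1"
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# The same pipeline accepts Chinese or English prompts.
prompt = "飞流直下三千尺，油画"  # "A waterfall plunging three thousand feet, oil painting"
image = pipe(prompt, guidance_scale=7.5).images[0]
image.save("waterfall.png")
```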
Implementation Details
The model underwent a two-stage training process on 8 A100 GPUs: the first stage (80 hours) trained only the text encoder while the other components were frozen; the second stage (100 hours) fine-tuned the full model for better Chinese-language compatibility.
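A schematic sketch of that two-stage freeze/unfreeze pattern, using the standard diffusers component names (vae, unet, text_encoder). This is an illustration, not the authors' training code, and the optimizer settings are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the v1.4 base that Taiyi builds on (per the model card).
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Stage 1: freeze the image-side components so gradients flow
# only through the text encoder, which learns Chinese conditioning.
pipe.vae.requires_grad_(False)
pipe.unet.requires_grad_(False)
pipe.text_encoder.requires_grad_(True)

# Placeholder optimizer over text-encoder weights only (hyperparameters assumed).
optimizer = torch.optim.AdamW(pipe.text_encoder.parameters(), lr=1e-5)

# Stage 2 would unfreeze everything for full fine-tuning:
# pipe.unet.requires_grad_(True)
# pipe.vae.requires_grad_(True)
```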
- Built on Stable Diffusion v1.4 architecture
- Uses CLIP Score filtering (>0.2) for training data selection
- Supports both full-precision and half-precision (FP16) inference (see the sketch below)
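For the half-precision bullet above, a hedged FP16 loading sketch (repo id assumed from the model name); loading in FP16 roughly halves GPU memory use:

```python
import torch
from diffusers import StableDiffusionPipeline

# torch_dtype=torch.float16 loads the weights in half precision.
pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a lake in the mountains at sunrise").images[0]
image.save("lake_fp16.png")
```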
Core Capabilities
- Bilingual text-to-image generation
- Support for artistic style transfer (e.g., Van Gogh style; see the prompt sketch after this list)
- Complex concept combination in both languages
- DreamBooth fine-tuning compatibility
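Style transfer here is prompt-driven: appending a style phrase steers generation toward that style. A hedged sketch with a Van Gogh style tag on a Chinese prompt (repo id assumed from the model name; the exact prompt wording is illustrative and works in either language):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1",
    torch_dtype=torch.float16,
).to("cuda")

# "梵高风格" appends "Van Gogh style"; ", in the style of Van Gogh" works in English.
prompt = "星空下的小镇, 梵高风格"  # "A small town under a starry sky, Van Gogh style"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("town_van_gogh.png")
```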
Frequently Asked Questions
Q: What makes this model unique?
It's the first open-source Stable Diffusion model specifically trained for both Chinese and English text-to-image generation, with carefully curated training data and a two-stage training approach.
Q: What are the recommended use cases?
The model excels at generating images from Chinese or English prompts and at artistic style transfer, and it can be further fine-tuned with DreamBooth for specific use cases. It is particularly effective for culturally specific Chinese concepts and artistic interpretations.