# Taiyi-Stable-Diffusion-1B-Chinese-v0.1
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL-M |
| Research Paper | Fengshenbang 1.0 Paper |
| Training Data | 20M filtered Chinese image-text pairs |
| Training Infrastructure | 32 × A100 GPUs, 100 hours of training |
## What is Taiyi-Stable-Diffusion-1B-Chinese-v0.1?
Taiyi-Stable-Diffusion-1B-Chinese-v0.1 is the first open-source Chinese Stable Diffusion model, designed to generate images directly from Chinese text prompts. It is built on Stable Diffusion v1.4 and swaps in a specialized Chinese text encoder while preserving the original model's generation capabilities.
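The snippet below is a minimal usage sketch with the Hugging Face `diffusers` library. The repository ID `IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1` and the example prompt are assumptions based on the model name, not taken from this card.

```python
# Minimal inference sketch (assumed repo id, illustrative prompt).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1"  # assumed repository id
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "飞流直下三千尺，油画"  # "A waterfall plunging three thousand feet, oil painting"
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("taiyi_sample.png")
```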
## Implementation Details
The model was trained on a carefully curated dataset combining the Noah-Wukong (100M) and Zero (23M) datasets, filtered with CLIP scoring down to roughly 20M high-quality image-text pairs. Training fine-tuned only the text encoder and froze all other model components, aligning the model with Chinese input while leaving generation quality intact.
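As a rough illustration of this strategy, the sketch below freezes the UNet and VAE of a Stable Diffusion v1.4 pipeline and leaves only the text encoder trainable. The checkpoint ID, optimizer, and learning rate are placeholders, not the authors' actual training setup.

```python
# Illustrative sketch: train only the text encoder, freeze everything else.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

pipe.unet.requires_grad_(False)          # freeze the denoising UNet
pipe.vae.requires_grad_(False)           # freeze the image autoencoder
pipe.text_encoder.requires_grad_(True)   # only the text encoder is updated

# Placeholder optimizer over the text-encoder parameters only.
optimizer = torch.optim.AdamW(pipe.text_encoder.parameters(), lr=1e-5)
```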
- Uses Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese as text encoder
- Implements CLIP Score filtering (threshold > 0.2) for training data; see the filtering sketch after this list
- Preserves original Stable Diffusion architecture while enabling Chinese text understanding
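The following sketch shows what CLIP-score filtering of this kind can look like. It uses the English OpenAI CLIP checkpoint from Hugging Face `transformers` purely for illustration; the actual pipeline scored Chinese captions, presumably with a Chinese CLIP such as the text encoder named above.

```python
# Illustrative CLIP-score filter: keep pairs whose image-text similarity > 0.2.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")  # stand-in checkpoint
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def clip_score(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Cosine similarity of the projected image and text embeddings.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def keep_pair(image: Image.Image, caption: str, threshold: float = 0.2) -> bool:
    # The 0.2 threshold is taken from the card; everything else is illustrative.
    return clip_score(image, caption) > threshold
```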
## Core Capabilities
- Chinese text-to-image generation
- Support for both basic and advanced prompting
- Compatibility with both full and half-precision inference (see the half-precision sketch after this list)
- Integration with popular UI tools and frameworks
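For half-precision inference, a minimal `diffusers` sketch follows; again, the repository ID is an assumption based on the model name.

```python
# Half-precision (fp16) inference sketch; requires a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1",  # assumed repository id
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# "See how the Yellow River's waters descend from heaven" (illustrative prompt)
image = pipe("君不见黄河之水天上来").images[0]
```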
## Frequently Asked Questions
Q: What makes this model unique?
A: It's the first open-source Stable Diffusion model specifically trained for Chinese language input, offering native understanding of Chinese concepts and artistic expressions.
Q: What are the recommended use cases?
A: The model excels at generating images from Chinese poetry, descriptive texts, and artistic concepts, making it ideal for creative applications, digital art, and cultural content creation.