Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL-M |
| Base Model | Stable Diffusion v1.4 |
| Training Data | 20M filtered Chinese image-text pairs |
| Paper | Fengshenbang 1.0 |
What is Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1?
This is a bilingual Stable Diffusion model that enables text-to-image generation from both Chinese and English prompts. Developed by IDEA-CCNL, it was trained on carefully curated data from the Noah-Wukong and Zero datasets, filtered with CLIP scoring to retain high-quality image-text pairs.
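A minimal usage sketch with the diffusers library, assuming the checkpoint is published on the Hugging Face Hub under the repo id IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1 (the prompt and output filename are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the bilingual pipeline; the repo id is assumed from the model name.
pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1"
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# The same pipeline accepts Chinese or English prompts.
prompt = "飞流直下三千尺，油画"  # "A waterfall plunging three thousand feet, oil painting"
image = pipe(prompt, guidance_scale=7.5).images[0]
image.save("waterfall.png")
```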
Implementation Details
The model underwent a two-stage training process on 8 A100 GPUs: the first stage (80 hours) trained only the text encoder while the other components were frozen; the second stage (100 hours) fine-tuned the full model for better Chinese-language compatibility.
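A schematic sketch of that two-stage freeze/unfreeze pattern, using the standard diffusers component names (vae, unet, text_encoder). This is an illustration, not the authors' training code, and the optimizer settings are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the v1.4 base that Taiyi builds on (per the model card).
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Stage 1: freeze the image-side components so gradients flow
# only through the text encoder, which learns Chinese conditioning.
pipe.vae.requires_grad_(False)
pipe.unet.requires_grad_(False)
pipe.text_encoder.requires_grad_(True)

# Placeholder optimizer over text-encoder weights only (hyperparameters assumed).
optimizer = torch.optim.AdamW(pipe.text_encoder.parameters(), lr=1e-5)

# Stage 2 would unfreeze everything for full fine-tuning:
# pipe.unet.requires_grad_(True)
# pipe.vae.requires_grad_(True)
```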
- Built on Stable Diffusion v1.4 architecture
- Uses CLIP Score filtering (>0.2) for training data selection
- Supports both full-precision and half-precision (FP16) inference (see the sketch below)
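For the half-precision bullet above, a hedged FP16 loading sketch (repo id assumed from the model name); loading in FP16 roughly halves GPU memory use:

```python
import torch
from diffusers import StableDiffusionPipeline

# torch_dtype=torch.float16 loads the weights in half precision.
pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a lake in the mountains at sunrise").images[0]
image.save("lake_fp16.png")
```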
Core Capabilities
- Bilingual text-to-image generation
- Support for artistic style transfer (e.g., Van Gogh style; see the prompt sketch after this list)
- Complex concept combination in both languages
- DreamBooth fine-tuning compatibility
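Style transfer here is prompt-driven: appending a style phrase steers generation toward that style. A hedged sketch with a Van Gogh style tag on a Chinese prompt (repo id assumed from the model name; the exact prompt wording is illustrative and works in either language):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-EN-v0.1",
    torch_dtype=torch.float16,
).to("cuda")

# "梵高风格" appends "Van Gogh style"; ", in the style of Van Gogh" works in English.
prompt = "星空下的小镇, 梵高风格"  # "A small town under a starry sky, Van Gogh style"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("town_van_gogh.png")
```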
Frequently Asked Questions
Q: What makes this model unique?
It's the first open-source Stable Diffusion model specifically trained for both Chinese and English text-to-image generation, with carefully curated training data and a two-stage training approach.
Q: What are the recommended use cases?
The model excels at generating images from Chinese or English prompts and at artistic style transfer, and it can be further fine-tuned with DreamBooth for specific use cases. It is particularly effective for culturally specific Chinese concepts and artistic interpretations.