T5-v1_1-small-chinese-cluecorpussmall
| Property | Value |
|---|---|
| Architecture | T5 Version 1.1 |
| Size | Small (8 layers, 512 hidden size) |
| Training Data | CLUECorpusSmall |
| Paper | UER Paper |
| Author | UER |
What is t5-v1_1-small-chinese-cluecorpussmall?
This is a Chinese language T5 (Text-to-Text Transfer Transformer) model that represents version 1.1 of the architecture, pre-trained on the CLUECorpusSmall dataset. It's specifically designed for Chinese text generation tasks and includes several improvements over the original T5 model, including GEGLU activation in feed-forward layers and no parameter sharing between embedding and classifier layers.
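For reference, GEGLU replaces the single ReLU feed-forward layer of the original T5 with a gated unit, roughly GEGLU(x) = GELU(xW) ⊙ (xV). The sketch below is illustrative only; the module names and the inner width of 1024 are assumptions about a typical "small" configuration, not values read from this checkpoint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    """Illustrative GEGLU feed-forward block in the T5 v1.1 style.

    The dimensions (d_model=512, d_ff=1024) are assumptions for a small
    configuration, not values taken from the actual checkpoint.
    """
    def __init__(self, d_model: int = 512, d_ff: int = 1024):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GEGLU: GELU-activated gate multiplied elementwise with a linear branch
        return self.wo(F.gelu(self.wi_0(x)) * self.wi_1(x))

hidden = torch.randn(2, 16, 512)          # (batch, seq_len, d_model)
print(GEGLUFeedForward()(hidden).shape)   # torch.Size([2, 16, 512])
```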
Implementation Details
The model was trained with the UER-py framework in two stages: 1,000,000 steps at a sequence length of 128, followed by 250,000 additional steps at a sequence length of 512. Pre-training uses span masking in which span lengths are drawn from a geometric distribution (p = 0.3) and capped at 5 tokens (sketched after the list below).
- Improved architecture with GEGLU activation
- Dropout disabled during pre-training
- Independent embedding and classifier layer parameters
- 8 layers with 512 hidden size (Small configuration)
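To make the masking objective concrete, here is a minimal sketch of geometric span sampling (p = 0.3, maximum span length 5) with sentinel placeholders. It is illustrative only, not the UER-py preprocessing code; the 15% mask budget and the sentinel spelling ("extra0", "extra1", ...) are assumptions.

```python
import random

def mask_spans(tokens, mask_ratio=0.15, geo_p=0.3, max_span=5):
    """Illustrative span masking: span lengths ~ Geometric(p=0.3), capped at 5.

    Masked spans are replaced by sentinel placeholders (extra0, extra1, ...);
    the target sequence pairs each sentinel with the tokens it hides.
    The 15% mask budget and sentinel spelling are assumptions for illustration.
    """
    n_to_mask = max(1, int(len(tokens) * mask_ratio))
    masked = set()
    while len(masked) < n_to_mask:
        # Geometric sampling: keep growing the span with probability (1 - geo_p)
        span_len = 1
        while span_len < max_span and random.random() > geo_p:
            span_len += 1
        start = random.randrange(0, max(1, len(tokens) - span_len))
        masked.update(range(start, min(start + span_len, len(tokens))))

    inputs, targets, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            inputs.append(f"extra{sentinel}")
            targets.append(f"extra{sentinel}")
            while i < len(tokens) and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

src = list("北京是中国的首都也是一座历史名城")
print(mask_spans(src))
```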
Core Capabilities
- Text-to-Text Generation for Chinese language
- Supports span masking with sentinel tokens (see the usage sketch after this list)
- Efficient processing with smaller parameter count
- Optimized for Chinese language understanding and generation
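A minimal inference sketch follows. It assumes the checkpoint is published on the Hugging Face Hub under uer/t5-v1_1-small-chinese-cluecorpussmall and loads with the standard transformers Auto classes; the repository id, the sentinel-token spelling, and the generation settings are assumptions, so verify them against the model card before use.

```python
# Hedged usage sketch: repo id, sentinel spelling, and generation
# parameters are assumptions, not guaranteed by this document.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "uer/t5-v1_1-small-chinese-cluecorpussmall"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Fill-in-the-blank style prompt: the sentinel token marks the masked span.
text = "中国的首都是extra0京"  # sentinel spelling ("extra0") is an assumption
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```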
Frequently Asked Questions
Q: What makes this model unique?
This model's uniqueness lies in its architecture optimized for Chinese language processing: it incorporates the T5 v1.1 improvements (GEGLU feed-forward activation and independent embedding and classifier parameters) while keeping a small parameter count for efficiency, and it uses sentinel tokens for span masking.
Q: What are the recommended use cases?
The model is best suited for Chinese text generation tasks, including text completion, summarization, and other text-to-text transformations. It is a good fit for applications that need to balance output quality against computational cost.