T5-v1_1-small-chinese-cluecorpussmall

Maintained By: uer

  • Architecture: T5 Version 1.1
  • Size: Small (8 layers, 512 hidden size)
  • Training Data: CLUECorpusSmall
  • Paper: UER Paper
  • Author: UER

What is t5-v1_1-small-chinese-cluecorpussmall?

This is a Chinese T5 (Text-to-Text Transfer Transformer) model implementing version 1.1 of the architecture, pre-trained on the CLUECorpusSmall dataset. It is designed for Chinese text generation tasks and carries T5 v1.1's improvements over the original T5: GEGLU activation in the feed-forward layers and no parameter sharing between the embedding and classifier layers.

Implementation Details

The model was trained with the UER-py framework in two stages: 1,000,000 steps at a sequence length of 128, followed by 250,000 additional steps at a sequence length of 512. Pre-training uses span masking, with span lengths drawn from a geometric distribution (p = 0.3) capped at a maximum span length of 5; an illustrative sketch follows the list below.

  • Improved architecture with GEGLU activation
  • Dropout disabled during pre-training
  • Independent embedding and classifier layer parameters
  • 8 layers with 512 hidden size (Small configuration)

Core Capabilities

  • Text-to-text generation for Chinese
  • Span infilling with sentinel tokens (see the example after this list)
  • Lightweight inference, thanks to the Small configuration's reduced parameter count
  • Optimized for Chinese language understanding and generation

Frequently Asked Questions

Q: What makes this model unique?

This model combines an architecture tuned for Chinese language processing with the T5 v1.1 improvements, while keeping the parameter count small for efficiency. It uses sentinel tokens for span masking and the GEGLU feed-forward activation.

Q: What are the recommended use cases?

The model is best suited for Chinese text-to-text tasks, including text completion and, typically after task-specific fine-tuning, summarization and other transformation tasks. Its Small configuration makes it a good fit for applications that must balance output quality against computational cost.
