T5-v1.1-base Model
| Property | Value |
|---|---|
| Author | Google |
| License | Apache 2.0 |
| Training Data | C4 Dataset |
| Paper | [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) |
What is t5-v1_1-base?
T5-v1.1-base is an improved release of Google's original T5 (Text-To-Text Transfer Transformer) model. Key changes include the use of GEGLU activation (a GELU-gated linear unit) in place of ReLU in the feed-forward hidden layer, and pre-training exclusively on the Colossal Clean Crawled Corpus (C4), without mixing in downstream tasks.
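As a quick orientation, the snippet below is a minimal loading sketch using the Hugging Face transformers library; the checkpoint id `google/t5-v1_1-base` and the generation call are illustrative, and because the checkpoint is pre-trained only on the span-corruption objective, raw generations are not expected to be useful before fine-tuning.

```python
# Minimal loading sketch with the Hugging Face transformers library.
# Assumes the checkpoint is published under the id "google/t5-v1_1-base".
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-base")

# The checkpoint was pre-trained only on C4 with a span-corruption objective,
# so generations are not meaningful until the model has been fine-tuned.
inputs = tokenizer(
    "summarize: studies have shown that owning a dog is good for you",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```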
Implementation Details
The model architecture features several key improvements over its predecessor:
- GEGLU activation in the feed-forward layers in place of ReLU
- Dropout disabled during pre-training (it should be re-enabled for fine-tuning)
- No parameter sharing between the embedding and classifier layers
- Adjusted model shape, with d_model, num_heads, and d_ff values that differ from the original T5 (these settings can be read from the checkpoint's config, as sketched below)
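One hedged way to verify these settings is to read them from the published config via transformers; the field names below are attributes of `T5Config`, and the comments state what the values are expected to be rather than asserting them.

```python
# Sketch: inspect the architecture settings listed above from the checkpoint config.
from transformers import T5Config

config = T5Config.from_pretrained("google/t5-v1_1-base")

print(config.feed_forward_proj)    # expected to name a gated-GELU (GEGLU) variant
print(config.tie_word_embeddings)  # expected False: embedding and classifier weights are independent
print(config.d_model, config.num_heads, config.d_ff)  # model shape parameters
print(config.dropout_rate)         # dropout setting; set explicitly before fine-tuning if needed
```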
Core Capabilities
- Text-to-text generation and transfer learning
- Suitable for a wide range of NLP tasks once fine-tuned (a minimal fine-tuning step is sketched below)
- Designed to reach strong downstream performance after task-specific fine-tuning
- Supports both PyTorch and TensorFlow frameworks
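To make the text-to-text setup concrete, here is a minimal single-training-step sketch in PyTorch; the example pair, learning rate, and lack of batching are placeholders rather than a recommended recipe.

```python
# Minimal text-to-text fine-tuning step (PyTorch). The example pair and the
# learning rate are placeholders; a real run would iterate over a task dataset.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-base")
model.train()  # enables dropout if it is non-zero in the config

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Inputs and targets are both plain text; the task is defined entirely by the data.
source = "summarize: The quick brown fox jumped over the lazy dog near the river bank."
target = "A fox jumped over a dog."

batch = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(loss))
```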
Frequently Asked Questions
Q: What makes this model unique?
This model's distinguishing features are its GEGLU activation, its pre-training on the C4 dataset alone without downstream-task mixing, and the removal of parameter sharing between the embedding and classifier layers, which together make it a more refined version of the original T5.
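For readers unfamiliar with GEGLU, the snippet below is a self-contained PyTorch sketch of a GELU-gated feed-forward block in the spirit of the T5 v1.1 design; the class and parameter names are illustrative, not the library's internal implementation.

```python
# Minimal sketch of a GEGLU feed-forward block (illustrative, not library code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    """Feed-forward layer with a GELU-gated linear unit, in the spirit of T5 v1.1."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GEGLU: GELU(x W0) multiplied elementwise with (x W1), then projected back.
        return self.wo(F.gelu(self.wi_0(x)) * self.wi_1(x))

# Example: a batch of 2 sequences, length 5, hidden size 768.
out = GEGLUFeedForward(d_model=768, d_ff=2048)(torch.randn(2, 5, 768))
print(out.shape)  # torch.Size([2, 5, 768])
```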
Q: What are the recommended use cases?
The model must be fine-tuned before it can be used on downstream tasks. Once fine-tuned on task-specific data, it is particularly well suited to summarization, question answering, text classification, and other NLP tasks.
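As an illustration of how such tasks are cast into the text-to-text format, the sketch below builds example input/target pairs for summarization, question answering, and classification; the prefixes mirror the original T5 convention and are an assumption here, since this checkpoint was pre-trained without downstream-task mixing and any consistent formatting chosen at fine-tuning time works.

```python
# Hypothetical input/target formatting for fine-tuning on different tasks.
# The prefixes follow the original T5 convention; they are a choice, not a requirement.
examples = [
    # Summarization: document in, summary out.
    ("summarize: The committee met on Tuesday and approved the new budget after a short debate.",
     "Committee approves new budget."),
    # Question answering: question plus context in, answer out.
    ("question: Where did the committee meet? context: The committee met in Geneva on Tuesday.",
     "Geneva"),
    # Classification: sentence in, label expressed as text out.
    ("classify sentiment: I absolutely loved this film.",
     "positive"),
]

for source, target in examples:
    print(f"input : {source}\ntarget: {target}\n")
```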