T5-v1.1-base Model
| Property | Value |
|---|---|
| Author | Google |
| License | Apache 2.0 |
| Training Data | C4 Dataset |
| Paper | [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) |
What is t5-v1_1-base?
T5-v1.1-base is an improved release of Google's original T5 (Text-To-Text Transfer Transformer) model. Key changes include the use of GEGLU activation (a GELU-gated linear unit) in place of ReLU in the feed-forward hidden layer, and pre-training exclusively on the Colossal Clean Crawled Corpus (C4), without mixing in downstream tasks.
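As a quick orientation, the snippet below is a minimal loading sketch using the Hugging Face transformers library; the checkpoint id `google/t5-v1_1-base` and the generation call are illustrative, and because the checkpoint is pre-trained only on the span-corruption objective, raw generations are not expected to be useful before fine-tuning.

```python
# Minimal loading sketch with the Hugging Face transformers library.
# Assumes the checkpoint is published under the id "google/t5-v1_1-base".
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-base")

# The checkpoint was pre-trained only on C4 with a span-corruption objective,
# so generations are not meaningful until the model has been fine-tuned.
inputs = tokenizer(
    "summarize: studies have shown that owning a dog is good for you",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```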
Implementation Details
The model architecture features several key improvements over its predecessor:
- GEGLU activation in the feed-forward layers in place of ReLU
- Dropout disabled during pre-training (it should be re-enabled for fine-tuning)
- No parameter sharing between the embedding and classifier layers
- Adjusted model shape, with d_model, num_heads, and d_ff values that differ from the original T5 (these settings can be read from the checkpoint's config, as sketched below)
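One hedged way to verify these settings is to read them from the published config via transformers; the field names below are attributes of `T5Config`, and the comments state what the values are expected to be rather than asserting them.

```python
# Sketch: inspect the architecture settings listed above from the checkpoint config.
from transformers import T5Config

config = T5Config.from_pretrained("google/t5-v1_1-base")

print(config.feed_forward_proj)    # expected to name a gated-GELU (GEGLU) variant
print(config.tie_word_embeddings)  # expected False: embedding and classifier weights are independent
print(config.d_model, config.num_heads, config.d_ff)  # model shape parameters
print(config.dropout_rate)         # dropout setting; set explicitly before fine-tuning if needed
```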
Core Capabilities
- Text-to-text generation and transfer learning
- Suitable for a wide range of NLP tasks once fine-tuned (a minimal fine-tuning step is sketched below)
- Designed to reach strong downstream performance after task-specific fine-tuning
- Supports both PyTorch and TensorFlow frameworks
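To make the text-to-text setup concrete, here is a minimal single-training-step sketch in PyTorch; the example pair, learning rate, and lack of batching are placeholders rather than a recommended recipe.

```python
# Minimal text-to-text fine-tuning step (PyTorch). The example pair and the
# learning rate are placeholders; a real run would iterate over a task dataset.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-base")
model.train()  # enables dropout if it is non-zero in the config

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Inputs and targets are both plain text; the task is defined entirely by the data.
source = "summarize: The quick brown fox jumped over the lazy dog near the river bank."
target = "A fox jumped over a dog."

batch = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(loss))
```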
Frequently Asked Questions
Q: What makes this model unique?
This model's distinguishing features are its GEGLU activation, its pre-training on the C4 dataset alone without downstream-task mixing, and the removal of parameter sharing between the embedding and classifier layers, which together make it a more refined version of the original T5.
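For readers unfamiliar with GEGLU, the snippet below is a self-contained PyTorch sketch of a GELU-gated feed-forward block in the spirit of the T5 v1.1 design; the class and parameter names are illustrative, not the library's internal implementation.

```python
# Minimal sketch of a GEGLU feed-forward block (illustrative, not library code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    """Feed-forward layer with a GELU-gated linear unit, in the spirit of T5 v1.1."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GEGLU: GELU(x W0) multiplied elementwise with (x W1), then projected back.
        return self.wo(F.gelu(self.wi_0(x)) * self.wi_1(x))

# Example: a batch of 2 sequences, length 5, hidden size 768.
out = GEGLUFeedForward(d_model=768, d_ff=2048)(torch.randn(2, 5, 768))
print(out.shape)  # torch.Size([2, 5, 768])
```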
Q: What are the recommended use cases?
The model must be fine-tuned before it can be used on downstream tasks. Once fine-tuned on task-specific data, it is particularly well suited to summarization, question answering, text classification, and other NLP tasks.
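As an illustration of how such tasks are cast into the text-to-text format, the sketch below builds example input/target pairs for summarization, question answering, and classification; the prefixes mirror the original T5 convention and are an assumption here, since this checkpoint was pre-trained without downstream-task mixing and any consistent formatting chosen at fine-tuning time works.

```python
# Hypothetical input/target formatting for fine-tuning on different tasks.
# The prefixes follow the original T5 convention; they are a choice, not a requirement.
examples = [
    # Summarization: document in, summary out.
    ("summarize: The committee met on Tuesday and approved the new budget after a short debate.",
     "Committee approves new budget."),
    # Question answering: question plus context in, answer out.
    ("question: Where did the committee meet? context: The committee met in Geneva on Tuesday.",
     "Geneva"),
    # Classification: sentence in, label expressed as text out.
    ("classify sentiment: I absolutely loved this film.",
     "positive"),
]

for source, target in examples:
    print(f"input : {source}\ntarget: {target}\n")
```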