LaMini-Flan-T5-248M
| Property | Value |
|---|---|
| Model Type | Text-to-Text Generation |
| Base Model | Flan-T5-base |
| Parameters | 248M |
| License | CC BY-NC 4.0 |
| Paper | arXiv:2304.14402 |
What is LaMini-Flan-T5-248M?
LaMini-Flan-T5-248M is part of the LaMini-LM model series developed by MBZUAI. It is a fine-tuned version of Flan-T5-base, trained on the LaMini-instruction dataset of 2.58M instruction samples. The model represents a sweet spot between size and performance, earning a "✩" recommendation from its creators for optimal performance within its size class.
Implementation Details
The model was trained with a carefully tuned set of hyperparameters: a learning rate of 0.0005, an effective batch size of 512 (reached through gradient accumulation), a linear learning-rate schedule, and 5 epochs of training with the Adam optimizer.
- Training batch size: 128 with 4 gradient accumulation steps
- Evaluation batch size: 64
- Optimizer: Adam (β1=0.9, β2=0.999, ε=1e-08)
- Training duration: 5 epochs
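
As a rough sketch, the hyperparameters listed above could be expressed as a Hugging Face Seq2SeqTrainingArguments configuration like the one below. This is an illustration rather than the authors' actual training script: the output directory is a hypothetical placeholder, and dataset preparation and the Trainer setup are omitted.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters as a Seq2SeqTrainingArguments object.
# "lamini-flan-t5-248m" is a hypothetical output path; dataset loading and the
# Trainer itself are intentionally left out.
training_args = Seq2SeqTrainingArguments(
    output_dir="lamini-flan-t5-248m",
    learning_rate=5e-4,                 # 0.0005
    per_device_train_batch_size=128,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,      # effective train batch size of 512
    num_train_epochs=5,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```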
Core Capabilities
- Text-to-text generation tasks
- Instruction following
- Natural language processing tasks
- Efficient performance with moderate model size
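
As one way to exercise the instruction-following capability above, the model can be loaded explicitly with transformers. The following is a minimal sketch that assumes the checkpoint id MBZUAI/LaMini-Flan-T5-248M on the Hugging Face Hub and uses an example prompt of our own.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumes the checkpoint is published on the Hub as "MBZUAI/LaMini-Flan-T5-248M".
checkpoint = "MBZUAI/LaMini-Flan-T5-248M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Example natural-language instruction.
instruction = "Summarize in one sentence: the Flan-T5 family maps every task to a text-to-text format."
inputs = tokenizer(instruction, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Decoding with skip_special_tokens=True strips the sentinel and padding tokens T5 uses, leaving only the generated response.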
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balance of size and performance as part of a carefully distilled series of models trained on a large-scale instruction dataset. It carries a "✩" mark indicating its recommended status among similarly sized models.
Q: What are the recommended use cases?
The model is best suited for responding to human instructions written in natural language. It can be used straightforwardly through the Hugging Face pipeline API for text-to-text generation, making it accessible for a range of NLP applications.
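
As a minimal usage sketch of that pipeline route, assuming the checkpoint id MBZUAI/LaMini-Flan-T5-248M on the Hub and an illustrative prompt and generation settings:

```python
from transformers import pipeline

# Assumes the checkpoint is hosted as "MBZUAI/LaMini-Flan-T5-248M" on the Hugging Face Hub.
generator = pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M")

instruction = "Please suggest three ways to stay productive while working from home."
response = generator(instruction, max_length=512, do_sample=True)[0]["generated_text"]
print(response)
```

The text2text-generation pipeline handles tokenization and decoding internally, which is usually the quickest way to try the model on a new instruction.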