LaMini-Flan-T5-783M
| Property | Value |
|---|---|
| Model Size | 783M parameters |
| Base Model | google/flan-t5-large |
| License | CC BY-NC 4.0 |
| Paper | arXiv:2304.14402 |
What is LaMini-Flan-T5-783M?
LaMini-Flan-T5-783M is a text-to-text generation model in the LaMini-LM series. It was built by fine-tuning google/flan-t5-large on the LaMini-instruction dataset, which contains 2.58M samples, and it is tuned specifically for instruction-following tasks. It is one of the models the series recommends for its size.
Implementation Details
The model was trained for 5 epochs with the Adam optimizer, a learning rate of 0.0005, and a linear learning rate schedule. The total batch size of 512 was reached through gradient accumulation.
- Architecture based on Flan-T5-large with 783M parameters
- Trained on LaMini-instruction dataset with 2.58M samples
- Optimized for instruction-following tasks
- Runs as a text-to-text generation model: an instruction goes in as text and the response comes out as text
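To make the hyperparameters above concrete, the following is a rough sketch of the step count and learning rate curve they imply. It assumes the dataset size is exactly 2,580,000 (the source only says "2.58M"), that 512 is the total batch size per optimizer step, and that the linear schedule decays to zero with no warmup. None of these details are confirmed by the source, and the function name `linear_lr` is purely illustrative.

```python
import math

NUM_SAMPLES = 2_580_000  # "2.58M samples", rounded; the exact count may differ
BATCH_SIZE = 512         # assumed to be the total batch size per optimizer step
EPOCHS = 5
BASE_LR = 5e-4           # learning rate of 0.0005

# One optimizer step consumes one full batch; the last batch may be partial.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear decay to zero (no warmup assumed)."""
    remaining = max(0, total_steps - step)
    return BASE_LR * (remaining / total_steps)


if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("total steps:", total_steps)
    for step in (0, total_steps // 2, total_steps):
        print(step, linear_lr(step))
```

Under these assumptions the run takes 5,040 steps per epoch and 25,200 steps in total, with the learning rate falling from 0.0005 to half that value at the midpoint.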
Core Capabilities
- Natural language instruction following
- Text generation and transformation
- Question answering
- Task-oriented response generation
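A minimal sketch of how these capabilities might be exercised with the Hugging Face Transformers library. It assumes the model is published on the Hub as `MBZUAI/LaMini-Flan-T5-783M` and that `transformers` and PyTorch are installed. The helper names `load`, `respond`, and `demo`, the example instructions, and the `max_length` of 512 are illustrative choices, not settings taken from the source.

```python
MODEL_ID = "MBZUAI/LaMini-Flan-T5-783M"  # assumed Hugging Face Hub identifier


def load(model_id: str = MODEL_ID):
    """Load the tokenizer and model; weights are downloaded on first use."""
    # Imported lazily so respond() can be used without the library installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def respond(instruction: str, tokenizer, model, max_length: int = 512) -> str:
    """Encode one instruction, generate a response, and decode it to text."""
    inputs = tokenizer(instruction, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def demo() -> None:
    """Run two example instructions (downloads the model when called)."""
    tokenizer, model = load()
    for instruction in (
        "Rewrite this sentence more formally: gonna be late, sorry.",
        "What is the capital of France?",
    ):
        print(respond(instruction, tokenizer, model))
```

Calling `demo()` fetches the weights and prints one response per instruction. The instruction is passed as plain text with no special prompt template, which is an assumption about how the model expects its input.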
Frequently Asked Questions
Q: What makes this model unique?
This model pairs a compact 783M-parameter architecture with instruction fine-tuning on a diverse 2.58M-sample dataset. It is marked with a ✩ in the LaMini-LM series, which denotes the models with the best overall performance for their size and architecture.
Q: What are the recommended use cases?
The model is designed to respond to human instructions written in natural language. It suits applications built on text generation, text transformation, question answering, and other instruction-following tasks.