LaMini-Flan-T5-783M

Maintained By
MBZUAI

Property      Value
Model Size    783M parameters
Base Model    google/flan-t5-large
License       CC BY-NC 4.0
Paper         arXiv:2304.14402

What is LaMini-Flan-T5-783M?

LaMini-Flan-T5-783M is a text-to-text generation model in the LaMini-LM series. It was built by fine-tuning google/flan-t5-large on the LaMini-instruction dataset, which contains 2.58M samples. It is presented as one of the more capable models in the series and is optimized for instruction-following tasks.

Implementation Details

The model was trained for 5 epochs with a learning rate of 0.0005, a batch size of 512, and a linear learning rate schedule. Training used the Adam optimizer with gradient accumulation.
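
To make these numbers concrete, here is a minimal sketch of the step count and linear learning rate decay they imply. It is not the authors' training script. It assumes exactly 2,580,000 samples (the source only says "2.58M"), no warmup, and decay to zero at the last step. The names `TrainConfig`, `total_steps`, and `linear_lr` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the model card (hypothetical container)."""
    learning_rate: float = 5e-4
    batch_size: int = 512          # total batch size per optimizer step
    num_epochs: int = 5
    num_samples: int = 2_580_000   # assumption: "2.58M" taken as an exact count


def total_steps(cfg: TrainConfig) -> int:
    """Number of optimizer steps over the whole run."""
    steps_per_epoch = math.ceil(cfg.num_samples / cfg.batch_size)
    return steps_per_epoch * cfg.num_epochs


def linear_lr(step: int, cfg: TrainConfig) -> float:
    """Learning rate at `step`, decaying linearly to zero.

    Assumption: no warmup phase; the source does not mention one.
    """
    remaining = max(0.0, 1.0 - step / total_steps(cfg))
    return cfg.learning_rate * remaining


cfg = TrainConfig()
print(total_steps(cfg))        # 25200 steps under the assumptions above
print(linear_lr(12_600, cfg))  # halfway through: half the initial rate
```

Under these assumptions the run takes about 25,200 optimizer steps, and the learning rate falls from 0.0005 to 0 along a straight line.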

  • Architecture based on Flan-T5-large with 783M parameters
  • Trained on LaMini-instruction dataset with 2.58M samples
  • Optimized for instruction-following tasks
  • Implements text-to-text generation pipeline
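
A minimal sketch of calling the model through the Hugging Face `transformers` text-to-text pipeline is shown below. The Hub identifier `MBZUAI/LaMini-Flan-T5-783M` is an assumption based on the maintainer and model name given above. The helper names `build_generator` and `generate` are hypothetical, and `max_length=512` is an illustrative setting rather than a documented recommendation.

```python
MODEL_ID = "MBZUAI/LaMini-Flan-T5-783M"  # assumed Hugging Face Hub id


def build_generator(model_id: str = MODEL_ID):
    """Create a text2text-generation pipeline for the checkpoint.

    The import is local so this file can be loaded without `transformers`
    installed; the model weights are downloaded on first use.
    """
    from transformers import pipeline

    return pipeline("text2text-generation", model=model_id)


def generate(generator, instruction: str, max_length: int = 512) -> str:
    """Send one natural-language instruction and return the generated text."""
    outputs = generator(instruction, max_length=max_length, do_sample=False)
    return outputs[0]["generated_text"]


# Example usage (requires `transformers` and a backend such as PyTorch):
# generator = build_generator()
# print(generate(generator, "Explain what instruction tuning is in one sentence."))
```

Because the model is text-to-text, the instruction goes in as a plain string and the answer comes back as a plain string, with no task-specific head to configure.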

Core Capabilities

  • Natural language instruction following
  • Text generation and transformation
  • Question answering
  • Task-oriented response generation
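
Each of these capabilities is reached the same way: the task is phrased as a single instruction string. The sketch below shows one possible way to build such strings. The prompt wording and the helpers `question_prompt` and `transform_prompt` are illustrative assumptions, not formats prescribed by the model's authors.

```python
from typing import Optional


def question_prompt(question: str, context: Optional[str] = None) -> str:
    """Build a question-answering instruction, optionally grounded in a context."""
    if context:
        return (
            "Answer the question using the context.\n"
            f"Context: {context}\n"
            f"Question: {question}"
        )
    return question


def transform_prompt(instruction: str, text: str) -> str:
    """Build a text-transformation instruction (rewrite, summarize, translate)."""
    return f"{instruction}:\n{text}"


# One illustrative instruction per capability listed above.
EXAMPLE_INSTRUCTIONS = {
    "instruction following": "Write three tips for staying focused while studying.",
    "text transformation": transform_prompt(
        "Rewrite the following sentence in a formal tone", "gonna be late, sorry"
    ),
    "question answering": question_prompt(
        "Which base model was fine-tuned?",
        context="LaMini-Flan-T5-783M was fine-tuned from google/flan-t5-large.",
    ),
    "task-oriented response": "Draft a short email asking to move a meeting to Friday.",
}

for capability, prompt in EXAMPLE_INSTRUCTIONS.items():
    print(f"[{capability}]\n{prompt}\n")
```

Any of these strings can be passed unchanged to a text-to-text generation pipeline loaded with the model.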

Frequently Asked Questions

Q: What makes this model unique?

It pairs a relatively compact 783M-parameter architecture with instruction fine-tuning on 2.58M samples. In the LaMini-LM series it is marked with a ✩, which identifies it as one of the best-performing models in its size category.

Q: What are the recommended use cases?

The model is designed to respond to human instructions written in natural language. It is suited to text generation, text transformation, and other instruction-following tasks.
