# FLAN-T5-3B Summarizer
| Property | Value |
|---|---|
| Base Model | google/flan-t5-xl |
| License | BSD-3-Clause |
| Training Hardware | 8x NVIDIA A100-SXM4-40GB |
| Framework | PyTorch, Transformers |
Framework | PyTorch, Transformers |
## What is flan-t5-3b-summarizer?
flan-t5-3b-summarizer is a text summarization model fine-tuned from Google's FLAN-T5-XL (a roughly 3B-parameter checkpoint). Developed by Jordan Clive, it generates summaries across several domains, including news articles, scientific papers, conversations, and legislative documents. The summary type is controlled by an instruction prompt prepended to the source text.
## Implementation Details
The model was trained for 6 epochs in BF16 precision with DeepSpeed ZeRO stage 2 optimization, with ROUGE-2 on the validation set monitored for checkpoint selection. Training used an effective batch size of 80 and the Adam optimizer with learning-rate warm-up.
- Training utilized 8 NVIDIA A100-SXM4-40GB GPUs
- Implements linear learning rate scheduling
- Supports max source length of 512 tokens
- Produces summaries up to 150 tokens
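For concreteness, an effective batch size decomposes as per-device batch size × gradient-accumulation steps × GPU count. The split below is one possible decomposition of the documented figure of 80 across the 8 GPUs, not the published training configuration:

```python
# One possible decomposition of the documented effective batch size of 80.
# per_device_batch and grad_accum_steps are assumptions for illustration.
n_gpus = 8                 # documented: 8x A100-SXM4-40GB
per_device_batch = 10      # assumption
grad_accum_steps = 1       # assumption
effective_batch = per_device_batch * grad_accum_steps * n_gpus
assert effective_batch == 80
```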
## Core Capabilities
- Multi-purpose summarization across various text types
- Prompt-controlled summary generation
- Support for article, one-sentence, conversation, scientific, and legislative summarization
- Efficient processing with beam search and n-gram repetition control
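The length limits and decoding behavior above can be sketched as a set of generation settings. The 512/150 token caps come from this card; `num_beams` and `no_repeat_ngram_size` values are illustrative assumptions, since the card only states that beam search and n-gram repetition control are used:

```python
# Generation settings consistent with the limits documented above.
MAX_SOURCE_TOKENS = 512    # documented maximum source length
MAX_SUMMARY_TOKENS = 150   # documented maximum summary length

gen_kwargs = {
    "max_new_tokens": MAX_SUMMARY_TOKENS,
    "num_beams": 4,               # assumption: typical beam width
    "no_repeat_ngram_size": 3,    # assumption: typical repetition block
    "early_stopping": True,       # stop once all beams have finished
}
# Pass as model.generate(**inputs, **gen_kwargs) after truncating the
# tokenized source to MAX_SOURCE_TOKENS.
```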
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its versatility in handling different types of summarization tasks through a single architecture, controlled via specific prompts. It was trained on diverse datasets including xsum, wikihow, CNN/DailyMail, samsum, scitldr/AIC, and billsum.
**Q: What are the recommended use cases?**
The model is ideal for academic and general-purpose summarization tasks. It excels at generating article summaries, scientific paper TL;DRs, conversation summaries, and legislative document summaries. Users can control the summary type by varying the instruction prompt prepended to the source document.
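The prompt-controlled workflow above can be sketched as prefix construction plus a standard seq2seq generate call. The Hugging Face repo id and the exact instruction wordings below are assumptions for illustration, not the verbatim prompts from the training data:

```python
# Sketch of prompt-controlled summarization. Instruction wordings and the
# repo id are illustrative assumptions.
TASK_PROMPTS = {
    "article": "Produce an article summary of the following news article:",
    "one_sentence": "Give a one-sentence summary of the following text:",
    "conversation": "Summarize the following conversation:",
    "scientific": "Write a TL;DR of the following scientific paper:",
    "legislative": "Summarize the following bill:",
}

def build_input(task: str, text: str) -> str:
    """Prepend the task's instruction prompt to the source document."""
    return f"{TASK_PROMPTS[task]} {text}"

def summarize(task: str, text: str) -> str:
    # Heavy imports kept local so prompt building works without the model.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    repo = "jordiclive/flan-t5-3b-summarizer"  # assumed Hugging Face repo id
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)
    ids = tok(build_input(task, text), truncation=True,
              max_length=512, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=150,
                         num_beams=4, no_repeat_ngram_size=3)
    return tok.decode(out[0], skip_special_tokens=True)
```

Switching `task` swaps the instruction prefix, which is all the model needs to change summary style; the source text itself is unchanged.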