# FLAN-T5-3B Summarizer
| Property | Value |
|---|---|
| Base Model | google/flan-t5-xl |
| License | BSD-3-Clause |
| Training Hardware | 8x NVIDIA A100-SXM4-40GB |
| Framework | PyTorch, Transformers |
Framework | PyTorch, Transformers |
## What is flan-t5-3b-summarizer?
flan-t5-3b-summarizer is a text summarization model fine-tuned from Google's FLAN-T5-XL (a roughly 3B-parameter checkpoint). Developed by Jordan Clive, it generates summaries across several domains, including news articles, scientific papers, conversations, and legislative documents. The summary type is controlled by an instruction prompt prepended to the source text.
## Implementation Details
The model was trained for 6 epochs in BF16 precision with DeepSpeed ZeRO stage 2 optimization, with ROUGE-2 on the validation set monitored for checkpoint selection. Training used an effective batch size of 80 and the Adam optimizer with learning-rate warm-up.
- Training utilized 8 NVIDIA A100-SXM4-40GB GPUs
- Implements linear learning rate scheduling
- Supports max source length of 512 tokens
- Produces summaries up to 150 tokens
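For concreteness, an effective batch size decomposes as per-device batch size × gradient-accumulation steps × GPU count. The split below is one possible decomposition of the documented figure of 80 across the 8 GPUs, not the published training configuration:

```python
# One possible decomposition of the documented effective batch size of 80.
# per_device_batch and grad_accum_steps are assumptions for illustration.
n_gpus = 8                 # documented: 8x A100-SXM4-40GB
per_device_batch = 10      # assumption
grad_accum_steps = 1       # assumption
effective_batch = per_device_batch * grad_accum_steps * n_gpus
assert effective_batch == 80
```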
## Core Capabilities
- Multi-purpose summarization across various text types
- Prompt-controlled summary generation
- Support for article, one-sentence, conversation, scientific, and legislative summarization
- Efficient processing with beam search and n-gram repetition control
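The length limits and decoding behavior above can be sketched as a set of generation settings. The 512/150 token caps come from this card; `num_beams` and `no_repeat_ngram_size` values are illustrative assumptions, since the card only states that beam search and n-gram repetition control are used:

```python
# Generation settings consistent with the limits documented above.
MAX_SOURCE_TOKENS = 512    # documented maximum source length
MAX_SUMMARY_TOKENS = 150   # documented maximum summary length

gen_kwargs = {
    "max_new_tokens": MAX_SUMMARY_TOKENS,
    "num_beams": 4,               # assumption: typical beam width
    "no_repeat_ngram_size": 3,    # assumption: typical repetition block
    "early_stopping": True,       # stop once all beams have finished
}
# Pass as model.generate(**inputs, **gen_kwargs) after truncating the
# tokenized source to MAX_SOURCE_TOKENS.
```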
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its versatility in handling different types of summarization tasks through a single architecture, controlled via specific prompts. It was trained on diverse datasets including xsum, wikihow, CNN/DailyMail, samsum, scitldr/AIC, and billsum.
**Q: What are the recommended use cases?**
The model is ideal for academic and general-purpose summarization tasks. It excels at generating article summaries, scientific paper TL;DRs, conversation summaries, and legislative document summaries. Users can control the summary type by varying the instruction prompt prepended to the source document.
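The prompt-controlled workflow above can be sketched as prefix construction plus a standard seq2seq generate call. The Hugging Face repo id and the exact instruction wordings below are assumptions for illustration, not the verbatim prompts from the training data:

```python
# Sketch of prompt-controlled summarization. Instruction wordings and the
# repo id are illustrative assumptions.
TASK_PROMPTS = {
    "article": "Produce an article summary of the following news article:",
    "one_sentence": "Give a one-sentence summary of the following text:",
    "conversation": "Summarize the following conversation:",
    "scientific": "Write a TL;DR of the following scientific paper:",
    "legislative": "Summarize the following bill:",
}

def build_input(task: str, text: str) -> str:
    """Prepend the task's instruction prompt to the source document."""
    return f"{TASK_PROMPTS[task]} {text}"

def summarize(task: str, text: str) -> str:
    # Heavy imports kept local so prompt building works without the model.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    repo = "jordiclive/flan-t5-3b-summarizer"  # assumed Hugging Face repo id
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)
    ids = tok(build_input(task, text), truncation=True,
              max_length=512, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=150,
                         num_beams=4, no_repeat_ngram_size=3)
    return tok.decode(out[0], skip_special_tokens=True)
```

Switching `task` swaps the instruction prefix, which is all the model needs to change summary style; the source text itself is unchanged.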