BigBird-Pegasus-Large-PubMed
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Author | |
| Primary Task | Scientific Paper Summarization |
| Paper | BigBird: Transformers for Longer Sequences |
| ROUGE-1 Score | 40.89 (PubMed) |
What is bigbird-pegasus-large-pubmed?
BigBird-Pegasus-Large-PubMed is a transformer model specialized for summarizing scientific papers, particularly biomedical articles from PubMed. It uses a block sparse attention mechanism to process long documents of up to 4096 tokens efficiently, which makes it well suited to scientific literature summarization. The model reports state-of-the-art performance on PubMed summarization, with a ROUGE-1 score of 40.89.
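As a minimal sketch of how this could be used with the Hugging Face Transformers library (the checkpoint id `google/bigbird-pegasus-large-pubmed` is an assumption, not stated in the table above):

```python
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

# Assumed Hugging Face Hub id for this model
checkpoint = "google/bigbird-pegasus-large-pubmed"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(checkpoint)

article = "..."  # full text of a scientific paper goes here

# Encode up to the model's 4096-token input limit
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=4096)

# Generate and decode an abstractive summary
summary_ids = model.generate(**inputs)
summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
print(summary)
```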
Implementation Details
The model uses block sparse attention instead of traditional full attention, which significantly reduces computational complexity while maintaining performance. It is built on the BigBird architecture and fine-tuned on the PubMed subset of the scientific_papers dataset.
- Supports input sequences of up to 4096 tokens
- Exposes a configurable block size and number of random blocks (see the configuration sketch after this list)
- Uses an encoder-decoder architecture with block sparse attention in the encoder
- Keeps full attention in the decoder for generation quality
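A sketch of how these attention settings can be adjusted through the Transformers `from_pretrained` arguments; the checkpoint id and the specific values shown here are illustrative assumptions:

```python
from transformers import BigBirdPegasusForConditionalGeneration

# Block sparse attention is the default; block_size and num_random_blocks
# control the sparsity pattern used in the encoder (illustrative values)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "google/bigbird-pegasus-large-pubmed",
    attention_type="block_sparse",
    block_size=16,
    num_random_blocks=2,
)

# For inputs well under 4096 tokens, the exact (quadratic) attention can be
# used instead of the sparse approximation
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "google/bigbird-pegasus-large-pubmed",
    attention_type="original_full",
)
```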
Core Capabilities
- Scientific paper summarization with state-of-the-art performance (decoding settings are sketched after this list)
- Long-document processing with an efficient sparse attention mechanism
- Customizable attention patterns via the block_size and num_random_blocks parameters
- Strong reported performance on both PubMed (ROUGE-1: 40.89) and arXiv (ROUGE-1: 40.38) datasets
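For long papers, summary length and verbosity are typically controlled through standard generation arguments; the values below are illustrative assumptions, not settings documented for this model:

```python
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

checkpoint = "google/bigbird-pegasus-large-pubmed"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(checkpoint)

# "..." stands in for the full paper text
inputs = tokenizer("...", return_tensors="pt", truncation=True, max_length=4096)

summary_ids = model.generate(
    **inputs,
    num_beams=5,             # beam search for more coherent summaries
    max_length=256,          # cap on summary length in tokens
    min_length=64,           # avoid overly short summaries
    length_penalty=2.0,      # favor complete, somewhat longer sentences
    no_repeat_ngram_size=3,  # reduce repeated phrases
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```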
Frequently Asked Questions
Q: What makes this model unique?
This model combines BigBird's efficient attention mechanism with Pegasus's summarization capabilities, specifically optimized for scientific literature. Its ability to handle long sequences while maintaining high performance on technical content sets it apart from standard transformer models.
Q: What are the recommended use cases?
The model is best suited for summarizing scientific papers, particularly in the medical and biomedical domains. It excels at processing long academic documents and generating concise, accurate summaries while preserving technical details.