bigbird-pegasus-large-pubmed

Maintained By
google

BigBird-Pegasus-Large-PubMed

Property        Value
License         Apache 2.0
Author          Google
Primary Task    Scientific Paper Summarization
Paper           BigBird: Transformers for Longer Sequences
ROUGE-1 Score   40.89 (PubMed)

What is bigbird-pegasus-large-pubmed?

BigBird-Pegasus-Large-PubMed is a transformer model for summarizing scientific papers, fine-tuned on PubMed articles. It uses block sparse attention to process long documents of up to 4096 tokens efficiently, which suits it to scientific literature. It reports a ROUGE-1 score of 40.89 on the PubMed summarization task.

Implementation Details

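The model uses block sparse attention instead of traditional full attention. Full attention compares every token with every other token, so its cost grows quadratically with sequence length. Block sparse attention restricts each block of tokens to a small set of other blocks, so the cost grows roughly linearly while performance is largely maintained. The model is built on the BigBird architecture and fine-tuned on the PubMed dataset from the scientific papers collection.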
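To make the cost argument concrete, here is a minimal pure-Python sketch of a block sparse pattern. It is a simplification under stated assumptions, not the library's implementation: the function names `block_sparse_pattern` and `attention_density` are hypothetical, the first and last blocks are assumed to be global, and the sliding window is assumed to span three blocks.

```python
import random


def block_sparse_pattern(seq_len, block_size, num_random_blocks, seed=0):
    """Return, for each query block, the sorted key blocks it attends to.

    Simplified illustration: global blocks + 3-block sliding window + random blocks.
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size")
    n_blocks = seq_len // block_size
    rng = random.Random(seed)
    # Assumption: the first and last blocks act as global blocks.
    global_blocks = {0, n_blocks - 1}

    pattern = []
    for q in range(n_blocks):
        if q in global_blocks:
            # A global block attends to every block.
            pattern.append(list(range(n_blocks)))
            continue
        # Every block attends to the global blocks.
        attended = set(global_blocks)
        # Sliding window: the block itself and its immediate neighbours.
        attended.update(k for k in (q - 1, q, q + 1) if 0 <= k < n_blocks)
        # Random blocks drawn from whatever is not already covered.
        candidates = [k for k in range(n_blocks) if k not in attended]
        picks = min(num_random_blocks, len(candidates))
        attended.update(rng.sample(candidates, picks))
        pattern.append(sorted(attended))
    return pattern


def attention_density(pattern):
    """Fraction of block pairs that are computed, relative to full attention."""
    n_blocks = len(pattern)
    return sum(len(row) for row in pattern) / (n_blocks * n_blocks)


pattern = block_sparse_pattern(seq_len=4096, block_size=64, num_random_blocks=3)
print(f"blocks: {len(pattern)}, density vs. full attention: {attention_density(pattern):.3f}")
```

In this sketch the number of blocks attended by a non-global query block is bounded by a constant (two global, three window, and the random blocks), regardless of sequence length. That bound is what keeps the total cost roughly linear.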

  • Supports sequence lengths up to 4096 tokens
  • Implements configurable block size and random blocks
  • Uses an encoder-decoder architecture with block sparse attention in the encoder
  • Maintains full attention in the decoder for optimal generation
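The configurable options above could be exercised roughly as follows with the Hugging Face `transformers` library. This is a sketch under assumptions: the Hub identifier `google/bigbird-pegasus-large-pubmed` is assumed, the helper names `attention_overrides` and `load_model` are hypothetical, and the values 16 and 2 are illustrative rather than recommended settings.

```python
# Assumed Hugging Face Hub identifier for this model.
MODEL_ID = "google/bigbird-pegasus-large-pubmed"


def attention_overrides(attention_type="block_sparse", block_size=None, num_random_blocks=None):
    """Build keyword overrides for the encoder attention configuration (hypothetical helper)."""
    if attention_type not in ("block_sparse", "original_full"):
        raise ValueError(f"unsupported attention_type: {attention_type!r}")
    overrides = {"attention_type": attention_type}
    # Block settings only matter for block sparse attention.
    if attention_type == "block_sparse":
        if block_size is not None:
            overrides["block_size"] = block_size
        if num_random_blocks is not None:
            overrides["num_random_blocks"] = num_random_blocks
    return overrides


def load_model(**overrides):
    """Load the checkpoint with the given config overrides (downloads the weights)."""
    # Imported lazily so the helper above can be used without the heavy dependency.
    from transformers import BigBirdPegasusForConditionalGeneration

    return BigBirdPegasusForConditionalGeneration.from_pretrained(MODEL_ID, **overrides)


# Illustrative settings: smaller blocks and fewer random blocks.
print(attention_overrides(block_size=16, num_random_blocks=2))
# Full attention in the encoder ignores the block settings.
print(attention_overrides("original_full", block_size=16))
```

Calling `load_model(**attention_overrides(block_size=16, num_random_blocks=2))` would then load the checkpoint with those settings, assuming `transformers` and a PyTorch backend are installed.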

Core Capabilities

  • Scientific paper summarization, evaluated on the PubMed and arXiv datasets
  • Long document processing with an efficient attention mechanism
  • Customizable attention patterns via the block_size and num_random_blocks parameters
  • Reported ROUGE-1 scores of 40.89 on PubMed and 40.38 on arXiv
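For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The sketch below is a simplified illustration, assuming whitespace tokenization and no stemming; the function name `rouge1` is hypothetical. Published scores come from standard ROUGE tooling, and the source does not state which variant (precision, recall, or F1) its figures use. Reported scores are conventionally scaled by 100.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Simplified ROUGE-1: unigram overlap with lowercase whitespace tokenization."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```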

Frequently Asked Questions

Q: What makes this model unique?

This model combines BigBird's efficient attention mechanism with Pegasus's summarization capabilities, specifically optimized for scientific literature. Its ability to handle long sequences while maintaining high performance on technical content sets it apart from standard transformer models.

Q: What are the recommended use cases?

The model is best suited for summarizing scientific papers, particularly in the medical and biomedical domains. It excels at processing long academic documents and generating concise, accurate summaries while preserving technical details.
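A summarization call for this use case might look like the sketch below. It assumes the Hugging Face `transformers` library with a PyTorch backend and the Hub identifier `google/bigbird-pegasus-large-pubmed`. The helper name `summarize` and the `max_new_tokens` default are illustrative choices, not settings from the source.

```python
# Assumed Hugging Face Hub identifier for this model.
MODEL_ID = "google/bigbird-pegasus-large-pubmed"
# Maximum input length stated for the model.
MAX_INPUT_TOKENS = 4096


def summarize(text, max_new_tokens=256):
    """Summarize one document; input beyond MAX_INPUT_TOKENS is truncated."""
    # Imported lazily: loading the library and weights is expensive.
    from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = BigBirdPegasusForConditionalGeneration.from_pretrained(MODEL_ID)

    # Tokenize and truncate to the supported sequence length.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
    )
    # Generate summary token ids, then decode them back to text.
    summary_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
```

Calling `summarize(article_text)` downloads the checkpoint on first use. Papers longer than 4096 tokens are cut off by the truncation step, so splitting them by section beforehand may give better coverage.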
