pegasus-multi_news

Maintained By
google

Pegasus Multi-News

Property     Value
Author       Google
Paper        arXiv:1912.08777
Downloads    5,285
Task         Summarization

What is pegasus-multi_news?

Pegasus-multi_news is a state-of-the-art abstractive summarization model developed by Google Research, fine-tuned for multi-document summarization. Trained with the "Mixed & Stochastic" setup, it achieves a ROUGE-1 score of 47.65 on the multi_news dataset.

Implementation Details

The model uses the "Mixed & Stochastic" training setup, pretraining on both the C4 and HugeNews datasets mixed in proportion to their example counts. It was pretrained for 1.5M steps rather than the standard 500k because pretraining perplexity was observed to converge more slowly.
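As a rough illustration of that weighted mixing, the sampler below picks a corpus for each pretraining example in proportion to corpus size. The function name and the example counts passed in are illustrative stand-ins, not the paper's actual data pipeline.

```python
import random

def sample_pretraining_corpus(corpora, rng=None):
    """Pick the corpus for the next pretraining example, weighted by size.

    `corpora` maps corpus name -> example count. The paper mixes C4 and
    HugeNews weighted by their example counts; the counts used by callers
    here are illustrative.
    """
    rng = rng or random.Random()
    names = list(corpora)
    counts = [corpora[name] for name in names]
    # Weighted sampling: a corpus with 4x the examples is drawn 4x as often.
    return rng.choices(names, weights=counts, k=1)[0]
```

For example, `sample_pretraining_corpus({"C4": 350, "HugeNews": 1500})` returns `"HugeNews"` roughly 81% of the time, matching its share of the combined example count.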

  • Samples the gap-sentence ratio dynamically and uniformly between 15% and 45%
  • Adds 20% uniform noise to importance scores when sampling gap sentences
  • Uses an updated SentencePiece tokenizer that encodes the newline character
  • Scores 47.65/18.75/24.95 ROUGE-1/2/L on the multi_news dataset

Core Capabilities

  • Multi-document summarization with state-of-the-art performance
  • Robust handling of long-form content
  • Improved tokenization with newline preservation
  • Flexible gap sentence ratio for diverse summarization styles

Frequently Asked Questions

Q: What makes this model unique?

The model's mixed & stochastic training approach sets it apart, combining multiple datasets and implementing dynamic sentence sampling techniques. This results in more robust and adaptable summarization capabilities compared to traditional approaches.

Q: What are the recommended use cases?

The model excels at summarizing multiple news articles or documents into a coherent, concise summary. It's particularly well-suited for news aggregation, research summarization, and content curation applications where multiple source documents need to be condensed.
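For those use cases, a minimal usage sketch with the Hugging Face transformers library follows (assuming `transformers`, `sentencepiece`, and `torch` are installed and the `google/pegasus-multi_news` checkpoint can be downloaded). The newline join is a simple stand-in for multi_news preprocessing, chosen because this checkpoint's tokenizer preserves newline characters.

```python
def join_documents(documents):
    """Concatenate source articles with newlines (a simplifying assumption,
    not the exact multi_news preprocessing)."""
    return "\n".join(doc.strip() for doc in documents)

def summarize(documents, max_summary_tokens=256):
    """Summarize several documents with google/pegasus-multi_news.

    Requires `transformers`, `sentencepiece`, and `torch`, plus network
    access to download the checkpoint on first use.
    """
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer
    tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-multi_news")
    model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-multi_news")
    batch = tokenizer(join_documents(documents), truncation=True, return_tensors="pt")
    summary_ids = model.generate(**batch, max_length=max_summary_tokens, num_beams=4)
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]

if __name__ == "__main__":
    articles = ["Text of the first news article...", "Text of the second article..."]
    print(summarize(articles))
```

Beam search (`num_beams=4`) is a common decoding choice for summarization; greedy decoding also works but tends to produce flatter summaries.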
