whisper-large

Maintained by: openai

Whisper Large

Parameter Count: 1.54B
License: Apache 2.0
Paper: View Paper
Languages Supported: 99

What is whisper-large?

Whisper-large is OpenAI's state-of-the-art speech recognition model, trained on 680,000 hours of multilingual audio data. It uses a Transformer-based encoder-decoder architecture designed for robust speech recognition and translation across 99 languages.

Implementation Details

The model represents a significant advance in automatic speech recognition (ASR), using a sequence-to-sequence architecture with 1.54B parameters. It is trained with large-scale weak supervision and generalizes well to new domains and recording conditions without fine-tuning.

  • Supports both transcription and translation tasks
  • Processes audio in 30-second chunks (see the transcription sketch after this list)
  • Achieves 3.0 WER on LibriSpeech test-clean
  • Handles background noise and diverse accents robustly
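
As a concrete illustration of the points above, here is a minimal transcription sketch using the Hugging Face transformers pipeline; the model ID is the public openai/whisper-large checkpoint, and "sample.flac" is a placeholder for any audio file ffmpeg can decode.

    # Minimal transcription sketch with the Hugging Face transformers pipeline.
    # "sample.flac" is a placeholder input file, not part of the model card.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large",
        chunk_length_s=30,  # matches the model's native 30-second window
    )

    result = asr("sample.flac")
    print(result["text"])

Setting chunk_length_s lets the pipeline stitch long-form audio together from the model's fixed 30-second windows; adding return_timestamps=True also exposes the model's timestamp prediction.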

Core Capabilities

  • Multilingual ASR supporting 99 languages
  • Zero-shot translation to English
  • Timestamp prediction
  • Batch processing for long-form audio
  • Context-aware transcription and translation via forced decoder ids (sketched below)
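
The zero-shot translation and forced-decoder-id items above can be sketched as follows, assuming the transformers library; audio_array is a stand-in for 16 kHz mono samples loaded elsewhere, and French is only an example source language.

    # Sketch: zero-shot translation to English via forced decoder ids.
    # audio_array is assumed to be a 1-D float array of 16 kHz mono samples.
    from transformers import WhisperProcessor, WhisperForConditionalGeneration

    processor = WhisperProcessor.from_pretrained("openai/whisper-large")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")

    # Force the decoder to treat the input as French speech and translate it.
    forced_ids = processor.get_decoder_prompt_ids(language="french", task="translate")

    inputs = processor(audio_array, sampling_rate=16_000, return_tensors="pt")
    generated = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)
    print(processor.batch_decode(generated, skip_special_tokens=True)[0])

Passing task="transcribe" instead keeps the output in the source language rather than translating it to English.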

Frequently Asked Questions

Q: What makes this model unique?

Whisper-large stands out for its robust generalization capabilities across languages and domains without requiring fine-tuning, thanks to its extensive training on 680k hours of labeled data. It's particularly notable for handling challenging audio conditions and diverse accents.

Q: What are the recommended use cases?

The model excels in research applications, general transcription tasks, and accessibility tools. It's particularly effective for English ASR but should be carefully evaluated for high-stakes applications. Real-time transcription requires additional optimization.
