whisper-base.en

Maintained by: openai

Whisper Base.en

Parameter Count: 74M
Model Type: Transformer Encoder-Decoder
License: Apache 2.0
Paper: Robust Speech Recognition via Large-Scale Weak Supervision
Task: Automatic Speech Recognition (English)

What is whisper-base.en?

Whisper-base.en is an English-only speech recognition model developed by OpenAI, designed for efficient and accurate transcription of English audio. As part of the Whisper model family, it strikes a balance between model size and performance, with 74 million parameters trained specifically on English-language speech.

Implementation Details

The model uses a Transformer-based encoder-decoder architecture. The Whisper family was trained on 680,000 hours of labeled speech data; the English-only checkpoints, including base.en, draw on the English portion of that corpus. The model is implemented in PyTorch and ships with F32 (single-precision) weights. It processes audio in windows of up to 30 seconds and handles longer recordings through automatic chunking (see the usage sketch after the list below).

  • Pre-trained on 438,000 hours of English audio data
  • Achieves 4.27% Word Error Rate (WER) on LibriSpeech test-clean
  • Supports batch processing for efficient inference
  • Includes timestamp generation capabilities
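The snippet below is a minimal sketch of loading the checkpoint through the Hugging Face Transformers automatic-speech-recognition pipeline; the file name sample.wav and the device selection are placeholder assumptions, not part of the model card.

```python
# Minimal sketch: single-pass transcription with the Transformers ASR pipeline.
# "sample.wav" and the device choice are placeholders, not part of the model card.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    torch_dtype=torch.float32,  # the published checkpoint uses F32 weights
    device=device,
)

# Audio up to ~30 seconds is transcribed in a single forward pass.
result = asr("sample.wav")
print(result["text"])
```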

Core Capabilities

  • High-accuracy English speech transcription
  • Robust performance across different accents and background noise
  • Support for long-form transcription through chunking
  • Integration with Hugging Face Transformers pipeline
  • Efficient batch processing for large-scale applications
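For long-form audio, the same pipeline can be configured with chunking, batching, and timestamp output, as sketched below. The 30-second chunk length matches the model's input window; the batch size of 8 and the file name long_podcast.wav are illustrative assumptions.

```python
# Sketch of long-form transcription with chunking, batching, and timestamps.
# The batch size and file name are illustrative assumptions.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    chunk_length_s=30,  # split audio longer than 30 s into overlapping windows
    batch_size=8,       # decode several chunks per forward pass
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

outputs = asr("long_podcast.wav", return_timestamps=True)
print(outputs["text"])            # full stitched transcript
for chunk in outputs["chunks"]:   # (start, end) timestamps per segment
    print(chunk["timestamp"], chunk["text"])
```

Chunked inference is what enables the long-form transcription and batch processing noted in the list above: the audio is split into overlapping windows, decoded in parallel, and the segment transcripts are stitched back together.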

Frequently Asked Questions

Q: What makes this model unique?

The model's specialization in English-only transcription allows it to reach strong accuracy while keeping a relatively small footprint of 74M parameters, offering a practical trade-off between transcription quality and computational cost for production deployments.

Q: What are the recommended use cases?

The model is particularly well-suited for English speech transcription tasks, including podcast transcription, meeting recordings, and general audio content processing. It's especially valuable in scenarios requiring accurate transcription without the need for multilingual support.
