# Whisper Tiny
| Property | Value |
|---|---|
| Parameter Count | 37.8M |
| Model Type | Automatic Speech Recognition |
| License | Apache 2.0 |
| Paper | View Paper |
## What is whisper-tiny?
Whisper-tiny is the most compact variant of OpenAI's Whisper family, designed for efficient automatic speech recognition and translation. As a transformer-based encoder-decoder model, it offers an impressive balance between performance and resource efficiency, supporting 99 languages while maintaining a relatively small footprint of 37.8M parameters.
## Implementation Details
The model utilizes a sequence-to-sequence architecture trained on 680,000 hours of multilingual audio data. It processes audio by converting it to log-Mel spectrograms and can handle both transcription and translation tasks through specialized decoder prompts.
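The log-Mel front end described above can be sketched as follows. This is a minimal NumPy illustration, not Whisper's actual feature extractor: the 16 kHz sample rate, 400-sample window, 160-sample hop, and 80 mel bins match the published Whisper configuration, but the filterbank construction here is a generic textbook approximation.

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style mel conversion (an approximation, not Whisper's exact filterbank)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    """Roughly mirror Whisper's front end: windowed STFT -> mel filterbank -> log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    power = np.empty((n_fft // 2 + 1, n_frames))
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + n_fft] * window
        power[:, i] = np.abs(np.fft.rfft(frame)) ** 2

    # Triangular mel filters spaced evenly on the mel scale up to Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    mel = fbank @ power
    return np.log10(np.maximum(mel, 1e-10))

# One second of a 440 Hz tone at 16 kHz -> an (80 mel bins x 98 frames) feature matrix
t = np.arange(16000) / 16000.0
feats = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(feats.shape)  # (80, 98)
```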
- Supports both English-only and multilingual transcription
- Handles audio chunks of up to 30 seconds
- Includes timestamp prediction capabilities
- Uses F32 tensor type for computations
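The "specialized decoder prompts" that select between transcription and translation are sequences of special tokens. The helper below is an illustrative sketch, not a library API; the token names follow the format described in the Whisper paper (`<|startoftranscript|>`, a language token, a task token, and optionally `<|notimestamps|>`).

```python
def build_decoder_prompt(language="en", task="transcribe", timestamps=False):
    """Assemble a Whisper-style decoder prompt string from special tokens.
    Illustrative helper only -- not part of any library's API."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Omitting timestamp prediction is signaled with its own token
        tokens.append("<|notimestamps|>")
    return "".join(tokens)

# Translate French speech to English text, no timestamps:
print(build_decoder_prompt("fr", "translate"))
# <|startoftranscript|><|fr|><|translate|><|notimestamps|>
```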
## Core Capabilities
- Multilingual ASR supporting 99 languages
- Speech-to-text transcription with 7.54% WER on the LibriSpeech test-clean set
- Speech translation to English
- Long-form transcription through chunking
- Robust performance across various accents and background noise conditions
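Long-form transcription works by splitting audio into windows of at most 30 seconds and transcribing each one. A minimal chunking sketch follows; the 30-second limit comes from the model, but the 5-second overlap here is an illustrative choice, not Whisper's internal stride setting.

```python
import numpy as np

def chunk_audio(audio, sr=16000, chunk_s=30.0, overlap_s=5.0):
    """Split a long waveform into <= 30 s chunks with a small overlap,
    so speech cut at a boundary appears in two chunks and can be merged."""
    chunk = int(chunk_s * sr)
    step = chunk - int(overlap_s * sr)
    chunks = []
    for start in range(0, len(audio), step):
        chunks.append(audio[start : start + chunk])
        if start + chunk >= len(audio):
            break
    return chunks

# 70 s of silence as a stand-in waveform -> chunks of 30 s, 30 s, 20 s
audio = np.zeros(16000 * 70)
chunks = chunk_audio(audio)
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 20.0]
```

Each chunk would then be fed to the model independently, with overlapping transcripts reconciled at the seams.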
## Frequently Asked Questions

**Q: What makes this model unique?**
Whisper-tiny stands out for its exceptional efficiency-to-performance ratio, offering multilingual capabilities in a compact form factor. It's particularly notable for achieving reasonable accuracy while maintaining a small parameter count, making it suitable for deployment in resource-constrained environments.
**Q: What are the recommended use cases?**
The model is ideal for lightweight ASR applications, development and testing environments, and scenarios where resource efficiency is crucial. It's particularly well-suited for English transcription tasks, basic multilingual transcription, and prototyping speech recognition solutions.