Whisper Tiny

Property	Value
Parameter Count	37.8M parameters
Model Type	Automatic Speech Recognition
License	Apache 2.0
Paper	Robust Speech Recognition via Large-Scale Weak Supervision (Radford et al., 2022)

What is whisper-tiny?

Whisper-tiny is the smallest model in OpenAI's Whisper family, built for automatic speech recognition and speech translation. It is a transformer-based encoder-decoder model with 37.8M parameters that supports 99 languages, trading some accuracy relative to the larger Whisper checkpoints for lower memory and compute requirements.
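
To put the 37.8M-parameter footprint in concrete terms, the arithmetic below estimates the size of the weights alone. It is a rough sketch: the helper name `weight_megabytes` is hypothetical, the float16 and int8 rows are illustrative assumptions rather than formats this card describes, and activations and framework overhead are ignored.

```python
# Rough weight-memory estimate; ignores activations and runtime overhead.
PARAM_COUNT = 37_800_000  # parameter count quoted in this card

# Bytes per parameter for a few storage types (float16/int8 are illustrative).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(param_count: int, dtype: str) -> float:
    """Return the size of the weights alone, in megabytes (10**6 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_megabytes(PARAM_COUNT, dtype):.1f} MB")
```

At 4 bytes per parameter this comes to roughly 151 MB of weights.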

Implementation Details

The model uses a sequence-to-sequence architecture trained on 680,000 hours of labeled multilingual audio. Input audio is converted to log-Mel spectrograms, and the task (transcription or translation) is selected through special tokens placed at the start of the decoder prompt.
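
The decoder prompt mentioned above can be sketched as a short sequence of special tokens. The token strings follow the Whisper convention (`<|startoftranscript|>`, a language tag, a task tag, and optionally `<|notimestamps|>`); the function `build_decoder_prompt` itself is a hypothetical helper for illustration, not part of any library.

```python
from typing import List


def build_decoder_prompt(language: str, task: str = "transcribe",
                         timestamps: bool = False) -> List[str]:
    """Assemble the special tokens that steer the decoder.

    language: short language code such as "en" or "fr".
    task: "transcribe" (same-language text) or "translate" (English text).
    timestamps: when False, append the token that disables timestamp output.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt


# French speech translated into English text, without timestamps.
print(build_decoder_prompt("fr", task="translate"))
```

In practice a tokenizer maps these strings to token ids, and inference libraries usually build the prompt from `language` and `task` arguments.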

Available as a multilingual checkpoint (whisper-tiny) and an English-only checkpoint (whisper-tiny.en)
Handles audio chunks of up to 30 seconds
Includes timestamp prediction capabilities
Stores weights as F32 (32-bit float) tensors
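
The 30-second limit means longer recordings have to be split before they reach the model. The following is a minimal sketch of that step, assuming 16 kHz mono samples (the rate Whisper expects) held in a plain Python list. `split_into_windows` is a hypothetical helper; it uses simple non-overlapping windows, whereas production pipelines often use overlapping strides or timestamp-guided windows.

```python
from typing import List

SAMPLE_RATE = 16_000                          # samples per second (16 kHz mono)
CHUNK_SECONDS = 30                            # maximum window length
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window


def split_into_windows(samples: List[float],
                       chunk_samples: int = CHUNK_SAMPLES) -> List[List[float]]:
    """Split audio into fixed-size windows, zero-padding the last one."""
    windows = []
    for start in range(0, len(samples), chunk_samples):
        window = samples[start:start + chunk_samples]
        # Pad a short final window with silence so every window has equal length.
        window = window + [0.0] * (chunk_samples - len(window))
        windows.append(window)
    return windows


# 70 seconds of (silent) audio yields three 30-second windows.
audio = [0.0] * (70 * SAMPLE_RATE)
print(len(split_into_windows(audio)))
```

Each window would then be converted to a log-Mel spectrogram and transcribed separately, with the resulting texts concatenated.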

Core Capabilities

Multilingual ASR supporting 99 languages
Speech-to-text transcription with a 7.54% word error rate (WER) on the LibriSpeech test-clean set
Speech translation to English
Long-form transcription through chunking
Reasonable robustness to varied accents and background noise, though less than the larger Whisper models
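
For readers unfamiliar with the WER figure quoted above, the metric is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain implementation for illustration (`word_error_rate` is a hypothetical helper). Published numbers are typically computed after text normalization such as lowercasing and punctuation removal, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 7.54% therefore corresponds to about 7.5 word errors per 100 reference words.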

Frequently Asked Questions

Q: What makes this model unique?

Whisper-tiny's main distinction is its size: it provides multilingual transcription and speech translation with only 37.8M parameters. Its accuracy is lower than that of the larger Whisper models but adequate for many tasks, which makes it suitable for deployment in resource-constrained environments.

Q: What are the recommended use cases?

The model is well suited to lightweight ASR applications, development and testing environments, and scenarios where resource efficiency matters more than peak accuracy. Typical uses include English transcription, basic multilingual transcription, and prototyping speech recognition solutions.
