Distil-Whisper Large V3 (distil-large-v3)

Maintained by: distil-whisper

  • Parameter Count: 756M
  • Model Type: Speech Recognition
  • License: MIT
  • Paper: Distil-Whisper Paper
  • Relative Speed: 6.3x faster than Whisper large-v3

What is distil-large-v3?

Distil-large-v3 is a knowledge-distilled version of OpenAI's Whisper large-v3 model, designed specifically for English speech recognition. It stays within about 1% word error rate (WER) of the original while running roughly 6.3x faster, which makes it well suited to production environments. The model was trained on 22,000 hours of diverse audio drawn from nine open-source datasets, giving it robustness across different domains and speaking styles.
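As a rough sketch of typical usage with the Hugging Face Transformers library (the file name audio.wav is a placeholder, and the dtype/device choices are illustrative), the model can be run through the standard automatic-speech-recognition pipeline:

```python
# Short-form transcription sketch with Transformers; "audio.wav" is a placeholder input.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v3"

# Load the distilled model and its processor (feature extractor + tokenizer)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Wrap everything in a standard ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

result = pipe("audio.wav")
print(result["text"])
```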

Implementation Details

The model uses an encoder-decoder architecture: the encoder is retained in full from Whisper large-v3 (and kept frozen during distillation), while the decoder is reduced to two layers for faster generation. It supports both sequential and chunked long-form transcription and is built around Whisper's 30-second context windows (see the usage sketch after the list below).

  • Optimized for both short-form and long-form audio transcription
  • Supports multiple inference backends including Flash Attention 2 and PyTorch SDPA
  • Compatible with popular frameworks like Whisper.cpp, Faster-Whisper, and Transformers.js
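For long-form audio, a minimal chunked-transcription sketch might look like the following; the chunk_length_s and batch_size values here are illustrative settings rather than fixed requirements, and long_audio.wav is a placeholder file:

```python
# Chunked long-form transcription sketch: the pipeline splits the audio into
# overlapping chunks and batches them for parallel decoding.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v3",
    torch_dtype=torch_dtype,
    device=device,
    chunk_length_s=25,   # chunk length in seconds; tune alongside batch_size for your hardware
    batch_size=16,       # number of chunks transcribed in parallel
)

result = pipe("long_audio.wav", return_timestamps=True)
print(result["text"])
```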

Core Capabilities

  • Achieves WER within 1% of Whisper large-v3
  • 6.3x faster inference speed compared to the original model
  • Robust performance across different audio domains
  • Can serve as an assistant model to Whisper large-v3 for speculative decoding, roughly doubling inference speed (see the sketch after this list)
  • Optimized for both CPU and GPU inference
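Below is a hedged sketch of speculative decoding, assuming distil-large-v3 acts as the draft (assistant) model for openai/whisper-large-v3 via Transformers' assisted generation; exact class choices and settings may differ in the official examples:

```python
# Speculative decoding sketch: distil-large-v3 proposes draft tokens that
# Whisper large-v3 verifies, so greedy outputs match running large-v3 alone.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Draft (assistant) model: distil-large-v3
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v3",
    torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True,
).to(device)

# Main model: Whisper large-v3
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True,
).to(device)
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"assistant_model": assistant_model},  # enables assisted generation
    torch_dtype=torch_dtype,
    device=device,
)

result = pipe("audio.wav")  # placeholder input file
print(result["text"])
```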

Frequently Asked Questions

Q: What makes this model unique?

The model's key innovation is maintaining Whisper large-v3's accuracy while significantly reducing computational requirements, achieved through knowledge distillation on large-scale pseudo-labelled audio and a much smaller decoder.

Q: What are the recommended use cases?

It is well suited to production environments that need fast, accurate English speech recognition, covering both short-form and long-form transcription, and more generally to any application where computational efficiency matters.
