# wav2vec2-large-voxrex-swedish
| Property | Value |
|---|---|
| Parameter Count | 315M |
| License | CC0 1.0 |
| Paper | arXiv:2205.03026 |
| WER (Common Voice) | 8.49% |
## What is wav2vec2-large-voxrex-swedish?
This is a fine-tuned Swedish speech recognition model based on the Wav2Vec 2.0 architecture. Developed by KBLab, it represents a significant advancement in Swedish automatic speech recognition (ASR): it builds on the VoxRex large model, which was pretrained on Swedish radio broadcasts, and is fine-tuned on NST and Common Voice data.
## Implementation Details
The model was fine-tuned for 120,000 updates on a combined NST and Common Voice dataset. It requires input audio sampled at 16 kHz and uses CTC (Connectionist Temporal Classification) for speech recognition.
- Architecture: Wav2Vec 2.0 large with 315M parameters
- Performance: 8.49% WER on the Common Voice test set (7.37% with a 4-gram language model)
- Combined test set: 2.5% WER on NST + Common Voice
## Core Capabilities
- Swedish speech recognition with state-of-the-art accuracy
- Robust performance across different speech contexts
- Direct integration with HuggingFace Transformers library
- Support for both raw model usage and language model enhancement
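As an illustration of the Transformers integration, here is a minimal inference sketch. The model id follows KBLab's HuggingFace naming, and the function assumes you have already loaded and resampled your audio to 16 kHz mono (e.g. with torchaudio or librosa); it performs raw greedy CTC decoding without a language model:

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

def transcribe(waveform_16khz, model_id="KBLab/wav2vec2-large-voxrex-swedish"):
    """Transcribe a 16 kHz mono waveform (1-D float array) to Swedish text."""
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    inputs = processor(waveform_16khz, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding, no LM
    return processor.batch_decode(pred_ids)[0]
```

For the lower 7.37% WER figure, the greedy argmax step would be replaced by beam search with a 4-gram language model (e.g. via `pyctcdecode`), which is the "language model enhancement" path mentioned above.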
## Frequently Asked Questions
**Q: What makes this model unique?**
A: The model's uniqueness lies in its specialized fine-tuning for Swedish and its strong performance metrics, particularly the 2.5% WER on the combined NST + Common Voice test set. It builds on the robust VoxRex architecture and has been extensively fine-tuned on diverse Swedish speech data.
**Q: What are the recommended use cases?**
A: This model is ideal for Swedish speech recognition tasks, particularly those requiring high-accuracy transcription of clear speech. It is especially suitable for media transcription, voice command systems, and general Swedish language processing applications.