wav2vec2-xls-r-300m-cs-250
| Property | Value |
|---|---|
| Parameter Count | 315M parameters |
| License | Apache 2.0 |
| Task | Automatic Speech Recognition |
| Language | Czech |
| Best WER | 7.3% (with LM) |
What is wav2vec2-xls-r-300m-cs-250?
This is a Czech automatic speech recognition model based on the XLS-R 300M architecture. It was fine-tuned on more than 250 hours of Czech speech, including Common Voice 8.0 and other Czech datasets. On the test set it reaches a 7.3% Word Error Rate (WER, with a language model) and a 2.1% Character Error Rate (CER).
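To make the two metrics concrete, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, at word level and at character level. The function names and the Czech sample sentences are illustrative assumptions, not part of the model's evaluation code, and evaluation toolkits differ on details such as text normalization and whether spaces count as characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: spaces are counted as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: one missing diacritic in the last word.
reference = "dobrý den jak se máte"
hypothesis = "dobrý den jak se mate"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

A single wrong character turns a whole word into an error, which is why WER is typically several times higher than CER for the same transcripts.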
Implementation Details
The model is fine-tuned from facebook/wav2vec2-xls-r-300m for Czech. It expects audio sampled at 16 kHz and can be used with or without a language model (LM); the best reported results use the LM.
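Because the model expects 16 kHz input, it can help to check a recording's sampling rate before transcription. The following is a minimal sketch using only Python's standard `wave` module; the helper names (`wav_sample_rate`, `needs_resampling`, `make_silent_wav`) are hypothetical, and it covers uncompressed WAV files only. The resampling itself would need an audio library and is not shown.

```python
import io
import wave

EXPECTED_RATE = 16_000  # sampling rate (Hz) the model expects, per the text above


def wav_sample_rate(source) -> int:
    """Return the sampling rate in Hz of a WAV file (path string or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def needs_resampling(source) -> bool:
    """True if the WAV file is not sampled at the expected 16 kHz."""
    return wav_sample_rate(source) != EXPECTED_RATE


def make_silent_wav(rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short silent mono 16-bit WAV in memory (demo fixture only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buffer.seek(0)
    return buffer


for rate in (16_000, 44_100):
    clip = make_silent_wav(rate)
    print(rate, "needs resampling:", needs_resampling(clip))
```

Audio recorded at another rate (44.1 kHz is common) should be resampled to 16 kHz first; feeding it in unchanged usually degrades transcription quality.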
- Native mixed precision training with AMP
- Trained for 5 epochs with Adam optimizer
- Learning rate: 0.0001 with linear scheduler
- Batch size: 32 (train) / 8 (eval)
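As an illustration of the learning-rate setting above, a linear scheduler can be sketched as a function that scales the base rate of 0.0001 down to zero over the course of training. The source does not state the warmup length or the total number of steps, so `warmup_steps=0` and `total_steps=10_000` below are assumptions chosen only for the example; the formula follows the usual linear-decay-with-optional-warmup shape rather than the author's confirmed training code.

```python
BASE_LR = 1e-4  # learning rate stated in the list above


def linear_lr(step: int, total_steps: int,
              base_lr: float = BASE_LR, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero.

    Assumption: warmup_steps and total_steps are hypothetical; the source
    gives only the base learning rate and the word "linear".
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 10_000  # hypothetical; depends on dataset size, batch size, and epochs
for step in (0, 2_500, 5_000, 10_000):
    print(step, linear_lr(step, total_steps))
```

With no warmup, the rate starts at the full 0.0001, is halved at the midpoint, and reaches zero at the final step.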
Core Capabilities
- Direct speech-to-text transcription for Czech audio
- Handles various Czech speech patterns and accents
- Optimized for both accuracy and efficiency
- Compatible with popular speech processing frameworks that can load wav2vec2 checkpoints such as its base model, facebook/wav2vec2-xls-r-300m
Frequently Asked Questions
Q: What makes this model unique?
The model combines fine-tuning on more than 250 hours of diverse Czech speech with optional language model integration, which brings its WER down to 7.3%. It is tuned specifically for Czech while building on the XLS-R architecture.
Q: What are the recommended use cases?
The model is intended for Czech speech transcription, particularly in applications that require high accuracy. It is suitable for both academic and production environments, provided the input audio is sampled at 16 kHz or resampled to that rate.