# wav2vec2-xls-r-300m-hebrew
| Property | Value |
|---|---|
| Parameter Count | 315M |
| Model Type | Automatic Speech Recognition |
| Base Model | facebook/wav2vec2-xls-r-300m |
| Best WER | 23.18% |
## What is wav2vec2-xls-r-300m-hebrew?
This is a Hebrew automatic speech recognition model built on the wav2vec2-xls-r-300m architecture. It was fine-tuned in two stages, first on a smaller high-quality dataset and then on a larger, more diverse one, for a total of 97 hours of Hebrew speech data.
## Implementation Details
The model was trained in two distinct stages: first on a 28-hour high-quality dataset, then fine-tuned on a larger 69-hour dataset combined with weakly labeled data. Training used the Adam optimizer with a linear learning-rate schedule and mixed-precision training.
- First-stage training achieved 17.73% WER on clean data
- Second-stage training improved generalization, reaching 23.18% WER on more diverse data
- Uses native AMP (automatic mixed precision) for efficient training
- Trained with a batch size of 64 across multiple GPUs
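The WER figures above are word error rates: the word-level edit distance between the model's hypothesis and the reference transcript, divided by the number of reference words. As a minimal stdlib sketch of that metric (not the evaluation script used for this model):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming table over hypothesis words
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag = row[0]
        row[0] = i
        for j, h in enumerate(hyp, 1):
            cur = row[j]
            row[j] = min(
                row[j] + 1,                 # deletion
                row[j - 1] + 1,             # insertion
                prev_diag + (r != h),       # substitution (0 if words match)
            )
            prev_diag = cur
    return row[-1] / len(ref)


if __name__ == "__main__":
    # One substitution in a three-word reference -> WER of 1/3
    print(wer("a b c", "a x c"))  # 0.3333...
```

A WER of 23.18% therefore means roughly one word in four or five is inserted, deleted, or substituted relative to the reference.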
## Core Capabilities
- Hebrew speech recognition with robust performance
- Handles diverse audio sources and qualities
- Optimized for real-world applications
- Supports batch processing for efficient inference
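For batched inference, the standard Hugging Face `transformers` ASR pipeline can be used. This is a hedged sketch: the Hub repository id (its namespace) and the audio file names below are placeholders, not taken from this card.

```python
# Sketch of batched Hebrew ASR inference via the transformers pipeline.
# "<namespace>" and the .wav paths are illustrative placeholders.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<namespace>/wav2vec2-xls-r-300m-hebrew",  # replace with the actual repo id
)

# The pipeline accepts a list of audio files (expected at 16 kHz for wav2vec2
# models) and processes them in batches for efficient inference.
files = ["clip1.wav", "clip2.wav"]
for result in asr(files, batch_size=8):
    print(result["text"])
```

Passing a list of inputs with `batch_size` set lets the pipeline pad and batch audio internally, which is usually faster than transcribing clips one at a time on a GPU.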
## Frequently Asked Questions
Q: What makes this model unique?
The model's two-stage training approach, combining high-quality and diverse datasets, makes it particularly robust for real-world Hebrew ASR applications. The initial training on clean data followed by exposure to more varied sources helps balance accuracy and generalization.
Q: What are the recommended use cases?
This model is ideal for Hebrew speech recognition tasks requiring robust performance across different audio qualities. It's particularly suitable for applications needing to handle diverse speech patterns and recording conditions, though performance may vary based on audio quality.