wav2vec2-xls-r-300m-hebrew

Maintained by: imvladikon

Property         Value
Parameter Count  315M
Model Type       Automatic Speech Recognition
Base Model       facebook/wav2vec2-xls-r-300m
Best WER         23.18%

What is wav2vec2-xls-r-300m-hebrew?

This is a specialized Hebrew automatic speech recognition model based on the wav2vec2-xls-r-300m architecture. It underwent a two-stage fine-tuning process using both small high-quality and large diverse datasets, totaling 97 hours of Hebrew speech data.
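Like other wav2vec2 models, this one consumes raw 16 kHz mono audio as float samples. A minimal stdlib-only sketch of that decoding step is below; the function name is illustrative, and in practice a library such as torchaudio or librosa would also handle resampling to 16 kHz:

```python
import array
import wave

def load_wav_as_floats(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are floats in [-1.0, 1.0], the input format wav2vec2-style
    models expect. Assumes mono 16-bit PCM; resampling to 16 kHz is left
    to a dedicated audio library.
    """
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expected 16-bit PCM"
        assert wf.getnchannels() == 1, "expected mono audio"
        raw = wf.readframes(wf.getnframes())
        rate = wf.getframerate()
    pcm = array.array("h", raw)  # signed 16-bit integers
    return [s / 32768.0 for s in pcm], rate
```

The resulting float list (or an equivalent NumPy array) is what a Hugging Face processor for this model would tokenize into input values.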

Implementation Details

The model was trained in two distinct stages: first on a 28-hour high-quality dataset, followed by fine-tuning on a larger 69-hour dataset combined with weakly labeled data. The training utilized Adam optimizer with a linear learning rate scheduler and mixed precision training.

  • First stage training achieved 17.73% WER on clean data
  • Second stage training improved generalization with 23.18% WER on diverse data
  • Uses native automatic mixed precision (AMP) for efficient training
  • Trained with batch size 64 across multiple GPUs
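The Adam-plus-linear-scheduler setup described above can be sketched as a plain function. The peak learning rate and warmup length below are illustrative placeholders, not the values used for this model:

```python
def linear_lr(step, total_steps, peak_lr=3e-4, warmup_steps=500):
    """Linear warmup to peak_lr, then linear decay to zero.

    Mirrors the linear learning-rate schedule commonly paired with Adam
    in wav2vec2 fine-tuning; peak_lr and warmup_steps here are assumed
    values for illustration only.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # warmup ramp
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))  # decay
```

In a real run, this curve is what a framework scheduler (e.g. the linear scheduler in Transformers' Trainer) applies to the optimizer at every step.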

Core Capabilities

  • Hebrew speech recognition with robust performance
  • Handles diverse audio sources and qualities
  • Optimized for real-world applications
  • Supports batch processing for efficient inference
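The WER figures quoted above are word-level edit distance divided by the number of reference words. A self-contained sketch of that metric follows (standard dynamic-programming Levenshtein distance; not necessarily the exact evaluation script used for this model, which may also apply text normalization first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (rolling 1-D DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1] / len(ref)
```

A reported WER of 23.18% means roughly one word in four or five was substituted, inserted, or deleted relative to the reference transcript.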

Frequently Asked Questions

Q: What makes this model unique?

The model's two-stage training approach, combining high-quality and diverse datasets, makes it particularly robust for real-world Hebrew ASR applications. The initial training on clean data followed by exposure to more varied sources helps balance accuracy and generalization.

Q: What are the recommended use cases?

This model is ideal for Hebrew speech recognition tasks requiring robust performance across different audio qualities. It's particularly suitable for applications needing to handle diverse speech patterns and recording conditions, though performance may vary based on audio quality.
