wav2vec2-xls-r-300m-hebrew

Maintained by: imvladikon

Property         Value
Parameter Count  315M
Model Type       Automatic Speech Recognition
Base Model       facebook/wav2vec2-xls-r-300m
Best WER         23.18%

What is wav2vec2-xls-r-300m-hebrew?

This is a specialized Hebrew automatic speech recognition model based on the wav2vec2-xls-r-300m architecture. It underwent a two-stage fine-tuning process using both small high-quality and large diverse datasets, totaling 97 hours of Hebrew speech data.
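Like other wav2vec2 models, this one consumes raw 16 kHz mono audio as float samples. A minimal stdlib-only sketch of that decoding step is below; the function name is illustrative, and in practice a library such as torchaudio or librosa would also handle resampling to 16 kHz:

```python
import array
import wave

def load_wav_as_floats(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are floats in [-1.0, 1.0], the input format wav2vec2-style
    models expect. Assumes mono 16-bit PCM; resampling to 16 kHz is left
    to a dedicated audio library.
    """
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expected 16-bit PCM"
        assert wf.getnchannels() == 1, "expected mono audio"
        raw = wf.readframes(wf.getnframes())
        rate = wf.getframerate()
    pcm = array.array("h", raw)  # signed 16-bit integers
    return [s / 32768.0 for s in pcm], rate
```

The resulting float list (or an equivalent NumPy array) is what a Hugging Face processor for this model would tokenize into input values.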

Implementation Details

The model was trained in two distinct stages: first on a 28-hour high-quality dataset, followed by fine-tuning on a larger 69-hour dataset combined with weakly labeled data. The training utilized Adam optimizer with a linear learning rate scheduler and mixed precision training.

  • First stage training achieved 17.73% WER on clean data
  • Second stage training improved generalization with 23.18% WER on diverse data
  • Uses native automatic mixed precision (AMP) for efficient training
  • Trained with batch size 64 across multiple GPUs
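The Adam-plus-linear-scheduler setup described above can be sketched as a plain function. The peak learning rate and warmup length below are illustrative placeholders, not the values used for this model:

```python
def linear_lr(step, total_steps, peak_lr=3e-4, warmup_steps=500):
    """Linear warmup to peak_lr, then linear decay to zero.

    Mirrors the linear learning-rate schedule commonly paired with Adam
    in wav2vec2 fine-tuning; peak_lr and warmup_steps here are assumed
    values for illustration only.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # warmup ramp
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))  # decay
```

In a real run, this curve is what a framework scheduler (e.g. the linear scheduler in Transformers' Trainer) applies to the optimizer at every step.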

Core Capabilities

  • Hebrew speech recognition with robust performance
  • Handles diverse audio sources and qualities
  • Optimized for real-world applications
  • Supports batch processing for efficient inference
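The WER figures quoted above are word-level edit distance divided by the number of reference words. A self-contained sketch of that metric follows (standard dynamic-programming Levenshtein distance; not necessarily the exact evaluation script used for this model, which may also apply text normalization first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (rolling 1-D DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1] / len(ref)
```

A reported WER of 23.18% means roughly one word in four or five was substituted, inserted, or deleted relative to the reference transcript.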

Frequently Asked Questions

Q: What makes this model unique?

The model's two-stage training approach, combining high-quality and diverse datasets, makes it particularly robust for real-world Hebrew ASR applications. The initial training on clean data followed by exposure to more varied sources helps balance accuracy and generalization.

Q: What are the recommended use cases?

This model is ideal for Hebrew speech recognition tasks requiring robust performance across different audio qualities. It's particularly suitable for applications needing to handle diverse speech patterns and recording conditions, though performance may vary based on audio quality.
