wav2vec2-large-xls-r-300m-Urdu
| Property | Value |
|---|---|
| Parameter Count | 315M |
| License | Apache 2.0 |
| Base Model | facebook/wav2vec2-xls-r-300m |
| Test WER | 39.89% (with LM) |
What is wav2vec2-large-xls-r-300m-Urdu?
This is a speech recognition model for Urdu, fine-tuned from Facebook's pretrained wav2vec2-xls-r-300m checkpoint. On the Common Voice 8.0 test set it achieves a Word Error Rate (WER) of 39.89% with language model (LM) decoding and a Character Error Rate (CER) of 16.7%.
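WER counts word-level substitutions, deletions and insertions against the reference transcript and divides by the number of reference words. CER applies the same formula to characters. The sketch below computes both from scratch with a Levenshtein distance. It is illustrative only: the exact scoring script and text normalization behind the reported 39.89% and 16.7% are not specified here, and normalization choices (punctuation, diacritics) can shift the numbers. The Urdu sentence is a made-up example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit-cost sub/ins/del)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references, hypotheses):
    """Corpus WER: total edits over total reference words (not a mean of per-utterance WERs)."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return edits / words


if __name__ == "__main__":
    reference = "میں گھر جا رہا ہوں"  # made-up example: "I am going home"
    words = reference.split()
    words[3] = "رہی"  # substitute one word to simulate a recognition error
    hypothesis = " ".join(words)
    print(f"WER = {wer(reference, hypothesis):.2f}")  # one of five words is wrong
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Summing edits across the whole test set, as `corpus_wer` does, is the usual way a single test-set WER is reported.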
Implementation Details
The model was fine-tuned with a learning rate of 0.0001, a batch size of 64, and a linear learning-rate scheduler with 1000 warmup steps. Training ran for 200 epochs with the Adam optimizer.
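A "linear" scheduler with warmup usually means the learning rate ramps linearly from 0 to its peak over the warmup steps and then decays linearly to 0 by the final step, which is how the linear schedule in Transformers behaves. The sketch assumes that shape. The total step count depends on the number of training utterances, which is not stated here, so `num_examples` below is a hypothetical placeholder; the sketch also ignores gradient accumulation, which would change the number of steps per epoch.

```python
import math

PEAK_LR = 1e-4       # learning rate reported for this model
WARMUP_STEPS = 1000  # warmup steps reported for this model
BATCH_SIZE = 64      # batch size reported for this model
EPOCHS = 200         # number of epochs reported for this model


def total_training_steps(num_examples: int, batch_size: int = BATCH_SIZE, epochs: int = EPOCHS) -> int:
    """Optimizer steps for a run, assuming one step per batch and a partial last batch."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
                     warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_span = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_span)


if __name__ == "__main__":
    num_examples = 3_000  # hypothetical training-set size, for illustration only
    total = total_training_steps(num_examples)
    for step in (0, 500, 1000, total // 2, total):
        print(step, f"{linear_warmup_lr(step, total):.2e}")
```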
- Fine-tuned on the Mozilla Common Voice 8.0 Urdu dataset
- Uses the wav2vec2 architecture with 315M parameters
- Built with PyTorch and the Hugging Face Transformers library
- Expects 16 kHz audio input
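The 16 kHz requirement matters because wav2vec2's convolutional feature encoder turns raw samples into frames at a fixed rate. Assuming the base model keeps the default wav2vec2 encoder layout (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, the `Wav2Vec2Config` defaults in Transformers), the total stride is 320 samples, i.e. one frame every 20 ms at 16 kHz, and each frame sees 400 samples (25 ms). The sketch computes the frame count; audio recorded at other rates would need to be resampled to 16 kHz first, otherwise the timing the model learned no longer matches.

```python
from math import prod

# Assumed default wav2vec2 feature-encoder layout (Wav2Vec2Config defaults).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLING_RATE = 16_000  # the model expects 16 kHz input


def num_output_frames(num_samples: int) -> int:
    """Frames produced by the unpadded conv encoder, applied layer by layer."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = max(0, (length - kernel) // stride + 1)
    return length


def check_sampling_rate(sampling_rate: int) -> None:
    """Reject audio that has not been resampled to the rate the model was trained on."""
    if sampling_rate != SAMPLING_RATE:
        raise ValueError(f"expected {SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; resample first")


if __name__ == "__main__":
    hop = prod(CONV_STRIDES)  # samples between consecutive frames
    print(f"hop = {hop} samples = {1000 * hop / SAMPLING_RATE:.0f} ms")
    print("frames for 1 s of audio:", num_output_frames(SAMPLING_RATE))  # 49
```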
Core Capabilities
- Direct Urdu speech-to-text transcription
- Supports chunked inference on long audio with configurable chunk sizes
- Achieves 52.03% WER without an LM, improving to 39.89% with an LM (an absolute reduction of 12.14 points, about 23% relative)
- Can be deployed through Hugging Face Inference Endpoints
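Chunked inference on long recordings usually works with overlapping windows: each chunk carries extra context on both sides, the model transcribes the whole chunk, and only the central part of its output is kept, so words cut at a boundary are still recognized with context. The Transformers ASR pipeline exposes this through its `chunk_length_s` and `stride_length_s` arguments. The sketch below only computes window and keep boundaries in samples to illustrate the idea; it is not the pipeline's internal implementation, and the 5 s chunk / 1 s stride defaults are arbitrary example values.

```python
from typing import List, Tuple


def chunk_boundaries(num_samples: int, chunk_length_s: float = 5.0, stride_length_s: float = 1.0,
                     sampling_rate: int = 16_000) -> List[Tuple[int, int, int, int]]:
    """Return (start, end, keep_start, keep_end) sample indices for overlapping chunks.

    Each chunk spans [start, end); its output is trusted only on [keep_start, keep_end),
    dropping `stride` samples of context on every side that touches another chunk.
    """
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # consecutive chunks overlap by 2 * stride
    if step <= 0:
        raise ValueError("chunk_length_s must be larger than 2 * stride_length_s")

    chunks = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        keep_start = start + stride if start > 0 else 0
        keep_end = end - stride if end < num_samples else num_samples
        chunks.append((start, end, keep_start, keep_end))
        if end == num_samples:  # this chunk reached the end of the audio
            break
    return chunks


if __name__ == "__main__":
    # 12 s of 16 kHz audio split into 5 s chunks with 1 s of context on each side.
    for start, end, keep_start, keep_end in chunk_boundaries(12 * 16_000):
        print(f"chunk {start / 16_000:.1f}-{end / 16_000:.1f} s, "
              f"keep {keep_start / 16_000:.1f}-{keep_end / 16_000:.1f} s")
```

The kept regions tile the recording without gaps or overlap, so the per-chunk transcripts can simply be concatenated.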
Frequently Asked Questions
Q: What makes this model unique?
This model adapts the multilingual XLS-R architecture specifically to Urdu, with reported results for decoding both with and without an LM. Because it is built on the standard PyTorch and Transformers stack, it can be deployed with common production tooling.
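In the non-LM configuration, a wav2vec2 CTC model is normally decoded greedily: take the most likely token in every frame, merge consecutive repeats, then drop the blank token. The sketch assumes the conventions of Transformers' `Wav2Vec2CTCTokenizer` (the `<pad>` token doubles as the CTC blank and `|` marks word boundaries); the six-symbol vocabulary and the one-hot frame scores are a toy example, not this model's real Urdu vocabulary or output.

```python
from typing import List, Sequence

BLANK = "<pad>"       # assumed: wav2vec2 CTC tokenizers use the pad token as the CTC blank
WORD_DELIMITER = "|"  # assumed: word-boundary symbol in the CTC vocabulary


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocab: List[str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(vocab)), key=lambda i: frame[i]) for frame in frame_scores]
    chars = []
    prev = None
    for idx in best:
        # A token is emitted only when it differs from the previous frame's token,
        # so a blank between two identical tokens keeps both (the "ll" in "hello").
        if idx != prev and vocab[idx] != BLANK:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


def one_hot(index: int, size: int) -> List[float]:
    """Toy frame score vector that puts all mass on one token."""
    return [1.0 if i == index else 0.0 for i in range(size)]


VOCAB = ["<pad>", "|", "h", "e", "l", "o"]  # toy vocabulary, not the model's Urdu one

if __name__ == "__main__":
    path = ["h", "h", "e", "<pad>", "l", "l", "<pad>", "l", "o", "o", "|"]
    frames = [one_hot(VOCAB.index(tok), len(VOCAB)) for tok in path]
    print(ctc_greedy_decode(frames, VOCAB))  # hello
```

With a real model, `frame_scores` would be the per-frame logits over the model's vocabulary; the argmax-merge-drop logic stays the same.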
Q: What are the recommended use cases?
The model is suited to Urdu speech transcription applications such as media transcription, voice assistants, and automated subtitling. Decoding with the language model is recommended where accuracy matters, since it lowers WER from 52.03% to 39.89%.
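An LM helps by preferring transcripts that are plausible Urdu word sequences when the acoustic scores are close. Production decoders (for example pyctcdecode with a KenLM n-gram model) apply this inside CTC beam search, with an LM weight `alpha` and a word-insertion bonus `beta`. The sketch below swaps that for a simpler technique, n-best rescoring with a toy add-one-smoothed bigram LM, to show the effect. The three training sentences, the candidate transcripts and the acoustic log-probabilities are all invented for illustration; nothing here is this model's actual decoder or LM.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def train_bigram_lm(sentences: List[str], k: float = 1.0) -> Callable[[str], float]:
    """Return a function giving the add-k smoothed bigram log-probability of a sentence."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        history.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)

    def logprob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(prev, word)] + k) / (history[prev] + k * v))
            for prev, word in zip(tokens[:-1], tokens[1:])
        )

    return logprob


def rescore(nbest: List[Tuple[str, float]], lm_logprob: Callable[[str], float],
            alpha: float = 0.5, beta: float = 1.0) -> str:
    """Pick the hypothesis maximizing acoustic + alpha * LM + beta * word count."""
    def total(item: Tuple[str, float]) -> float:
        text, acoustic = item
        return acoustic + alpha * lm_logprob(text) + beta * len(text.split())
    return max(nbest, key=total)[0]


# Toy LM training data (invented Urdu sentences).
CORPUS = ["میں گھر جا رہا ہوں", "وہ گھر جا رہا ہے", "میں اسکول جا رہا ہوں"]
lm = train_bigram_lm(CORPUS)

# Two candidates with invented acoustic log-probabilities: the acoustic model
# slightly prefers a variant whose fourth word the toy LM has never seen.
variant = CORPUS[0].split()
variant[3] = "رہی"
NBEST = [(" ".join(variant), -4.0), (CORPUS[0], -4.3)]

if __name__ == "__main__":
    print("acoustic only:", rescore(NBEST, lm, alpha=0.0))
    print("with LM:      ", rescore(NBEST, lm, alpha=0.5))
```

Setting `alpha=0.0` reduces the decision to acoustic scores alone, which is the analogue of decoding without an LM.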