wav2vec2-large-xlsr-persian-v3
| Property | Value |
|---|---|
| Author | m3hrdadfi |
| Task | Automatic Speech Recognition |
| Language | Persian (Farsi) |
| WER | 10.36% |
What is wav2vec2-large-xlsr-persian-v3?
This model is a fine-tuned version of Facebook's wav2vec2-large-xlsr-53, adapted for Persian (Farsi) speech recognition. It was trained on the Common Voice dataset and reports a Word Error Rate (WER) of 10.36%. It expects audio sampled at 16 kHz and includes dedicated normalization for Persian text.
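To make the WER figure concrete, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative only. The reported score was not necessarily computed with this exact code, and it presumably applies after text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 10.36% corresponds to roughly one word error for every ten reference words.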
Implementation Details
The model uses the wav2vec2 architecture together with custom preprocessing steps for Persian. Running it requires the transformers and torchaudio packages, plus custom Persian text normalizers.
- Built on wav2vec2-large-xlsr-53 architecture
- Includes specialized Persian text normalization
- Optimized for 16kHz audio input
- Supports batch processing for efficient inference
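The points above can be sketched as a small batch-inference routine. This is a hedged example, not the author's reference code: the Hub ID in `MODEL_ID` is assumed from the author and model name listed above, the helper names `chunks` and `transcribe` are hypothetical, and greedy (argmax) CTC decoding is used for simplicity. It assumes `torch`, `torchaudio`, and `transformers` are installed.

```python
from typing import Iterator, List, Sequence

# Assumed Hub ID, formed from the author and model name given above.
MODEL_ID = "m3hrdadfi/wav2vec2-large-xlsr-persian-v3"
TARGET_SR = 16_000  # the model expects 16 kHz input


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def transcribe(paths: Sequence[str], batch_size: int = 4) -> List[str]:
    """Transcribe audio files in batches with greedy CTC decoding."""
    # Heavy imports stay local so the helper above works without them.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()

    texts: List[str] = []
    for batch in chunks(paths, batch_size):
        waves = []
        for path in batch:
            wave, sr = torchaudio.load(path)  # shape: (channels, samples)
            wave = wave.mean(dim=0)           # downmix to mono
            if sr != TARGET_SR:
                wave = torchaudio.functional.resample(wave, sr, TARGET_SR)
            waves.append(wave.numpy())

        # Pad the batch to a common length and build the model inputs.
        inputs = processor(waves, sampling_rate=TARGET_SR,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        ids = torch.argmax(logits, dim=-1)  # greedy CTC path
        texts.extend(processor.batch_decode(ids))
    return texts
```

A call such as `transcribe(["clip1.wav", "clip2.wav"])` (file names are placeholders) would return one transcription string per input file. Larger batch sizes trade memory for throughput, since every clip in a batch is padded to the longest one.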
Core Capabilities
- Accurate Persian speech recognition
- Robust handling of various Persian dialects
- Efficient batch processing of audio files
- Custom text normalization for Persian language
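As an illustration of what Persian text normalization typically involves, the sketch below unifies Arabic and Persian character variants, strips diacritics and punctuation, and collapses whitespace. It is an assumption-laden example using only the standard library, not the normalizer shipped with the model; the function name `normalize_persian` and the exact rule set are hypothetical.

```python
import re

# Arabic code points that often appear in Persian text, mapped to Persian forms.
_CHAR_MAP = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic digits (U+0660..U+0669) -> Extended Arabic-Indic digits (U+06F0..U+06F9).
_CHAR_MAP.update({chr(0x0660 + d): chr(0x06F0 + d) for d in range(10)})
_TRANSLATE = str.maketrans(_CHAR_MAP)

_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")  # short-vowel marks plus tatweel
_PUNCT = re.compile(r"[^\w\s\u200c]")              # keep the zero-width non-joiner
_SPACES = re.compile(r"\s+")


def normalize_persian(text: str) -> str:
    """Apply a small, illustrative set of Persian normalization rules."""
    text = text.translate(_TRANSLATE)  # unify character variants
    text = _DIACRITICS.sub("", text)   # drop diacritics and tatweel
    text = _PUNCT.sub(" ", text)       # replace punctuation with spaces
    return _SPACES.sub(" ", text).strip()
```

Applying the same normalization to both reference transcripts and model output before scoring keeps orthographic variants from being counted as recognition errors.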
Frequently Asked Questions
Q: What makes this model unique?
The model is fine-tuned specifically for Persian and reaches a 10.36% WER. It also includes normalization tools designed for Persian text, which makes it well suited to Persian ASR tasks.
Q: What are the recommended use cases?
The model fits Persian speech recognition applications such as transcription services, voice assistants, and automated subtitling. It is best suited to pipelines that supply 16 kHz audio and need accurate Persian transcription.