wav2vec2-large-xlsr-persian-v3

Property	Value
Author	m3hrdadfi
Task	Automatic Speech Recognition
Language	Persian (Farsi)
WER Score	10.36%

What is wav2vec2-large-xlsr-persian-v3?

This is a fine-tuned version of Facebook's wav2vec2-large-xlsr-53 model for Persian (Farsi) speech recognition. It was fine-tuned on the Common Voice dataset and reports a Word Error Rate (WER) of 10.36%. The model expects audio sampled at 16 kHz, and its pipeline includes normalization tailored to Persian text.
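Fine-tuned wav2vec2 speech recognition models of this kind typically predict one token (here, a character) per audio frame and are trained with a CTC objective. Turning frame predictions into text therefore means collapsing repeated tokens and dropping the blank token. The sketch below shows greedy (best-path) CTC decoding in plain Python. The vocabulary, the blank id of 0, and the "|" word delimiter are assumptions for illustration; in practice the model's processor performs this step.

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    vocab: Sequence[str],
    blank_id: int = 0,          # assumed id of the CTC blank (often the pad token)
    word_delimiter: str = "|",  # assumed word-boundary token in the vocabulary
) -> str:
    """Turn per-frame argmax token ids into text with best-path CTC decoding."""
    # Merge runs of the same id: [a, a, blank, a] -> [a, blank, a].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Drop blanks; a blank between two equal ids keeps them as separate characters.
    chars = [vocab[token_id] for token_id in collapsed if token_id != blank_id]
    # Map the word-delimiter token to a space and trim the ends.
    return "".join(chars).replace(word_delimiter, " ").strip()
```

Repeated letters survive only when the model emits a blank between them, which is why CTC needs the blank token at all.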

Implementation Details

The model uses the wav2vec2 architecture with custom preprocessing for Persian text. Running it requires the transformers and torchaudio packages, plus the custom Persian text normalizer for best results.

Built on the wav2vec2-large-xlsr-53 architecture
Includes Persian-specific text normalization
Expects 16 kHz audio input
Supports batched inference
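A minimal sketch of batched inference with the Hugging Face transformers library follows. It assumes the checkpoint is published on the Hub as m3hrdadfi/wav2vec2-large-xlsr-persian-v3 and loads with Wav2Vec2Processor and Wav2Vec2ForCTC. The helper names batched and transcribe and the batch size of 8 are hypothetical. Inputs are assumed to be mono float arrays already sampled at 16 kHz, and the custom Persian normalizer is not applied here.

```python
from typing import List, Sequence

import numpy as np

# Hub id assumed from the model name and author; not confirmed by this summary.
MODEL_ID = "m3hrdadfi/wav2vec2-large-xlsr-persian-v3"
SAMPLE_RATE = 16_000


def batched(items: Sequence, batch_size: int) -> List[Sequence]:
    """Split items into consecutive slices of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def transcribe(waveforms: Sequence[np.ndarray], batch_size: int = 8) -> List[str]:
    """Transcribe mono float waveforms that are already sampled at 16 kHz."""
    # Heavy imports are deferred so the helper above works without torch installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()

    transcripts: List[str] = []
    for batch in batched(list(waveforms), batch_size):
        # Pad clips to a common length; the attention mask marks real samples.
        inputs = processor(batch, sampling_rate=SAMPLE_RATE,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(inputs.input_values,
                           attention_mask=inputs.get("attention_mask")).logits
        # Greedy CTC decoding: take the argmax per frame, then let the
        # processor collapse repeats and blanks into text.
        predicted_ids = torch.argmax(logits, dim=-1)
        transcripts.extend(processor.batch_decode(predicted_ids))
    return transcripts
```

Padding each batch to its longest clip keeps the tensors rectangular; grouping clips of similar length into the same batch reduces computation wasted on padding.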

Core Capabilities

Accurate Persian speech recognition
Robust handling of various Persian dialects
Efficient batch processing of audio files
Custom text normalization for the Persian language
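The model's own normalizer is not reproduced in this summary, so the following is only a hypothetical illustration of the kind of cleanup Persian ASR text commonly needs: mapping Arabic code points for yeh and kaf to their Persian forms, stripping short-vowel diacritics, and collapsing whitespace while keeping the zero-width non-joiner (U+200C) used inside Persian words. The function name and the character table are assumptions.

```python
import re
import unicodedata

# Hypothetical mapping table; not the normalizer distributed with the model.
_CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06cc",  # Arabic alef maksura -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian keheh
})
# Arabic short-vowel and related marks (fathatan through sukun).
_DIACRITICS = re.compile("[\u064b-\u0652]")
# Whitespace runs; ZWNJ (U+200C) is not whitespace, so it is preserved.
_SPACES = re.compile(r"\s+")


def normalize_persian(text: str) -> str:
    """Apply a simple, illustrative Persian text normalization."""
    # Fold compatibility forms (e.g. Arabic presentation forms) to base letters.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_CHAR_MAP)
    text = _DIACRITICS.sub("", text)
    return _SPACES.sub(" ", text).strip()
```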

Frequently Asked Questions

Q: What makes this model unique?

The model is fine-tuned specifically for Persian and reports a competitive 10.36% WER. It is also paired with custom normalization tools designed for Persian text, which makes it well suited to Persian ASR tasks.
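WER counts the word substitutions S, deletions D, and insertions I needed to turn a hypothesis into the reference, divided by the number of reference words N: WER = (S + D + I) / N. A minimal sketch using word-level edit distance is below. The evaluation script behind the reported 10.36% is not described here, so details such as how text is normalized before scoring are assumptions.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Because insertions are counted, WER can exceed 1.0, for example when the hypothesis is much longer than the reference.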

Q: What are the recommended use cases?

The model suits Persian speech recognition applications such as transcription services, voice assistants, and automated subtitling. Input audio must be sampled at 16 kHz, so recordings at other rates should be resampled before inference.
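Since the model expects 16 kHz input, recordings at other sample rates (for example 44.1 or 48 kHz) need to be downmixed and resampled first. The sketch below uses NumPy linear interpolation to stay dependency-light; it applies no anti-aliasing filter, so a proper resampler such as torchaudio.functional.resample is the better choice in practice. The function name to_mono_16k is hypothetical.

```python
import numpy as np

TARGET_RATE = 16_000


def to_mono_16k(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Downmix to mono and linearly resample to 16 kHz.

    samples: shape (num_samples,) or (channels, num_samples).
    """
    audio = np.asarray(samples, dtype=np.float64)
    if audio.ndim == 2:
        audio = audio.mean(axis=0)  # average the channels into one signal
    if sample_rate == TARGET_RATE:
        return audio.astype(np.float32)
    n_out = int(round(audio.shape[0] * TARGET_RATE / sample_rate))
    t_in = np.arange(audio.shape[0]) / sample_rate  # input sample times (s)
    t_out = np.arange(n_out) / TARGET_RATE          # output sample times (s)
    # Linear interpolation with no low-pass filter: a simplification that
    # can alias when downsampling.
    return np.interp(t_out, t_in, audio).astype(np.float32)
```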