wav2vec2-large-xlsr-basque
| Property | Value |
|---|---|
| License | Apache-2.0 |
| Author | cahya |
| Test WER | 12.44% |
| Base Model | wav2vec2-large-xlsr-53 |
What is wav2vec2-large-xlsr-basque?
wav2vec2-large-xlsr-basque is a speech recognition model fine-tuned for the Basque language. It is built on Facebook's wav2vec2-large-xlsr-53 model and achieves a Word Error Rate (WER) of 12.44% on the Common Voice Basque test set in automatic speech recognition (ASR).
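To make the WER figure concrete: WER is commonly defined as (substitutions + deletions + insertions) divided by the number of reference words. The sketch below computes it for a single sentence pair with a word-level edit distance. The function name and the example sentences are illustrative assumptions, and this is not necessarily the scoring script used to obtain the 12.44% figure.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from the test set):
# one substituted word out of four reference words.
print(word_error_rate("kaixo zer moduz zaude", "kaixo zer modu zaude"))  # 0.25
```

Over a whole test set, errors and reference word counts are usually summed across all utterances before dividing, rather than averaging per-sentence rates.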
Implementation Details
The model expects 16 kHz audio input and pairs the Wav2Vec2 architecture with a CTC (Connectionist Temporal Classification) output layer for speech recognition. It is implemented in PyTorch with the Transformers library, so it can be loaded through the standard Transformers model and processor classes.
- Built on the wav2vec2-large-xlsr-53 foundation model
- Requires audio input sampled at 16 kHz
- Implements CTC-based speech recognition
- Fine-tuned on the Basque Common Voice dataset
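As an illustration of the CTC step listed above, greedy CTC decoding takes the most likely token per audio frame, merges consecutive repeats, and then drops the blank token. The sketch below is pure Python and uses a hypothetical toy vocabulary (not the model's real one). It assumes blank ID 0 and `|` as the word delimiter, which mirrors the convention of Wav2Vec2 tokenizers, where the padding token doubles as the CTC blank.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    vocab: Sequence[str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Collapse repeated frame predictions, remove blanks, and join characters."""
    chars: List[str] = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars).replace(word_delimiter, " ")


# Hypothetical toy vocabulary for illustration only.
toy_vocab = ["<pad>", "|", "t", "i", "p"]

# A blank between the two "t" frames keeps the doubled letter.
print(ctc_greedy_decode([2, 0, 2, 2, 3, 3, 0, 4, 3], toy_vocab))  # ttipi
# Without the blank, the repeated "t" frames merge into one.
print(ctc_greedy_decode([2, 2, 2, 3, 4, 3], toy_vocab))  # tipi
```

In practice the per-frame IDs would come from an argmax over the model's output logits, and the processor's decoding method would perform this collapsing step.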
Core Capabilities
- Direct speech-to-text transcription without a language model
- Batch processing support for multiple audio files
- 12.44% WER on the Common Voice Basque test set
- Compatible with standard audio processing libraries like torchaudio
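The capabilities above can be sketched as a small batch transcription helper. This is a minimal sketch, not the author's reference code. It assumes the model is published on the Hugging Face Hub as `cahya/wav2vec2-large-xlsr-basque` (inferred from the author and model name), that `torch`, `torchaudio`, and `transformers` are installed, and that `transcribe` is a hypothetical helper name.

```python
from typing import List

# Assumed Hub ID, inferred from the author and model name above.
MODEL_ID = "cahya/wav2vec2-large-xlsr-basque"
TARGET_SR = 16_000  # the model expects 16 kHz input


def transcribe(paths: List[str]) -> List[str]:
    """Transcribe a batch of audio files with greedy CTC decoding."""
    # Heavy dependencies are imported lazily so that defining the
    # helper does not require them to be installed.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    waveforms = []
    for path in paths:
        wav, sr = torchaudio.load(path)  # shape: (channels, samples)
        wav = wav.mean(dim=0)            # downmix to mono
        if sr != TARGET_SR:
            wav = torchaudio.functional.resample(wav, sr, TARGET_SR)
        waveforms.append(wav.numpy())

    # padding=True pads shorter clips so the batch forms one tensor.
    inputs = processor(
        waveforms, sampling_rate=TARGET_SR, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)
```

A call such as `transcribe(["clip1.wav", "clip2.wav"])` (hypothetical file names) would return one transcription string per file. Loading the model once and reusing it across calls would avoid repeated initialization in a longer-running service.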
Frequently Asked Questions
Q: What makes this model unique?
This model is fine-tuned specifically for Basque speech recognition. It builds on the multilingual XLSR-53 pretrained model and reaches 12.44% WER on the Common Voice Basque test set. It's one of the few models specifically trained for Basque ASR.
Q: What are the recommended use cases?
The model is suited to Basque speech transcription tasks, including audio content indexing, subtitle generation, and voice command systems. Because it decodes without a separate language model, it fits applications that need a simple transcription pipeline, including real-time use where the hardware allows it.