wav2vec2-large-xlsr-53-portuguese
Property | Value |
---|---|
Developer | Facebook |
License | Apache 2.0 |
Downloads | 1,621 |
Framework | PyTorch, JAX |
What is wav2vec2-large-xlsr-53-portuguese?
This is a speech recognition model developed by Facebook and fine-tuned for Portuguese. It is built on the pretrained wav2vec2-large-xlsr-53 model and reaches a 27.1% Word Error Rate (WER) on the Common Voice Portuguese test set.
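To make the WER figure concrete, here is a minimal sketch of how the metric is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not part of the model's evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of four reference words.
print(word_error_rate("o gato preto dorme", "o gato dorme"))  # 0.25
```

Read this way, a WER of 27.1% means roughly 27 word-level edits per 100 reference words.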
Implementation Details
The model uses the wav2vec2 architecture with cross-lingual speech representations (XLSR). It expects input audio sampled at 16 kHz; 48 kHz recordings are resampled to that rate during preprocessing. Batched inference is supported, with an attention mask that marks padded samples so they can be ignored.
- Supports PyTorch and JAX frameworks
- Implements CTC (Connectionist Temporal Classification) for speech recognition
- Includes built-in text normalization and preprocessing
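The CTC step mentioned above can be illustrated with greedy decoding: take the most likely token per audio frame, collapse consecutive repeats, then drop the blank token. This is a sketch under stated assumptions. The toy vocabulary and `blank_id=0` are hypothetical, and the real model ships its own character vocabulary.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then drop blank tokens."""
    collapsed = [token for token, _ in groupby(frame_ids)]
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


# Hypothetical toy vocabulary, with id 0 playing the role of the CTC blank.
vocab = {0: "<pad>", 1: "o", 2: "l", 3: "a"}
frames = [1, 1, 0, 2, 2, 2, 0, 3, 3]  # per-frame argmax over the logits
print(ctc_greedy_decode(frames, vocab))  # ola
```

The blank token is what lets CTC emit doubled letters: the frames `[2, 2, 0, 2]` decode to `ll`, while `[2, 2, 2]` decode to a single `l`.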
Core Capabilities
- Automatic Speech Recognition for Portuguese
- Batch processing support
- Audio resampling to 16 kHz during preprocessing
- Robust handling of various audio inputs
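Batch processing of clips with different lengths relies on padding plus an attention mask. The sketch below shows the idea in plain Python. The helper `pad_batch` is a hypothetical name for illustration, and in practice a library feature extractor would normally perform this step.

```python
def pad_batch(waveforms, pad_value=0.0):
    """Right-pad waveforms to a common length and build a 0/1 attention mask."""
    max_len = max(len(w) for w in waveforms)
    padded, mask = [], []
    for w in waveforms:
        n_pad = max_len - len(w)
        padded.append(list(w) + [pad_value] * n_pad)
        mask.append([1] * len(w) + [0] * n_pad)  # 1 = real sample, 0 = padding
    return padded, mask


batch, attention_mask = pad_batch([[0.1, -0.2, 0.3], [0.5]])
print(attention_mask)  # [[1, 1, 1], [1, 0, 0]]
```

The mask tells the model which positions are padding, so shorter clips are not affected by the zeros appended to them.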
Frequently Asked Questions
Q: What makes this model unique?
This model targets Portuguese speech recognition with the wav2vec2 architecture and reports a 27.1% WER on the Common Voice Portuguese test set. It also benefits from the cross-lingual speech representations learned during XLSR pretraining.
Q: What are the recommended use cases?
The model is ideal for Portuguese speech recognition tasks, including transcription services, voice commands, and audio content analysis. It's particularly suitable for applications requiring batch processing of audio files at scale.
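For batch transcription, one possible sketch with the Hugging Face `transformers` and `torchaudio` libraries follows. The Hub identifier `facebook/wav2vec2-large-xlsr-53-portuguese`, the batch size, and the helper names `batched` and `transcribe` are assumptions made for illustration. The code has not been validated against this specific checkpoint.

```python
from typing import Iterator, List, Sequence

MODEL_ID = "facebook/wav2vec2-large-xlsr-53-portuguese"  # assumed Hub identifier
TARGET_SR = 16_000  # sampling rate the model expects


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def transcribe(paths: Sequence[str], batch_size: int = 8) -> List[str]:
    """Transcribe audio files in batches with greedy CTC decoding."""
    # Heavy dependencies are imported lazily so `batched` works without them.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()

    transcripts: List[str] = []
    for chunk in batched(paths, batch_size):
        speech = []
        for path in chunk:
            waveform, sr = torchaudio.load(path)  # shape: (channels, samples)
            waveform = waveform.mean(dim=0)       # downmix to mono
            if sr != TARGET_SR:
                waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
            speech.append(waveform.numpy())
        # Pad the batch to a common length; the processor also returns the mask.
        inputs = processor(speech, sampling_rate=TARGET_SR,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(inputs.input_values,
                           attention_mask=inputs.attention_mask).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        transcripts.extend(processor.batch_decode(predicted_ids))
    return transcripts
```

Calling `transcribe(["clip1.wav", "clip2.wav"])` with real file paths would return one transcript string per file. A suitable batch size depends on clip length and available memory.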