wav2vec2-large-xlsr-53-portuguese

Property	Value
Developer	Facebook
License	Apache 2.0
Downloads	1,621
Framework	PyTorch, JAX

What is wav2vec2-large-xlsr-53-portuguese?

This is a speech recognition model developed by Facebook and fine-tuned for Portuguese. It is built on the pretrained wav2vec2-large-xlsr-53 model and reaches a 27.1% Word Error Rate (WER) on the Common Voice Portuguese test set.
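To make the WER figure concrete, here is a minimal sketch of how the metric is usually computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not taken from the model's own evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five gives a WER of 20%.
print(word_error_rate("o gato está em casa", "o gato esta em casa"))  # 0.2
```

Under this definition, a 27.1% WER means roughly 27 word-level errors (substitutions, deletions, or insertions) per 100 reference words.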

Implementation Details

The model uses the wav2vec2 architecture with cross-lingual speech representations (XLSR). It expects audio sampled at 16 kHz, and the accompanying preprocessing resamples 48 kHz recordings down to that rate. Inputs can be processed in batches, with an attention mask marking padded positions so that they are ignored.

Supports PyTorch and JAX frameworks
Implements CTC (Connectionist Temporal Classification) for speech recognition
Includes built-in text normalization and preprocessing
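The CTC step listed above can be illustrated with a greedy decoder: take the best-scoring token per audio frame, collapse consecutive repeats, then drop the blank token. This is a sketch under stated assumptions. The toy vocabulary, the use of index 0 as the blank, and "|" as the word delimiter are all hypothetical here; the real model defines its own vocabulary.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration; the real model ships its own.
VOCAB = ["<pad>", "|", "a", "l", "o", "á"]
BLANK_ID = 0          # assumed: the padding token doubles as the CTC blank
WORD_DELIMITER = "|"  # assumed: marks word boundaries


def argmax_ids(logits):
    """Pick the highest-scoring vocabulary index for every frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def ctc_greedy_decode(frame_ids):
    """Collapse repeats, remove blanks, and map ids to text."""
    # 1. Collapse runs of identical consecutive ids.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two equal ids keeps them distinct.
    chars = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    # 3. Turn word delimiters into spaces.
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


print(ctc_greedy_decode([4, 4, 0, 3, 3, 0, 5, 5]))  # olá
```

Greedy decoding is the simplest choice; beam search, optionally with a language model, is a common alternative when lower error rates matter more than speed.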

Core Capabilities

Automatic Speech Recognition for Portuguese
Batch processing support
Automatic audio resampling
Robust handling of various audio inputs
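Batch processing of clips with different lengths generally relies on padding plus an attention mask. The sketch below shows the idea in plain Python, assuming right-padding with zeros; the function name pad_batch is hypothetical. In practice, a library preprocessor would normally produce these arrays for you.

```python
def pad_batch(waveforms, padding_value=0.0):
    """Right-pad variable-length waveforms and build a matching attention mask.

    Returns (input_values, attention_mask), where the mask is 1 for real
    samples and 0 for padding.
    """
    if not waveforms:
        raise ValueError("batch must contain at least one waveform")

    max_len = max(len(w) for w in waveforms)
    input_values, attention_mask = [], []
    for w in waveforms:
        pad = max_len - len(w)
        input_values.append(list(w) + [padding_value] * pad)
        attention_mask.append([1] * len(w) + [0] * pad)
    return input_values, attention_mask


values, mask = pad_batch([[0.1, 0.2, 0.3], [0.5]])
print(values)  # [[0.1, 0.2, 0.3], [0.5, 0.0, 0.0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```

The mask lets the model ignore the padded samples, so a short clip is not transcribed differently just because it shares a batch with a longer one.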

Frequently Asked Questions

Q: What makes this model unique?

This model targets Portuguese speech recognition with the wav2vec2 architecture and reports a 27.1% WER on the Common Voice Portuguese test set. It also benefits from the cross-lingual pretraining of XLSR, which it inherits from its base model.

Q: What are the recommended use cases?

The model suits Portuguese speech recognition tasks such as transcription services, voice commands, and audio content analysis. Its batch processing support also makes it a reasonable fit for transcribing large collections of audio files.