wav2vec2-large-xlsr-53-portuguese

Maintained by: facebook

Property     Value
Developer    Facebook
License      Apache 2.0
Downloads    1,621
Framework    PyTorch, JAX

What is wav2vec2-large-xlsr-53-portuguese?

This is a speech recognition model developed by Facebook and fine-tuned for Portuguese. Built on the pretrained wav2vec2-large-xlsr-53 model, it reports a 27.1% Word Error Rate (WER) on the Common Voice Portuguese test set.
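To make the WER figure concrete, here is a minimal sketch of how a word error rate can be computed for a single utterance: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not part of the model's own evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of five reference words is dropped.
print(word_error_rate("o gato dorme no sofá", "o gato dorme sofá"))  # 0.2
```

Benchmark figures are usually aggregated over a whole test set (total errors divided by total reference words) rather than averaged per utterance, so a per-sentence value like this one is only a building block.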

Implementation Details

The model uses the wav2vec2 architecture with cross-lingual speech representations (XLSR). It expects audio sampled at 16 kHz; audio recorded at another rate, such as 48 kHz, is resampled to 16 kHz before inference. The implementation supports batch processing and uses attention masking so that padded positions in a batch are ignored.

  • Supports PyTorch and JAX frameworks
  • Implements CTC (Connectionist Temporal Classification) for speech recognition
  • Includes built-in text normalization and preprocessing
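The CTC step mentioned above can be illustrated with a greedy decoder: take the most likely token for each audio frame, collapse consecutive repeats, then drop the blank token. This is a sketch under stated assumptions; the toy vocabulary, the `blank_id` of 0, and the function name are hypothetical and do not reflect the model's actual vocabulary file.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blanks."""
    # groupby merges runs of identical consecutive ids into one entry.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
toy_vocab = {0: "<pad>", 1: "o", 2: "l", 3: "á"}

# Per-frame argmax ids, as they might come out of the acoustic model.
frames = [1, 1, 0, 2, 2, 0, 3, 3, 3]
print(ctc_greedy_decode(frames, toy_vocab))  # olá
```

The blank is what lets CTC emit a genuinely doubled letter: the frames `[1, 0, 1]` decode to "oo", whereas `[1, 1]` collapses to a single "o".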

Core Capabilities

  • Automatic Speech Recognition for Portuguese
  • Batch processing support
  • Automatic audio resampling
  • Robust handling of various audio inputs
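Batch processing with an attention mask can be sketched as follows: clips of different lengths are padded to the longest one, and a parallel mask marks which samples are real (1) and which are padding (0). This is a simplified, pure-Python illustration; the function name and the padding value of 0.0 are assumptions, and a real pipeline would normally rely on the library's own feature extractor for this step.

```python
def pad_batch(waveforms, padding_value=0.0):
    """Pad variable-length waveforms to equal length and build an attention mask."""
    if not waveforms:
        raise ValueError("waveforms must contain at least one clip")

    max_len = max(len(w) for w in waveforms)
    input_values = []
    attention_mask = []
    for w in waveforms:
        pad = max_len - len(w)
        # Real samples first, then padding on the right.
        input_values.append(list(w) + [padding_value] * pad)
        # 1 marks a real sample, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(w) + [0] * pad)
    return input_values, attention_mask


# Two hypothetical clips of 3 and 1 samples.
values, mask = pad_batch([[0.1, 0.2, 0.3], [0.5]])
print(values)  # [[0.1, 0.2, 0.3], [0.5, 0.0, 0.0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```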

Frequently Asked Questions

Q: What makes this model unique?

This model specializes in Portuguese speech recognition using the wav2vec2 architecture, reaching a 27.1% WER on the Common Voice Portuguese test set. It is notable for the cross-lingual representations it inherits from XLSR pretraining.

Q: What are the recommended use cases?

The model is suited to Portuguese speech recognition tasks, including transcription services, voice commands, and audio content analysis. Its batch processing support also makes it a fit for applications that transcribe large numbers of audio files.
