wav2vec2-large

by facebook

Wav2vec2-large is Facebook's self-supervised speech recognition model, pretrained on 16kHz sampled audio. When fine-tuned on all of Librispeech's labeled data, it achieves a state-of-the-art WER of 1.8/3.3 on the clean/other test sets.

  • License: Apache 2.0
  • Framework: PyTorch
  • Paper: View Paper
  • Downloads: 3,986

What is wav2vec2-large?

Wav2vec2-large is a powerful speech model developed by Facebook that learns representations directly from raw audio. It expects speech sampled at 16kHz and is pretrained by masking spans of the latent feature encoder's output and solving a contrastive task over quantized latent representations.
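The masked contrastive objective can be illustrated with a small NumPy sketch. This is a deliberately simplified picture, not the paper's exact loss: wav2vec 2.0 additionally samples negatives across utterances, learns Gumbel-softmax codebooks, and adds a codebook-diversity term. Here the "distractors" for each masked timestep are simply the quantized latents at the other masked timesteps.

```python
import numpy as np

def contrastive_loss(context, quantized, masked_idx, temperature=0.1):
    """InfoNCE-style loss over masked positions (simplified sketch).

    context   : (T, D) transformer outputs
    quantized : (T, D) quantized latent targets
    masked_idx: indices of the masked timesteps

    For each masked timestep t, the positive is quantized[t]; the
    distractors are the quantized latents at the other masked timesteps.
    """
    def cosine(a, b):
        return (a @ b.T) / (np.linalg.norm(a, axis=-1, keepdims=True)
                            * np.linalg.norm(b, axis=-1, keepdims=True).T)

    c = context[masked_idx]       # predictions at masked positions
    q = quantized[masked_idx]     # candidate targets (positive on diagonal)
    sims = cosine(c, q) / temperature
    # Softmax over candidates; the correct target sits on the diagonal.
    logits = sims - sims.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diag(probs)).mean()
```

When the context network reproduces its own quantized targets, the loss is low; pretraining pushes the real model toward that regime without ever seeing transcriptions.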

Implementation Details

The model uses a transformer-based architecture on top of a convolutional feature encoder and is pretrained on unlabeled speech data. It performs remarkably well even with little labeled data: fine-tuning on just 10 minutes of labeled audio after pretraining on 53k hours of unlabeled data achieves 4.8/8.2 WER on the clean/other test sets.

  • Pretrained on 16kHz sampled speech audio
  • Employs masking in latent space
  • Uses joint learning of quantized representations
  • Achieves 1.8/3.3 WER on the Librispeech clean/other test sets when fine-tuned on all labeled data
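A minimal sketch of using the pretrained checkpoint with the Hugging Face `transformers` library is shown below. The checkpoint name `facebook/wav2vec2-large` is assumed; the pretrained model has no CTC head, so it yields hidden states intended for fine-tuning rather than transcripts. The `num_output_frames` helper reflects the convolutional feature encoder's kernel/stride schedule (total stride 320), which is why the model expects 16kHz input.

```python
def num_output_frames(num_samples):
    """Frames produced by the convolutional feature encoder for 16kHz input.
    Kernel/stride pairs follow the wav2vec 2.0 encoder (total stride 320)."""
    for kernel, stride in [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]:
        num_samples = (num_samples - kernel) // stride + 1
    return num_samples

def extract_features(waveform_16khz):
    """Run the pretrained encoder; requires `torch` and `transformers`."""
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large")
    inputs = extractor(waveform_16khz, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Contextual representations, one 1024-dim vector per ~20ms frame.
    return outputs.last_hidden_state
```

One second of 16kHz audio (16,000 samples) maps to 49 output frames, i.e. roughly one representation every 20ms.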

Core Capabilities

  • Speech recognition with minimal labeled data
  • Robust representation learning from raw audio
  • Fine-tuning capability for specific ASR tasks
  • State-of-the-art performance on standard benchmarks

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to learn powerful representations from speech audio alone and achieve state-of-the-art results with minimal labeled data sets it apart. It can match or exceed semi-supervised methods while being conceptually simpler.

Q: What are the recommended use cases?

The model is best suited for automatic speech recognition tasks after fine-tuning. It's particularly valuable in scenarios with limited labeled data but access to large amounts of unlabeled speech data.
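As a sketch of the fine-tuned workflow, the snippet below transcribes audio with a CTC-headed variant; the checkpoint name `facebook/wav2vec2-large-960h` is an assumption (a Librispeech-fine-tuned model on the Hub), and `torch`/`transformers` are required. The `ctc_greedy_collapse` helper shows what CTC decoding does under the hood: merge repeated frame predictions, then drop blanks.

```python
def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse a frame-level CTC argmax sequence: merge repeats, drop blanks."""
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

def transcribe(waveform_16khz):
    """Greedy CTC transcription with an assumed fine-tuned checkpoint."""
    import torch
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h")
    inputs = processor(waveform_16khz, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    # batch_decode performs the collapse-and-drop-blanks step internally.
    return processor.batch_decode(predicted_ids)[0]
```

For domains far from Librispeech read speech, fine-tuning the pretrained checkpoint on in-domain labeled data is the recommended path.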
