wav2vec2-large-xlsr-53-gender-recognition-librispeech

Maintained By
alefiury


Property          Value
Parameter Count   316M
License           Apache 2.0
Base Model        facebook/wav2vec2-xls-r-300m
F1 Score          0.9993
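The table reports one F1 score but does not say whether it covers a single class or is macro-averaged over both. To help read that number, the minimal sketch below computes per-class F1 = 2PR / (P + R) and its macro average from label lists. The function names and example labels are illustrative only. They are not taken from the model's evaluation code.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 = 2PR / (P + R), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives means precision or recall is 0 (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    y_true = ["female", "female", "male", "male"]
    y_pred = ["female", "male", "male", "male"]
    print(f1_for_class(y_true, y_pred, "male"))    # ≈ 0.8 (P = 2/3, R = 1)
    print(f1_for_class(y_true, y_pred, "female"))  # ≈ 0.667 (P = 1, R = 1/2)
    print(macro_f1(y_true, y_pred))                # ≈ 0.733
```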

What is wav2vec2-large-xlsr-53-gender-recognition-librispeech?

This is an audio classification model fine-tuned to recognize speaker gender from speech. It is built on Facebook's Wav2Vec2 XLS-R architecture and was fine-tuned on the LibriSpeech-clean-100 dataset to classify voices as male or female, with a reported F1 score of 0.9993.

Implementation Details

The model builds on the wav2vec2-xls-r-300m checkpoint and was trained with mixed precision using native AMP. It expects audio sampled at 16 kHz. It accepts inputs of varying length by padding or truncating them to 5-second segments.
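The fixed-length step described above can be sketched with NumPy. This is a minimal sketch under a few assumptions:

- the waveform is mono and already resampled to 16 kHz;
- long clips keep their first 5 seconds;
- short clips are zero-padded at the end.

The card does not say where truncation or padding is applied. The helper name `pad_or_truncate` is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000                           # model expects 16 kHz audio
MAX_SECONDS = 5.0                              # segment length stated in the card
MAX_SAMPLES = int(SAMPLE_RATE * MAX_SECONDS)   # 80,000 samples


def pad_or_truncate(waveform, max_samples=MAX_SAMPLES):
    """Return a float32 copy of a 1-D waveform cut or zero-padded to max_samples.

    Assumption: keep the start of long clips and right-pad short ones.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    if waveform.shape[0] >= max_samples:
        return waveform[:max_samples].copy()
    # Append zeros (silence) until the clip reaches the fixed length
    return np.pad(waveform, (0, max_samples - waveform.shape[0]))


if __name__ == "__main__":
    short_clip = np.ones(SAMPLE_RATE)          # 1 s of audio
    long_clip = np.ones(7 * SAMPLE_RATE)       # 7 s of audio
    print(pad_or_truncate(short_clip).shape)   # (80000,)
    print(pad_or_truncate(long_clip).shape)    # (80000,)
```

Fixing every clip at 80,000 samples lets clips be stacked into a single batch array. The trade-off is that anything past 5 seconds is discarded.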

  • Data split: 70% training, 10% validation, 20% test
  • Learning rate: 3e-05 with a linear scheduler and 10% warmup
  • Effective batch size: 16 (per-device batch size of 4 with 4 gradient accumulation steps)
  • Architecture: Transformer-based, 316M parameters
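The batch-size and learning-rate settings above can be made concrete with a short calculation. This is a sketch that assumes "linear scheduler with 10% warmup" means the common shape used by Hugging Face's linear schedule:

- the learning rate rises linearly from 0 to the peak over the first 10% of optimizer steps;
- it then decays linearly to 0.

The warmup step count is rounded up. The total step count in the example is hypothetical, since the card does not state it.

```python
import math

PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
# Gradients from 4 micro-batches of 4 are summed before each optimizer step
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS  # 16

PEAK_LR = 3e-5
WARMUP_RATIO = 0.10


def linear_warmup_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at optimizer step `step` (assumed linear warmup, then linear decay)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay: peak_lr at the end of warmup down to 0 at total_steps
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 50, 100, 550, 1000):
        print(s, linear_warmup_lr(s, total))
```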

Core Capabilities

  • Binary gender classification (male/female) from audio input
  • Handles variable-length audio inputs
  • Reported F1 score of 0.9993
  • Supports batch processing for efficient inference
  • Compatible with both CPU and CUDA devices
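A batched forward pass of a binary classifier returns one row of two logits per clip. The sketch below shows how such a batch might be turned into labels and confidences. The model call itself is omitted.

The `ID2LABEL` mapping is a hypothetical placeholder. In practice the mapping should be read from the checkpoint's configuration, for example `model.config.id2label` when loading with Hugging Face transformers.

```python
import numpy as np

# Hypothetical mapping for illustration only; use the model config's id2label.
ID2LABEL = {0: "female", 1: "male"}


def softmax(logits):
    """Row-wise softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def decode_batch(logits, id2label=ID2LABEL):
    """Map a (batch, num_labels) logit array to a list of (label, confidence)."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    best = probs.argmax(axis=-1)
    return [(id2label[int(i)], float(probs[row, i])) for row, i in enumerate(best)]


if __name__ == "__main__":
    batch_logits = [[2.0, -1.0], [0.3, 1.7]]  # made-up logits for two clips
    for label, conf in decode_batch(batch_logits):
        print(label, round(conf, 3))
```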

Frequently Asked Questions

Q: What makes this model unique?

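It pairs the wav2vec2 architecture with fine-tuning for gender recognition, reaching a reported F1 score of 0.9993. The XLS-R base model was pretrained on multilingual speech, which may help with non-English inputs. However, the fine-tuning data (LibriSpeech) is English only, so performance on other languages is not established by this evaluation.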

Q: What are the recommended use cases?

This model suits automated gender recognition in audio processing pipelines, voice analysis systems, and research that requires gender classification from speech. It is best matched to clean recordings similar to LibriSpeech, the data it was trained and evaluated on.
