wav2vec2-large-xlsr-53-gender-recognition-librispeech

Property	Value
Parameter Count	316M
License	Apache 2.0
Base Model	facebook/wav2vec2-xls-r-300m
F1 Score	0.9993

What is wav2vec2-large-xlsr-53-gender-recognition-librispeech?

This is an audio classification model fine-tuned for binary gender recognition from speech. It is built on Facebook's wav2vec2-xls-r-300m (XLS-R) checkpoint and fine-tuned on the LibriSpeech-clean-100 dataset to distinguish male from female voices, with a reported F1 score of 0.9993.

Implementation Details

The model uses the wav2vec2-xls-r-300m architecture and was trained with mixed precision (Native AMP). It expects audio sampled at 16 kHz; inputs of any length are padded or truncated to 5-second segments.
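To make the 5-second padding and truncation concrete, here is a minimal sketch using NumPy. The function name `fit_to_length`, the choice to keep the first 5 seconds, and the choice to pad with zeros on the right are assumptions made for illustration; the source does not say which side is padded or cut.

```python
import numpy as np

SAMPLE_RATE = 16_000                          # Hz, as stated above
MAX_SECONDS = 5.0                             # segment length stated above
TARGET_LEN = int(SAMPLE_RATE * MAX_SECONDS)   # 80,000 samples


def fit_to_length(waveform, target_len: int = TARGET_LEN) -> np.ndarray:
    """Truncate or zero-pad a mono waveform to exactly target_len samples.

    Assumption: truncation keeps the start of the clip and padding is
    appended at the end.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    if len(waveform) >= target_len:
        return waveform[:target_len]
    # np.pad fills with zeros by default (mode="constant").
    return np.pad(waveform, (0, target_len - len(waveform)))
```

A 1-second clip (16,000 samples) comes back as 80,000 samples with trailing zeros, and a 7-second clip is cut to its first 80,000 samples.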

Data split: 70% training, 10% validation, 20% testing
Learning rate: 3e-05 with a linear scheduler and 10% warmup
Effective batch size: 16 (per-step batch of 4 with 4 gradient accumulation steps)
Architecture: Transformer-based with 316M parameters
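The learning-rate schedule and batch-size arithmetic listed above can be sketched in plain Python. This is an illustration of a generic linear schedule with warmup, not the author's training script; the function name `linear_lr`, the single-device assumption, and the example step count are hypothetical.

```python
PEAK_LR = 3e-5            # learning rate from the list above
WARMUP_RATIO = 0.10       # 10% warmup from the list above
PER_STEP_BATCH = 4
GRAD_ACCUM_STEPS = 4
# Assumes a single device: samples seen per optimizer update.
EFFECTIVE_BATCH = PER_STEP_BATCH * GRAD_ACCUM_STEPS   # 16


def linear_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to peak_lr during warmup, then decays
    linearly back to 0 at total_steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)
```

With a hypothetical run of 1,000 optimizer steps, the rate climbs for the first 100 steps, peaks at 3e-05 at step 100, and reaches zero at step 1,000.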

Core Capabilities

Binary gender classification (male/female) from audio input
Handles variable-length audio inputs (padded or truncated to 5 seconds)
Reported F1 score of 0.9993
Supports batch processing for efficient inference
Compatible with both CPU and CUDA devices
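The capabilities above could be exercised with a short batch-inference sketch based on the Hugging Face `transformers` auto classes. Several things here are assumptions: the `<namespace>` part of `MODEL_ID` is a placeholder for whichever Hub account hosts the model, the helper names `decode_predictions` and `classify_batch` are hypothetical, and the label names are read from the checkpoint's own `id2label` mapping rather than hard-coded.

```python
from typing import Dict, List, Sequence

# Placeholder: replace <namespace> with the Hub account hosting the model.
MODEL_ID = "<namespace>/wav2vec2-large-xlsr-53-gender-recognition-librispeech"
SAMPLE_RATE = 16_000
MAX_SAMPLES = 5 * SAMPLE_RATE   # 5-second segments


def decode_predictions(logits: Sequence[Sequence[float]],
                       id2label: Dict[int, str]) -> List[str]:
    """Map each row of class scores to the label with the highest score."""
    return [id2label[max(range(len(row)), key=row.__getitem__)]
            for row in logits]


def classify_batch(waveforms, model_id: str = MODEL_ID) -> List[str]:
    """Classify a list of 1-D float waveforms sampled at 16 kHz."""
    # Heavy dependencies are imported here so decode_predictions stays
    # usable without them.
    import torch
    from transformers import (AutoFeatureExtractor,
                              AutoModelForAudioClassification)

    # Use CUDA when available, otherwise fall back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = AutoModelForAudioClassification.from_pretrained(model_id)
    model = model.to(device).eval()

    # Pad or truncate every clip to 5 seconds and stack into one batch.
    inputs = extractor(
        waveforms,
        sampling_rate=SAMPLE_RATE,
        padding="max_length",
        max_length=MAX_SAMPLES,
        truncation=True,
        return_tensors="pt",
    )
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    return decode_predictions(logits.cpu().tolist(), model.config.id2label)
```

Calling `classify_batch([clip_a, clip_b])` with two NumPy arrays of 16 kHz audio would return one label string per clip, in input order.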

Frequently Asked Questions

Q: What makes this model unique?

The model pairs the wav2vec2 XLS-R encoder, which was pretrained on multilingual speech, with a gender classification head, and it reports an F1 score of 0.9993. Because fine-tuning used only English LibriSpeech data, the reported score does not establish how well it performs on other languages.

Q: What are the recommended use cases?

This model suits automated gender recognition in audio processing pipelines, voice analysis systems, and research applications that need gender classification from speech. It is best matched to clean, read speech similar to LibriSpeech, the data it was fine-tuned on.