wav2vec2-lg-xlsr-en-speech-emotion-recognition

Maintained By
ehcalabres


Property           Value
Parameter Count    316M
License            Apache 2.0
Architecture       Wav2Vec2.0 XLSR
Accuracy           82.23%

What is wav2vec2-lg-xlsr-en-speech-emotion-recognition?

This is a fine-tuned speech emotion recognition model based on wav2vec2-large-xlsr-53-english, specifically trained to classify emotions in spoken English. The model can identify 8 distinct emotions: angry, calm, disgust, fearful, happy, neutral, sad, and surprised, making it valuable for various applications in emotion analysis and human-computer interaction.
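
As a rough illustration of how the classifier might be called, the sketch below uses the Hugging Face transformers audio-classification pipeline. The Hub id in MODEL_ID is an assumption built from the maintainer and model names above, and classify_file and top_emotion are hypothetical helper names, not part of the model. Depending on the installed transformers version, the classification head may not load cleanly, so treat this as a starting point rather than a verified recipe.

```python
from typing import Dict, List

# Assumed Hugging Face Hub id (maintainer + model name); not confirmed by this page.
MODEL_ID = "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"

# The eight emotion labels listed in the description.
EMOTIONS = [
    "angry", "calm", "disgust", "fearful",
    "happy", "neutral", "sad", "surprised",
]


def classify_file(path: str) -> List[Dict]:
    """Run the audio-classification pipeline on one audio file (hypothetical helper)."""
    # Imported lazily so top_emotion() can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    # Ask for a score for every class instead of the pipeline's default top 5.
    return classifier(path, top_k=len(EMOTIONS))


def top_emotion(predictions: List[Dict]) -> str:
    """Return the label with the highest score from pipeline-style output."""
    return max(predictions, key=lambda p: p["score"])["label"]
```

Calling top_emotion(classify_file("clip.wav")) would return a single label; "clip.wav" is a placeholder path.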

Implementation Details

The model was fine-tuned on the RAVDESS dataset, which comprises 1,440 audio samples. Training used the Adam optimizer with a learning rate of 0.0001, native AMP mixed precision, and a linear learning rate scheduler over 3 epochs. Reported accuracy rose from 13.59% to 82.23% over the course of training.

  • Native AMP mixed precision training
  • Gradient accumulation steps: 2
  • Effective batch size: 8 (4 per step × 2 gradient accumulation steps)
  • Training epochs: 3
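
The settings above could be collected into a configuration along these lines. This is a sketch assuming the Hugging Face Trainer API and a single device; build_training_args and effective_batch_size are hypothetical helpers, and the Trainer's default optimizer is AdamW, which may differ from the Adam setup described here.

```python
from typing import Any, Dict

# Hyperparameters as reported in this section, collected in one place.
HPARAMS: Dict[str, Any] = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 4,   # "4 per step"
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3,
    "lr_scheduler_type": "linear",
    "fp16": True,  # native AMP mixed precision; requires a CUDA device
}


def effective_batch_size(hp: Dict[str, Any]) -> int:
    """Samples contributing to each optimizer update, assuming one device."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"]


def build_training_args(output_dir: str):
    """Build TrainingArguments from HPARAMS (hypothetical helper)."""
    # Imported lazily so the arithmetic above works without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HPARAMS)
```

With gradient accumulation, gradients from two batches of 4 are summed before each optimizer update, which is where the batch size of 8 comes from.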

Core Capabilities

  • Multi-class emotion classification across 8 categories
  • High accuracy (82.23%) on evaluation set
  • Support for English language audio processing
  • Real-time emotion detection capabilities
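
One way to approximate real-time use is to classify short overlapping windows of an incoming signal. The sketch below only does the windowing; the 3-second window and 1-second hop are illustrative assumptions rather than values from the model's documentation, and sliding_windows is a hypothetical helper. wav2vec2 models expect 16 kHz audio, hence the default sample rate.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # wav2vec2 models expect 16 kHz input


def sliding_windows(
    samples: Sequence[float],
    window_s: float = 3.0,   # assumed window length, in seconds
    hop_s: float = 1.0,      # assumed hop between windows, in seconds
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Sequence[float]]:
    """Yield fixed-length, overlapping windows over a mono signal.

    A signal shorter than one window is yielded once, unpadded.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must be positive")
    last_start = max(len(samples) - window, 0)
    for start in range(0, last_start + 1, hop):
        yield samples[start:start + window]
```

Each yielded window could then be passed to the classifier, giving one emotion estimate per hop.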

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the wav2vec2 architecture with an 8-class emotion classifier, reaching 82.23% accuracy on its evaluation set. Because it was fine-tuned on the RAVDESS dataset, it is best suited to English speech emotion analysis.

Q: What are the recommended use cases?

The model is ideal for applications in conversational AI, mental health monitoring, customer service analysis, and human-computer interaction where emotion detection from speech is crucial. It's particularly suitable for real-time emotion classification in English speech.
