hubert-large-speech-emotion-recognition-russian-dusha-finetuned
| Property | Value |
|---|---|
| Parameter Count | 316M |
| License | Apache 2.0 |
| Base Model | facebook/hubert-large-ls960-ft |
| Tensor Type | F32 |
What is hubert-large-speech-emotion-recognition-russian-dusha-finetuned?
This is a speech emotion recognition model fine-tuned for Russian. Built on the HuBERT architecture, it classifies Russian speech into five emotional categories: neutral, angry, positive, sad, and other. On the test set it reaches 86% accuracy and a macro F1 score of 0.81.
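Accuracy and macro F1 measure different things, and this card also lists a balanced accuracy of 0.76, which is lower than the plain accuracy. A gap like that typically appears when classes are imbalanced. The sketch below computes all three metrics in plain Python on a small made-up label set; the toy data is purely illustrative and is not drawn from DUSHA or from this model's predictions.

```python
from collections import Counter


def summarize(y_true, y_pred):
    """Return accuracy, balanced accuracy (mean recall) and macro F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # missed an example of the true class
            fp[pred] += 1   # wrongly assigned to the predicted class

    recalls, f1_scores = [], []
    for label in sorted(set(y_true)):
        recall = tp[label] / (tp[label] + fn[label])
        predicted = tp[label] + fp[label]
        precision = tp[label] / predicted if predicted else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "balanced_accuracy": sum(recalls) / len(recalls),
        "macro_f1": sum(f1_scores) / len(f1_scores),
    }


# Toy, imbalanced example (hypothetical data): "neutral" dominates.
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["sad", "neutral"]

metrics = summarize(y_true, y_pred)
# accuracy = 0.80, balanced accuracy ≈ 0.67, macro F1 ≈ 0.73
```

Because the majority class is predicted well, plain accuracy stays high while the per-class averages drop, which is the same pattern as the figures reported here.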
Implementation Details
The model was fine-tuned on the DUSHA dataset using an A100 GPU. All layers were frozen during training except the projector, the classifier, and the 24 HuBERT encoder layers. Key training parameters include 2 epochs, a batch size of 8, and a learning rate of 5e-5.
- Trained on half of the DUSHA dataset
- Gradient accumulation steps: 4
- Constant learning rate with no warm-up or decay
- 16 kHz audio sampling rate required
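The training setup above can be sketched as follows. This is a minimal illustration, not the author's training script: the keyword names follow Hugging Face `TrainingArguments`, the parameter-name prefixes assume the naming used by `HubertForSequenceClassification`, and whether the original run used the `Trainer` API at all is not stated here. The helper names (`is_trainable`, `freeze_for_finetuning`, `effective_batch_size`) are hypothetical.

```python
# Hyperparameters as listed in this card. The key names are an assumption:
# they match transformers.TrainingArguments keyword arguments.
TRAINING_KWARGS = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "constant",  # no warm-up, no decay
    "warmup_steps": 0,
}

# Assumed parameter-name prefixes for the parts that stay trainable:
# the projector, the classifier, and the HuBERT encoder layers.
TRAINABLE_PREFIXES = ("projector.", "classifier.", "hubert.encoder.layers.")


def is_trainable(param_name):
    """True if a parameter belongs to one of the unfrozen modules."""
    return param_name.startswith(TRAINABLE_PREFIXES)


def freeze_for_finetuning(model):
    """Freeze everything except the trainable prefixes.

    Works with any object exposing named_parameters() that yields
    (name, parameter) pairs, such as a torch.nn.Module.
    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = is_trainable(name)
        if param.requires_grad:
            trainable.append(name)
    return trainable


def effective_batch_size(kwargs, num_devices=1):
    """Examples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )
```

With a batch size of 8 and 4 gradient accumulation steps on a single GPU, each optimizer step sees 8 × 4 = 32 examples.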
Core Capabilities
- Russian speech emotion classification into 5 categories
- Balanced accuracy of 0.76
- Support for audio input up to 10 seconds
- Runs on a PyTorch backend
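A minimal inference sketch, under several assumptions, might look like the following. The full Hub repository id (including its namespace) is not given here, so `model_id` is left as a caller-supplied argument. Loading the feature extractor from the base checkpoint, the label index order (taken from the order in which the five categories are listed), and truncating longer clips to 10 seconds are all assumptions rather than documented behavior. The function names are hypothetical.

```python
SAMPLE_RATE = 16_000                     # 16 kHz input is required
MAX_SECONDS = 10                         # inputs up to 10 seconds
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS  # 160,000 samples

# Assumption: class indices follow the order in which the labels are listed.
LABELS = ["neutral", "angry", "positive", "sad", "other"]

BASE_MODEL = "facebook/hubert-large-ls960-ft"


def clip_to_max_length(waveform):
    """Keep at most the first 10 seconds of a 1-D mono waveform."""
    return waveform[:MAX_SAMPLES]


def classify_emotion(waveform, sampling_rate, model_id):
    """Predict one of the five labels for a mono waveform.

    waveform: 1-D sequence of float samples (list or NumPy array).
    model_id: full Hugging Face Hub id of the fine-tuned checkpoint.
    """
    # Imported lazily so the helpers above work without heavy dependencies.
    import torch
    from transformers import (
        HubertForSequenceClassification,
        Wav2Vec2FeatureExtractor,
    )

    if sampling_rate != SAMPLE_RATE:
        raise ValueError(f"expected {SAMPLE_RATE} Hz audio, got {sampling_rate} Hz")

    # Assumption: the base checkpoint's feature extractor is compatible.
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(BASE_MODEL)
    model = HubertForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = feature_extractor(
        clip_to_max_length(waveform),
        sampling_rate=SAMPLE_RATE,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
```

Audio recorded at another sampling rate would need to be resampled to 16 kHz before calling `classify_emotion`; the sketch raises an error instead of resampling silently.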
Frequently Asked Questions
Q: What makes this model unique?
This model specializes in emotion recognition for Russian speech, a language that relatively few models in this field cover. It builds on the HuBERT architecture and reaches 86% accuracy on Russian emotional speech.
Q: What are the recommended use cases?
The model suits Russian speech analysis applications, including customer service automation, sentiment analysis of recorded conversations, and emotion monitoring in Russian audio.