hubert-large-speech-emotion-recognition-russian-dusha-finetuned

Property	Value
Parameter Count	316M
License	Apache 2.0
Base Model	facebook/hubert-large-ls960-ft
Tensor Type	F32
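
As a rough guide to what the table implies for storage, the size of the weights can be estimated from the parameter count and tensor type. The sketch below is only an estimate: it assumes 4 bytes per F32 parameter, treats the rounded 316M figure as exact, and ignores activations and any framework overhead. The function name is illustrative.

```python
PARAMETER_COUNT = 316_000_000  # "316M" from the table; a rounded figure
BYTES_PER_F32 = 4              # a 32-bit float occupies 4 bytes


def weights_size_bytes(parameter_count: int, bytes_per_parameter: int) -> int:
    """Estimate the storage needed for the weights alone."""
    return parameter_count * bytes_per_parameter


size = weights_size_bytes(PARAMETER_COUNT, BYTES_PER_F32)
print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

Under these assumptions the weights take roughly 1.26 GB, before counting the memory used during inference.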

What is hubert-large-speech-emotion-recognition-russian-dusha-finetuned?

This is a speech emotion recognition model fine-tuned for Russian. Built on the HuBERT architecture, it classifies Russian speech into five emotional states: neutral, angry, positive, sad, and other. On the test set it reaches 86% accuracy and a macro F1 score of 0.81.
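
The card reports accuracy and macro F1 here and balanced accuracy further down, and the three can differ noticeably when classes are imbalanced. The following sketch shows how each is computed from a confusion matrix. The two-class matrix is invented toy data chosen to make the difference visible; it is not taken from this model's evaluation, and the function names are illustrative.

```python
def per_class_counts(confusion):
    """Return (tp, fp, fn) for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    counts = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # rest of row k
        fp = sum(confusion[i][k] for i in range(n)) - tp  # rest of column k
        counts.append((tp, fp, fn))
    return counts


def accuracy(confusion):
    """Fraction of all samples that were classified correctly."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def balanced_accuracy(confusion):
    """Mean of per-class recall, so every class counts equally."""
    recalls = [tp / (tp + fn)
               for tp, _, fn in per_class_counts(confusion) if tp + fn > 0]
    return sum(recalls) / len(recalls)


def macro_f1(confusion):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for tp, fp, fn in per_class_counts(confusion):
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (NOT this model's results): a frequent class that is always
# recognised and a rare class that is recognised half of the time.
toy = [[80, 0],
       [10, 10]]
print(accuracy(toy), balanced_accuracy(toy), macro_f1(toy))
```

On the toy matrix accuracy is 0.90 while balanced accuracy is 0.75, because the rare class is recognised only half of the time. A gap in the same direction between the figures reported for this model would be consistent with uneven class frequencies in the test data, although the card does not state the class distribution.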

Implementation Details

The model was fine-tuned on the DUSHA dataset using an A100 GPU. All layers were frozen during training except the projector, the classifier, and the 24 HuBERT encoder layers. Key training parameters were 2 epochs, a batch size of 8, and a learning rate of 5e-5.

Trained on half of the DUSHA dataset
Gradient accumulation steps: 4
No learning-rate warm-up or decay
16 kHz audio sampling rate required
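
The freezing scheme and batch settings above can be sketched as follows. This is a minimal illustration, not the training script used for the model. The helper `freeze_except` is hypothetical, and the parameter-name prefixes are an assumption based on how the Hugging Face Transformers sequence-classification HuBERT model usually names its modules; check them against `model.named_parameters()` before relying on them.

```python
def freeze_except(model, trainable_prefixes):
    """Freeze every parameter whose name does not start with a given prefix.

    Works with any object that exposes a PyTorch-style named_parameters()
    method. Returns the names of the parameters left trainable.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in model.named_parameters():
        keep = name.startswith(prefixes)
        param.requires_grad = keep  # frozen parameters receive no gradient
        if keep:
            trainable.append(name)
    return trainable


# Assumed parameter-name prefixes (verify against the actual model):
# the projector, the classifier, and the HuBERT encoder layers.
TRAINABLE_PREFIXES = ("projector.", "classifier.", "hubert.encoder.layers.")

# Settings stated in the card.
BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 5e-5  # held constant: no warm-up, no decay
NUM_EPOCHS = 2

# Gradients from 4 batches of 8 are accumulated before each optimizer step,
# so on a single GPU each update is based on 8 * 4 = 32 samples.
EFFECTIVE_BATCH_SIZE = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
```

With a loaded model, `freeze_except(model, TRAINABLE_PREFIXES)` would leave the remaining parameters, such as the convolutional feature extractor, with `requires_grad` set to `False`.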

Core Capabilities

Russian speech emotion classification into 5 categories
Balanced accuracy of 0.76
Support for audio input up to 10 seconds
Runs on the PyTorch backend
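
The input constraints and the five-way output can be illustrated with a small pre- and post-processing sketch in plain Python. It makes several assumptions that should be checked against the model's own configuration: that clips longer than 10 seconds are simply truncated, and that the class indices follow the order in which the emotions are listed in this card. The helper names are hypothetical, and the model call itself is omitted.

```python
import math

TARGET_SAMPLE_RATE = 16_000                     # Hz, required by the model
MAX_SECONDS = 10                                # longest supported input
MAX_SAMPLES = TARGET_SAMPLE_RATE * MAX_SECONDS  # 160,000 samples

# Assumed index order; verify against the id2label mapping of the model.
LABELS = ["neutral", "angry", "positive", "sad", "other"]


def prepare_clip(samples, sample_rate):
    """Check the sampling rate and truncate the clip to 10 seconds."""
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample the clip first"
        )
    return list(samples[:MAX_SAMPLES])


def decode_logits(logits, labels=LABELS):
    """Turn one row of raw class scores into (label, probability)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

In practice `prepare_clip` would run before the feature extractor and `decode_logits` on the logits the classifier returns. Audio recorded at another rate has to be resampled to 16 kHz first, which this sketch deliberately leaves to an audio library.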

Frequently Asked Questions

Q: What makes this model unique?

This model targets emotion recognition in Russian speech, for which relatively few models are available. It builds on the HuBERT architecture and reaches 86% accuracy on its Russian-language test set.

Q: What are the recommended use cases?

The model suits Russian speech analysis applications, including customer service automation, sentiment analysis of recorded conversations, and monitoring of emotional content in Russian audio.