metricgan-plus-voicebank

Maintained By
speechbrain

MetricGAN+ Voicebank Speech Enhancement Model

License: Apache 2.0
Framework: PyTorch (SpeechBrain)
Dataset: Voicebank-DEMAND
Metrics: PESQ 3.15, STOI 93.0
Paper: Research Paper

What is metricgan-plus-voicebank?

MetricGAN+ Voicebank is a speech enhancement model trained and released by SpeechBrain. It uses the MetricGAN+ approach to reduce noise in speech recordings. It reports a PESQ score of 3.15 and a STOI of 93.0 on the Voicebank-DEMAND test set, a standard benchmark for single-channel speech enhancement.

Implementation Details

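The model is implemented in PyTorch through the SpeechBrain toolkit and was trained on 16 kHz single-channel audio. It enhances speech by spectral masking: a neural network estimates a mask that scales the magnitude spectrogram of the noisy input, and the enhanced waveform is resynthesized from the masked spectrum. In SpeechBrain the model is loaded through the SpectralMaskEnhancement interface, which provides enhance_file for single recordings and enhance_batch for batched tensors.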
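The masking step can be illustrated with a generic NumPy sketch. It is not SpeechBrain's implementation: the frame size, hop, and periodic Hann window are assumptions, the mask comes from a caller-supplied function instead of a trained network, and the noisy phase is reused for resynthesis, a common choice in masking systems.

```python
import numpy as np

def _window(n_fft):
    # Periodic Hann window (assumed analysis/synthesis window)
    return np.hanning(n_fft + 1)[:-1]

def stft(x, n_fft=512, hop=256):
    """Centre-pad the signal, cut it into frames, window them, and take the real FFT."""
    pad = n_fft // 2
    x_p = np.pad(x, pad)  # zero padding on both sides
    n_frames = 1 + (len(x_p) - n_fft) // hop
    w = _window(n_fft)
    frames = np.stack([x_p[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (frames, n_fft // 2 + 1)

def istft(spec, length, n_fft=512, hop=256):
    """Weighted overlap-add inverse of stft(), trimmed back to `length` samples."""
    w = _window(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=-1) * w
    total = (len(frames) - 1) * hop + n_fft
    out, norm = np.zeros(total), np.zeros(total)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += w ** 2
    out /= np.maximum(norm, 1e-10)  # undo the combined analysis and synthesis windowing
    pad = n_fft // 2
    return out[pad:pad + length]

def apply_spectral_mask(noisy, mask_fn, n_fft=512, hop=256):
    """Scale every time-frequency magnitude by a mask, then resynthesize with the noisy phase."""
    spec = stft(noisy, n_fft, hop)
    magnitude, phase = np.abs(spec), np.angle(spec)
    mask = mask_fn(magnitude)  # in the real model, a neural network predicts this mask
    enhanced = mask * magnitude * np.exp(1j * phase)
    return istft(enhanced, len(noisy), n_fft, hop)
```

Passing mask_fn=np.ones_like leaves the signal unchanged, which is a quick check that the STFT pair reconstructs its input; a hypothetical noise-suppression mask would instead return values below 1 in noise-dominated bins.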

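  • Supports batch processing through enhance_batch
  • GPU-compatible for faster inference (for example, by passing run_opts={"device": "cuda"} when loading the model)
  • Resamples input to 16 kHz and selects a single channel automatically when audio is loaded through the interface, for example with enhance_file
  • Built on the SpeechBrain toolkit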
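The resampling, channel selection, and batching described above can be sketched in NumPy. This is an illustrative stand-in, not SpeechBrain's code: to_mono_16k and make_batch are hypothetical helpers, channels are averaged here as one common way to obtain mono, and resampling uses linear interpolation, which omits the anti-aliasing filter a proper resampler (such as torchaudio's Resample transform) applies. The relative-length vector mirrors the convention SpeechBrain uses to describe padded batches.

```python
import numpy as np

TARGET_SR = 16000  # the model expects 16 kHz single-channel audio

def to_mono_16k(audio, sample_rate):
    """Hypothetical helper. audio has shape (samples,) or (samples, channels)."""
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average all channels into one
    if sample_rate != TARGET_SR:
        n_out = int(round(len(audio) * TARGET_SR / sample_rate))
        t_in = np.arange(len(audio)) / sample_rate
        t_out = np.arange(n_out) / TARGET_SR
        audio = np.interp(t_out, t_in, audio)  # linear interpolation, no anti-aliasing
    return audio

def make_batch(signals):
    """Hypothetical helper: zero-pad signals to a common length and
    return each signal's length relative to the longest one."""
    max_len = max(len(s) for s in signals)
    batch = np.zeros((len(signals), max_len))
    for i, s in enumerate(signals):
        batch[i, :len(s)] = s
    rel_lengths = np.array([len(s) / max_len for s in signals])
    return batch, rel_lengths
```

For example, a one-second stereo clip at 44.1 kHz becomes a 16000-sample mono array, and batching it with a half-second clip yields relative lengths of 1.0 and 0.5.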

Core Capabilities

  • Speech enhancement with strong benchmark scores (PESQ 3.15, STOI 93.0 on Voicebank-DEMAND)
  • Inference on CPU or GPU, one file at a time or in batches
  • Automatic handling of input sample rate and channel count
  • Straightforward integration with existing PyTorch audio pipelines

Frequently Asked Questions

Q: What makes this model unique?

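The model's distinguishing feature is the MetricGAN+ training scheme. Metrics such as PESQ are not differentiable, so a discriminator network is trained to predict the metric score of enhanced speech, and the generator (the mask estimator) is trained to maximize that predicted score. Training is therefore aligned with the evaluation metric rather than with a generic spectral distance such as the mean squared error.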
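The training objective can be sketched as least-squares losses, following the MetricGAN+ formulation as we understand it. The discriminator D rates an (enhanced, clean) pair and is regressed onto the normalized metric score Q', with clean speech paired with itself assigned a score of 1; MetricGAN+ also feeds noisy speech to the discriminator as an extra training input. The generator is trained to push D's rating of its output toward 1. The function names and the PESQ rescaling from [-0.5, 4.5] to [0, 1] are assumptions for illustration, discriminator outputs are passed in as plain arrays, and other MetricGAN+ details, such as replaying previously generated samples to the discriminator, are omitted.

```python
import numpy as np

def normalize_pesq(pesq_score):
    """Rescale PESQ (assumed range -0.5 to 4.5) to the [0, 1] target range."""
    return (np.asarray(pesq_score, dtype=float) + 0.5) / 5.0

def discriminator_loss(d_clean, d_enhanced, d_noisy, q_enhanced, q_noisy):
    """Least-squares regression of D's outputs onto the true normalized scores.

    d_clean:    D(clean, clean), whose target score is 1
    d_enhanced: D(enhanced, clean), target q_enhanced = Q'(enhanced, clean)
    d_noisy:    D(noisy, clean), target q_noisy = Q'(noisy, clean), the MetricGAN+ addition
    """
    return float(np.mean((d_clean - 1.0) ** 2
                         + (d_enhanced - q_enhanced) ** 2
                         + (d_noisy - q_noisy) ** 2))

def generator_loss(d_enhanced, target=1.0):
    """Push D's predicted score for enhanced speech toward the maximum score."""
    return float(np.mean((d_enhanced - target) ** 2))
```

A discriminator that predicts every score exactly gives a discriminator loss of 0, while the generator loss stays positive until D rates the enhanced speech as perfect.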

Q: What are the recommended use cases?

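This model suits applications that need single-channel speech enhancement, such as teleconferencing systems, podcast production, voice recording cleanup, and general audio restoration where noise reduction is the main goal. Because it was trained on Voicebank-DEMAND (read English speech mixed with recorded noise), results on other conditions, such as strong reverberation, may differ from the reported scores.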
