tts_en_fastpitch

Maintained By: nvidia

NVIDIA FastPitch Text-to-Speech Model

Property          Value
Parameters        45M
License           CC-BY-4.0
Language          English (US)
Sample Rate       22050 Hz
Research Paper    FastPitch: Parallel Text-to-speech with Pitch Prediction

What is tts_en_fastpitch?

NVIDIA FastPitch is a text-to-speech model built on a fully parallel transformer architecture. It converts text into mel spectrograms and exposes prosody controls over pitch and per-phoneme duration. Because it predicts all spectrogram frames in one pass rather than one frame at a time, it runs inference faster than autoregressive models while keeping speech quality high.

Implementation Details

The model is built on the NeMo toolkit and uses a transformer-based architecture with unsupervised speech-text alignment. It generates mel spectrograms, which a compatible vocoder such as HiFi-GAN then converts to audio. The model works at a 22050 Hz sample rate and produces a female English voice with an American accent.

  • Fully-parallel architecture enabling faster inference compared to sequential models
  • Integrated pitch prediction and prosody control capabilities
  • Unsupervised speech-text alignment mechanism
  • Compatible with NVIDIA Riva for production deployment
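
The two-stage pipeline described above (text to mel spectrogram, then spectrogram to audio) can be sketched with NeMo roughly as follows. This is a minimal sketch, not an official recipe: it assumes `nemo_toolkit[tts]`, `torch`, and `soundfile` are installed, and the vocoder checkpoint name `nvidia/tts_hifigan` is an assumption, since this page only says "compatible vocoders like HiFi-GAN". The helper name `synthesize` is hypothetical.

```python
SAMPLE_RATE = 22050  # output sample rate stated on this model card


def synthesize(text: str, out_path: str = "speech.wav") -> str:
    """Text -> mel spectrogram (FastPitch) -> waveform (HiFi-GAN) -> WAV file."""
    # Imported lazily so this module still loads where NeMo is not installed.
    import soundfile as sf
    import torch
    from nemo.collections.tts.models import FastPitchModel, HifiGanModel

    # Checkpoint names are assumptions; both are downloaded on first use.
    spec_generator = FastPitchModel.from_pretrained("nvidia/tts_en_fastpitch").eval()
    vocoder = HifiGanModel.from_pretrained(model_name="nvidia/tts_hifigan").eval()

    with torch.no_grad():
        tokens = spec_generator.parse(text)  # text -> token ids
        spectrogram = spec_generator.generate_spectrogram(tokens=tokens)
        audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

    # audio has shape (batch, samples); write the first (only) item.
    sf.write(out_path, audio.detach().to("cpu").numpy()[0], SAMPLE_RATE)
    return out_path


# Example call (downloads both checkpoints):
# synthesize("FastPitch generates speech in parallel.", "speech.wav")
```

Loading both models inside the function keeps the sketch short; a long-running service would load them once and reuse them across requests.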

Core Capabilities

  • High-quality spectrogram generation for English speech synthesis
  • Fine-grained control over pitch and individual phoneme duration
  • Batch processing of text inputs
  • Integration with popular vocoders for final audio generation
  • Production-ready deployment through NVIDIA Riva
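
To make the pitch and duration controls concrete, here is a small conceptual sketch in plain Python. It is not NeMo's implementation: the function names, the `pace` parameter, and the convention that a pace above 1.0 means faster speech are all assumptions for illustration. The idea is that each input symbol has a predicted duration in frames and a pitch value, so editing those per-symbol numbers before the spectrogram is generated changes the timing and intonation of the output.

```python
from typing import List, Sequence


def regulate_length(
    encodings: Sequence[Sequence[float]],
    durations: Sequence[int],
    pace: float = 1.0,
) -> List[List[float]]:
    """Repeat each symbol encoding for its pace-scaled number of frames."""
    if len(encodings) != len(durations):
        raise ValueError("need exactly one duration per encoding")
    if pace <= 0:
        raise ValueError("pace must be positive")
    frames: List[List[float]] = []
    for vec, dur in zip(encodings, durations):
        n_frames = max(0, round(dur / pace))  # assumed: pace > 1 shortens speech
        frames.extend(list(vec) for _ in range(n_frames))
    return frames


def shift_pitch(f0_hz: Sequence[float], semitones: float) -> List[float]:
    """Shift voiced pitch values by a number of semitones; 0 marks unvoiced."""
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


# Three symbols with durations of 2, 4, and 6 frames, spoken at double pace.
symbols = [[0.1], [0.2], [0.3]]
fast = regulate_length(symbols, durations=[2, 4, 6], pace=2.0)
raised = shift_pitch([220.0, 0.0, 196.0], semitones=2)
```

Because every frame count is known before decoding starts, all frames can be produced in a single parallel pass, which is where the speed advantage over frame-by-frame models comes from.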

Frequently Asked Questions

Q: What makes this model unique?

FastPitch stands out for its fully parallel architecture, which gives significantly faster inference than autoregressive models such as Tacotron 2 while maintaining high-quality speech output. It also gives explicit control over prosody through its pitch and duration predictions.

Q: What are the recommended use cases?

The model suits applications that need high-quality English speech synthesis with a female American-accented voice. It can be deployed in production through NVIDIA Riva, which makes it a good fit for virtual assistants, automated content reading, and accessibility applications.
