tts_en_fastpitch

Maintained By
nvidia

NVIDIA FastPitch Text-to-Speech Model

Property          Value
Parameters        45M
License           CC-BY-4.0
Language          English (US)
Sample Rate       22050 Hz
Research Paper    FastPitch: Parallel Text-to-speech with Pitch Prediction

What is tts_en_fastpitch?

NVIDIA FastPitch is a text-to-speech model built on a fully parallel transformer architecture. It predicts pitch and individual phoneme durations explicitly, which gives fine-grained control over prosody at synthesis time. This checkpoint, developed by NVIDIA, generates English speech with an American accent.
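To make the idea of duration and pitch control concrete, here is a minimal conceptual sketch in plain Python. It is not NeMo code and not the model's actual implementation: the function names are hypothetical, the pitch values are shown in Hz purely for readability, and the real model works on learned embeddings rather than small lists. It shows only the general mechanism, in which each input token is repeated for its predicted number of frames and the per-token pitch and duration values can be edited before that expansion.

```python
from typing import List


def regulate_length(token_features: List[List[float]],
                    durations: List[int]) -> List[List[float]]:
    """Repeat each token's feature vector once per predicted output frame."""
    if len(token_features) != len(durations):
        raise ValueError("need exactly one duration per token")
    frames: List[List[float]] = []
    for features, n_frames in zip(token_features, durations):
        # A token with duration 3 contributes 3 identical frames.
        frames.extend(list(features) for _ in range(max(0, n_frames)))
    return frames


def scale_durations(durations: List[int], pace: float) -> List[int]:
    """pace > 1 shortens every token (faster speech); pace < 1 lengthens it."""
    return [max(1, round(d / pace)) for d in durations]


def shift_pitch(pitch_hz: List[float], semitones: float) -> List[float]:
    """Transpose a per-token pitch contour by a number of semitones."""
    factor = 2.0 ** (semitones / 12.0)
    return [p * factor for p in pitch_hz]
```

Because every frame count is known before decoding starts, all output frames can be produced in one pass instead of one step at a time.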

Implementation Details

The model is implemented with the NVIDIA NeMo toolkit on top of PyTorch. Synthesis takes two stages: FastPitch first generates a mel spectrogram from text, and a separate vocoder (such as HiFi-GAN) then converts that spectrogram into an audible waveform. The model is trained with an unsupervised speech-text aligner, which learns the alignment between text and audio during training.

  • Fully parallel architecture, enabling faster inference than autoregressive models such as Tacotron 2
  • Prosody control through pitch contour prediction
  • Trained on the LJSpeech dataset for 1000 epochs
  • Compatible with NVIDIA Riva for production deployment
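The two-stage pipeline described above can be sketched with NeMo roughly as follows. Treat this as an illustration under assumptions rather than a reference implementation: the checkpoint names `nvidia/tts_en_fastpitch` and `nvidia/tts_hifigan` are assumed from the public model catalog and should be checked against the catalog you use, the `synthesize` wrapper is hypothetical, and the `nemo_toolkit[tts]` and `soundfile` packages must be installed.

```python
SAMPLE_RATE = 22050  # sample rate stated on the model card


def synthesize(text: str, out_path: str = "speech.wav") -> str:
    """Stage 1: text -> mel spectrogram. Stage 2: mel spectrogram -> waveform."""
    # Imported lazily so this module can be loaded without NeMo installed.
    import soundfile as sf
    from nemo.collections.tts.models import FastPitchModel, HifiGanModel

    # Checkpoint names are assumptions; verify them before use.
    spec_generator = FastPitchModel.from_pretrained("nvidia/tts_en_fastpitch").eval()
    vocoder = HifiGanModel.from_pretrained(model_name="nvidia/tts_hifigan").eval()

    # Stage 1: tokenize the text and generate the mel spectrogram.
    tokens = spec_generator.parse(text)
    spectrogram = spec_generator.generate_spectrogram(tokens=tokens)

    # Stage 2: the vocoder turns the spectrogram into audio samples.
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

    # Write the first (and only) item in the batch as a WAV file.
    sf.write(out_path, audio.to("cpu").detach().numpy()[0], SAMPLE_RATE)
    return out_path
```

Calling `synthesize("Hello from FastPitch.")` downloads both checkpoints on first use and writes `speech.wav` to the working directory.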

Core Capabilities

  • Text-to-spectrogram generation at 22050Hz
  • Control over pitch and per-phoneme duration
  • Batch processing of text inputs
  • Integration with various vocoders
  • Enterprise-grade deployment support through Riva
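When budgeting latency or storage, it can help to relate spectrogram length to audio length at the 22050 Hz sample rate. The small sketch below does that arithmetic. The hop length of 256 samples per frame is an assumption (a common choice for 22.05 kHz setups), not a value stated on this page, and the function name is hypothetical.

```python
SAMPLE_RATE = 22050  # from the model card
HOP_LENGTH = 256     # assumption: samples per mel frame, not stated on this page


def mel_frames_to_seconds(n_frames: int,
                          sample_rate: int = SAMPLE_RATE,
                          hop_length: int = HOP_LENGTH) -> float:
    """Approximate audio duration covered by a mel spectrogram of n_frames frames."""
    return n_frames * hop_length / sample_rate
```

Under these assumptions, about 86 frames correspond to one second of audio.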

Frequently Asked Questions

Q: What makes this model unique?

FastPitch generates all spectrogram frames in parallel rather than one at a time, which makes synthesis significantly faster than with autoregressive TTS models such as Tacotron 2. It also predicts pitch and duration explicitly, so output quality is maintained while both properties remain adjustable at inference time.

Q: What are the recommended use cases?

The model suits applications that need high-quality English speech synthesis, particularly in an American female voice. It can be deployed in production through NVIDIA Riva, which makes it a fit for virtual assistants, automated customer service, and content accessibility solutions.
