NVIDIA FastPitch Text-to-Speech Model

Property	Value
Parameters	45M
License	CC-BY-4.0
Language	English (US)
Sample Rate	22050Hz
Research Paper	FastPitch: Parallel Text-to-speech with Pitch Prediction

What is tts_en_fastpitch?

NVIDIA FastPitch is a text-to-speech model built on a fully parallel Transformer architecture. It predicts a pitch contour and a duration for each phoneme, and both can be adjusted to control the prosody of the generated speech. This checkpoint, developed by NVIDIA, generates English speech with an American accent.

Implementation Details

The model is implemented with the NVIDIA NeMo toolkit on top of PyTorch. Synthesis takes two stages: FastPitch generates a mel spectrogram from text, and a separate vocoder (such as HiFiGAN) converts that spectrogram into an audible waveform. FastPitch does not produce audio on its own. The model also uses an unsupervised speech-text aligner, so text-to-audio alignments are learned rather than supplied by an external tool.
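
The two-stage flow described above can be sketched with NeMo roughly as follows. This is a minimal sketch, assuming `nemo_toolkit[tts]` and `soundfile` are installed and that the checkpoints are available under the names `nvidia/tts_en_fastpitch` and `nvidia/tts_hifigan`. The HiFiGAN checkpoint name and the helper function `synthesize` are assumptions made for illustration; this page does not specify them.

```python
SAMPLE_RATE = 22050  # Hz, from the property table above


def synthesize(text: str, out_path: str = "speech.wav") -> str:
    """Turn text into a WAV file: FastPitch (text -> mel), then a vocoder (mel -> audio)."""
    # Imported lazily so this sketch can be loaded without NeMo installed.
    import soundfile as sf
    from nemo.collections.tts.models import FastPitchModel, HifiGanModel

    # Stage 1: spectrogram generator. The checkpoint name is an assumption.
    spec_generator = FastPitchModel.from_pretrained("nvidia/tts_en_fastpitch")
    # Stage 2: vocoder. Any compatible vocoder could stand in; this name is an assumption.
    vocoder = HifiGanModel.from_pretrained(model_name="nvidia/tts_hifigan")

    tokens = spec_generator.parse(text)                               # text -> token IDs
    spectrogram = spec_generator.generate_spectrogram(tokens=tokens)  # token IDs -> mel
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)    # mel -> waveform

    # The audio tensor has a leading batch dimension; write the first (only) item.
    sf.write(out_path, audio.to("cpu").detach().numpy()[0], SAMPLE_RATE)
    return out_path
```

Calling `synthesize("Hello world.")` would download both checkpoints on first use and write `speech.wav` at 22050 Hz.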

Fully parallel architecture, enabling faster inference than sequential models such as Tacotron2
Prosody control through pitch contour prediction
Trained on the LJSpeech dataset for 1000 epochs
Compatible with NVIDIA Riva for production deployment
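
To make the pitch and duration controls more concrete, here is a minimal sketch of the general idea used by parallel TTS models of this family: each input symbol gets a predicted duration and pitch, the symbol's encoding is repeated for that many mel frames, and prosody is changed by editing the predictions before decoding. This is not NeMo's implementation. The function names, the `pace` parameter, and the use of pitch values in Hz (the real model works on tensors and normalized pitch) are assumptions chosen for readability.

```python
from typing import List, Sequence


def regulate_length(
    symbol_vectors: Sequence[Sequence[float]],
    durations: Sequence[int],
    pace: float = 1.0,
) -> List[List[float]]:
    """Repeat each input symbol's vector for its predicted number of mel frames.

    pace > 1 shortens every duration (faster speech); pace < 1 lengthens it.
    """
    if len(symbol_vectors) != len(durations):
        raise ValueError("need exactly one duration per input symbol")
    if pace <= 0:
        raise ValueError("pace must be positive")
    frames: List[List[float]] = []
    for vector, duration in zip(symbol_vectors, durations):
        repeats = int(duration / pace + 0.5)  # round half up
        frames.extend(list(vector) for _ in range(repeats))
    return frames


def shift_pitch(pitch_hz: Sequence[float], semitones: float) -> List[float]:
    """Transpose a pitch contour; 0.0 marks unvoiced symbols and stays 0.0."""
    factor = 2.0 ** (semitones / 12.0)
    return [p * factor for p in pitch_hz]


# Three input symbols with toy 2-dimensional encodings (hypothetical values).
encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 1, 3]          # predicted frames per symbol
pitch = [110.0, 0.0, 220.0]    # predicted pitch per symbol, in Hz

frames = regulate_length(encodings, durations)
faster = regulate_length(encodings, durations, pace=2.0)
octave_up = shift_pitch(pitch, semitones=12)
print(len(frames), len(faster), octave_up)
```

With the toy input, the sequence expands to 6 frames at normal pace and 4 frames at `pace=2.0`, and a 12-semitone shift doubles every voiced pitch value. Because every frame is derived from the predictions up front, all frames can be decoded at once, which is what makes the architecture parallel.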

Core Capabilities

Text-to-mel-spectrogram generation for 22050 Hz audio
Control over pitch and individual phoneme duration
Batch processing of text inputs
Integration with various vocoders
Enterprise-grade deployment support through Riva
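
Batch processing of text inputs usually means padding tokenized sentences to a common length so they can be stacked into one tensor. The sketch below shows that common approach in plain Python. How NeMo batches inputs internally is not described on this page, so the helper `pad_batch`, the padding ID, and the token IDs are all hypothetical.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding token ID


def pad_batch(
    token_seqs: Sequence[Sequence[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[int]]:
    """Right-pad token ID sequences to a common length; also return the true lengths."""
    if not token_seqs:
        return [], []
    lengths = [len(seq) for seq in token_seqs]
    max_len = max(lengths)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in token_seqs]
    return padded, lengths


# Three tokenized sentences of different lengths (made-up token IDs).
batch, lengths = pad_batch([[5, 9, 2], [7], [4, 4]])
print(batch)
print(lengths)
```

The true lengths are kept alongside the padded batch so that padded positions can be masked out downstream.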

Frequently Asked Questions

Q: What makes this model unique?

FastPitch generates mel-spectrogram frames in parallel rather than one at a time, which makes inference faster than sequential models such as Tacotron2. Its explicit pitch and duration predictions keep output quality high and give direct control over prosody.

Q: What are the recommended use cases?

The model suits applications that need high-quality English speech synthesis in an American-accented female voice. Through NVIDIA Riva integration it can be deployed in production environments, for example in virtual assistants, automated customer service, and content accessibility solutions.
