F5-TTS-pt-br
Property | Value |
---|---|
Author | firstpixel |
License | cc-by-nc-4.0 |
Training Data | ~330 hours combined |
Model URL | https://huggingface.co/firstpixel/F5-TTS-pt-br |
What is F5-TTS-pt-br?
F5-TTS-pt-br is a text-to-speech model for Brazilian Portuguese. Built on the F5-TTS architecture, it was trained on approximately 330 hours of speech data, including 3,500 speakers from the Mozilla Common Voice dataset, plus additional sources.
Implementation Details
The model was trained in several phases: initial training on 130 hours of 5-second samples, followed by expansion to 200 hours of 20-second samples, and finally the incorporation of 3,500 speakers from Mozilla Common Voice. Training ran on both A100 and T4 GPUs across multiple sessions.
- Supports dynamic voice characteristics through reference audio
- Handles emotional speech synthesis with speaker markers
- Works with vocoders such as Vocos and BigVGAN
- Converts numbers in the input text to words before synthesis
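The number-to-words step in the list above can be illustrated with a minimal sketch. This is not the model's actual normalizer, which is not described here. The function names `number_to_words_pt` and `expand_numbers` are hypothetical, and the sketch covers only whole numbers from 0 to 999 in Brazilian Portuguese spelling; larger numbers are left untouched.

```python
import re

# Words for 0-19, Brazilian Portuguese spelling.
UNITS = [
    "zero", "um", "dois", "três", "quatro", "cinco", "seis", "sete",
    "oito", "nove", "dez", "onze", "doze", "treze", "quatorze", "quinze",
    "dezesseis", "dezessete", "dezoito", "dezenove",
]
TENS = {
    2: "vinte", 3: "trinta", 4: "quarenta", 5: "cinquenta",
    6: "sessenta", 7: "setenta", 8: "oitenta", 9: "noventa",
}
HUNDREDS = {
    1: "cento", 2: "duzentos", 3: "trezentos", 4: "quatrocentos",
    5: "quinhentos", 6: "seiscentos", 7: "setecentos",
    8: "oitocentos", 9: "novecentos",
}


def number_to_words_pt(n: int) -> str:
    """Spell out an integer in the range 0-999 (hypothetical helper)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 20:
        return UNITS[n]
    if n == 100:
        return "cem"  # exactly 100 is "cem"; 101-199 use "cento"
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(HUNDREDS[hundreds])
    if rest:
        if rest < 20:
            parts.append(UNITS[rest])
        else:
            tens, unit = divmod(rest, 10)
            parts.append(TENS[tens])
            if unit:
                parts.append(UNITS[unit])
    # Portuguese joins hundreds, tens and units with "e".
    return " e ".join(parts)


def expand_numbers(text: str) -> str:
    """Replace standalone digit runs up to 999 with words; leave others as-is."""
    def replace(match: re.Match) -> str:
        value = int(match.group())
        return number_to_words_pt(value) if value <= 999 else match.group()

    return re.sub(r"\b\d+\b", replace, text)


print(expand_numbers("Tenho 21 anos e 123 livros."))
```

A production normalizer would also need to handle thousands, decimals, ordinals, currency, and gender agreement (for example "duas" versus "dois"), which this sketch deliberately omits.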
Core Capabilities
- Natural Brazilian Portuguese speech synthesis
- Emotion-aware voice generation with reference audio
- Support for long-form text with automatic chunking
- Flexible deployment options with multiple device support
- Automatic audio concatenation and format conversion
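The long-form chunking and audio concatenation listed above can be sketched as follows. This is an illustration under stated assumptions, not the model's own pipeline: `chunk_text` and `concatenate_audio` are hypothetical names, the 200-character limit and the 0.1-second gap are arbitrary, the 24000 Hz default sample rate is an assumption, and audio is represented as plain lists of float samples to keep the example dependency-free.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?;:])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concatenate_audio(
    segments: list[list[float]],
    sample_rate: int = 24000,   # assumed output rate
    gap_seconds: float = 0.1,   # assumed pause between chunks
) -> list[float]:
    """Join per-chunk waveforms, inserting a short silence between them."""
    silence = [0.0] * int(sample_rate * gap_seconds)
    output: list[float] = []
    for index, segment in enumerate(segments):
        if index:
            output.extend(silence)
        output.extend(segment)
    return output


chunks = chunk_text("Olá. Tudo bem? Sim!", max_chars=14)
print(chunks)
```

In a real pipeline each chunk would be synthesized separately and the resulting waveforms passed to the concatenation step; splitting on sentence boundaries keeps prosody breaks at natural pauses.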
Frequently Asked Questions
Q: What makes this model unique?
The model's specialization in Brazilian Portuguese and its training on diverse speaker data set it apart. It also supports emotional speech and reference audio, which makes it adaptable to a range of applications.
Q: What are the recommended use cases?
The model suits applications that need Brazilian Portuguese speech synthesis, such as audiobook creation, virtual assistants, and content localization. It is particularly useful when emotional variation in speech is needed.