Parler-TTS Mini Multilingual
Property | Value |
---|---|
Parameter Count | 938M |
License | Apache 2.0 |
Paper | Natural language guidance of high-fidelity text-to-speech with synthetic annotations |
Supported Languages | English, French, Spanish, Portuguese, Polish, German, Italian, Dutch |
What is parler-tts-mini-multilingual?
Parler-TTS Mini Multilingual is a text-to-speech model that extends Parler-TTS Mini to 8 European languages: English, French, Spanish, Portuguese, Polish, German, Italian, and Dutch. It is a 938M-parameter transformer model trained on approximately 9,200 hours of non-English speech and 580 hours of high-quality English speech.
Implementation Details
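The model uses two distinct tokenizers: one for the prompt (the text to be spoken) and one for the description (the natural-language text that specifies how the speech should sound). The prompt tokenizer is an improved one with a larger vocabulary and byte fallback, so characters missing from the vocabulary are encoded as bytes rather than collapsed into an unknown token, which makes it better suited to multiple languages. The model is loaded through the parler-tts library, which builds on Hugging Face Transformers, and supports both CPU and GPU inference.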
- Trained on cleaned CML-TTS and Multilingual LibriSpeech datasets
- Uses a prompt tokenizer with a larger vocabulary and byte fallback for better multilingual handling
- Supports natural language prompting for speech characteristic control
- Distributes its weights in the F32 (32-bit floating point) tensor type
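The two-tokenizer flow described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the `parler_tts`, `transformers`, `torch`, and `soundfile` packages are installed, and that the checkpoint is published under the Hub id `parler-tts/parler-tts-mini-multilingual` (the organization prefix is an assumption). The helper names `pick_device` and `synthesize` are hypothetical and introduced only for illustration.

```python
MODEL_ID = "parler-tts/parler-tts-mini-multilingual"  # assumed Hub repository id


def pick_device(cuda_available: bool) -> str:
    """Return the torch device string for GPU or CPU inference."""
    return "cuda:0" if cuda_available else "cpu"


def synthesize(prompt: str, description: str, out_path: str = "parler_tts_out.wav") -> str:
    """Generate speech for `prompt` in the voice described by `description`."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = pick_device(torch.cuda.is_available())
    model = ParlerTTSForConditionalGeneration.from_pretrained(MODEL_ID).to(device)

    # Tokenizer 1: the prompt tokenizer shipped with the checkpoint.
    prompt_tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Tokenizer 2: the description tokenizer, taken from the model's text encoder.
    description_tokenizer = AutoTokenizer.from_pretrained(
        model.config.text_encoder._name_or_path
    )

    description_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_ids = prompt_tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    # `input_ids` carries the description; `prompt_input_ids` carries the text to speak.
    generation = model.generate(input_ids=description_ids, prompt_input_ids=prompt_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

A call such as `synthesize("Salut toi, comment vas-tu aujourd'hui?", "A female speaker with a moderate pitch speaks at a moderate speed in a very clear recording.")` would download the checkpoint on first use and write a WAV file. On CPU, generation is expected to be noticeably slower than on GPU.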
Core Capabilities
- Multilingual speech synthesis across 8 European languages
- Control over speech features including gender, speaking rate, pitch, and reverberation
- High-quality audio generation with controllable characteristics
- Support for punctuation-based prosody control
- Streaming generation, so audio can be emitted in chunks before synthesis finishes
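Because the speech features listed above are controlled through a free-text description, one way to keep descriptions consistent is to assemble them from a few named options. The helper below is a hypothetical illustration: `build_description`, its parameter values, and the exact wording it produces are assumptions, not phrases the source documents as required by the model.

```python
_RATES = ("slow", "moderate", "fast")
_PITCHES = ("low", "moderate", "high")
_REVERB = {
    "close": "The recording sounds clear and very close up.",
    "distant": "The recording sounds distant, with noticeable reverberation.",
}


def build_description(
    gender: str = "female",
    rate: str = "moderate",
    pitch: str = "moderate",
    reverberation: str = "close",
) -> str:
    """Compose a natural-language voice description from named options."""
    if gender not in ("female", "male"):
        raise ValueError(f"unsupported gender: {gender!r}")
    if rate not in _RATES:
        raise ValueError(f"unsupported rate: {rate!r}")
    if pitch not in _PITCHES:
        raise ValueError(f"unsupported pitch: {pitch!r}")
    if reverberation not in _REVERB:
        raise ValueError(f"unsupported reverberation: {reverberation!r}")
    # One sentence for the speaker, one for the recording conditions.
    return (
        f"A {gender} speaker talks at a {rate} pace with a {pitch} pitch. "
        f"{_REVERB[reverberation]}"
    )
```

The resulting string is passed to the model as the description, while the prompt carries the text to be spoken. Prosody is shaped separately through punctuation in the prompt itself, for example by using commas to add short pauses.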
Frequently Asked Questions
Q: What makes this model unique?
The model combines support for 8 European languages with control of voice characteristics through natural-language descriptions. Unlike many TTS models, it is fully open-source (Apache 2.0) and uses separate tokenizers for prompts and descriptions to improve multilingual handling.
Q: What are the recommended use cases?
The model is ideal for applications requiring high-quality multilingual speech synthesis, including audiobook creation, voice assistants, and content localization. It's particularly useful when fine control over speech characteristics is needed through natural language prompting.