Kokoro-82M-v1.0-ONNX
Property | Value |
---|---|
Parameter Count | 82 Million |
Model Type | Text-to-Speech (TTS) |
Framework | ONNX |
Model URL | https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX |
What is Kokoro-82M-v1.0-ONNX?
Kokoro-82M-v1.0-ONNX is a text-to-speech model with 82 million parameters, distributed in ONNX format. Despite its compact size it produces high-quality speech, ships with multiple voices, and is available at several quantization levels so that model size can be matched to the deployment target.
Implementation Details
The model is distributed in ONNX format and can be run from both JavaScript and Python. It accepts a context of up to 512 tokens and generates audio at a 24 kHz sample rate. Quantized variants range from FP32 down to 4-bit precision, so a variant can be chosen to balance output quality against memory footprint and inference speed.
- Multiple voice profiles including American and British accents for both male and female voices
- Quantization options from FP32 (326 MB) down to 4-bit (154 MB)
- Supports both synchronous and asynchronous inference
- Includes style vector processing for voice characteristics
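The input constraints described above (a 512-token context, a style vector per voice, and a speed setting) can be sketched as a small input-preparation helper. This is a minimal sketch under stated assumptions, not the model's official API: the input names `input_ids`, `style`, and `speed`, the pad id `0`, the 256-dimensional style vector, and the idea that a voice table is indexed by token count are all assumptions to verify against the model repository. Plain Python lists are used so the example runs without dependencies.

```python
MAX_CONTEXT = 512  # context length stated for the model
PAD_ID = 0         # assumption: id 0 pads the start and end of the sequence
STYLE_DIM = 256    # assumption: length of a single style vector


def prepare_inputs(token_ids, voice_table, speed=1.0):
    """Build a dictionary of model inputs for one synthesis call.

    token_ids   -- phoneme ids for the text (produced by a separate tokenizer)
    voice_table -- sequence of style vectors for one voice; assumed to be
                   indexed by the number of tokens in the utterance
    speed       -- speaking-rate multiplier, where 1.0 leaves the rate unchanged
    """
    # Two slots are reserved for the pad ids added below.
    if len(token_ids) > MAX_CONTEXT - 2:
        raise ValueError(f"too many tokens: {len(token_ids)} > {MAX_CONTEXT - 2}")
    if len(token_ids) >= len(voice_table):
        raise ValueError("voice table has no style vector for this token count")
    if speed <= 0:
        raise ValueError("speed must be positive")

    style = list(voice_table[len(token_ids)])
    if len(style) != STYLE_DIM:
        raise ValueError(f"expected a {STYLE_DIM}-dim style vector, got {len(style)}")

    return {
        "input_ids": [[PAD_ID, *token_ids, PAD_ID]],  # shape (1, n + 2)
        "style": [style],                             # shape (1, STYLE_DIM)
        "speed": [float(speed)],                      # shape (1,)
    }
```

With ONNX Runtime, these values would typically be converted to NumPy arrays (integer ids, 32-bit floats for style and speed) and passed as the input feed to `InferenceSession.run`.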
Core Capabilities
- High-quality speech synthesis
- 28 voice profiles covering American and British accents, male and female
- Efficient memory usage through various quantization options
- Easy integration through both JavaScript and Python APIs
- Customizable speech generation parameters including speed and style
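For integration from Python, generated audio usually needs to be saved or streamed as a standard file. The sketch below encodes mono floating-point samples as a 16-bit PCM WAV at the model's 24 kHz sample rate, using only the standard library. It assumes the model output is a sequence of floats in the range [-1.0, 1.0]; the function name `float_to_wav_bytes` is illustrative and not part of any Kokoro API.

```python
import io
import struct
import wave

SAMPLE_RATE = 24_000  # output sample rate stated for the model


def float_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        pcm += struct.pack("<h", int(round(clipped * 32767)))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
    return buffer.getvalue()


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Return the playback length of a clip with the given sample count."""
    return num_samples / sample_rate
```

The returned bytes can be written to disk with `open(path, "wb")` or sent directly in an HTTP response. At 24 kHz, one second of speech corresponds to 24,000 samples.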
Frequently Asked Questions
Q: What makes this model unique?
It maintains high-quality speech synthesis at a relatively small size (82M parameters) and is offered at multiple quantization levels, which makes it suitable for both production deployments and resource-constrained environments.
Q: What are the recommended use cases?
The model is ideal for applications requiring high-quality text-to-speech conversion, particularly where resource efficiency is important. Use cases include virtual assistants, content accessibility tools, and automated voice-over generation.