Kokoro-82M-v1.0-ONNX

Property	Value
Parameter Count	82 Million
Model Type	Text-to-Speech (TTS)
Framework	ONNX
Model URL	https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX

What is Kokoro-82M-v1.0-ONNX?

Kokoro-82M-v1.0-ONNX is a text-to-speech model with 82 million parameters, distributed in ONNX format. Despite its compact size, it offers multiple voice options and ships in several quantization levels, so the same model can be deployed on targets with different memory budgets.

Implementation Details

The model is distributed in ONNX format and can be run from both JavaScript and Python. It has a context length of 512 tokens and generates audio at a 24 kHz sample rate. Quantized variants range from FP32 down to 4-bit precision, so a deployment can trade model size against numerical precision.
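To make the 24 kHz figure concrete, the following is a minimal sketch of how generated samples could be saved and measured using only the Python standard library. The helper names `duration_seconds` and `write_wav` are hypothetical, and the sine tone merely stands in for model output, which is assumed here to be a sequence of floats in [-1.0, 1.0].

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, the output rate stated above


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of a mono clip in seconds."""
    return num_samples / sample_rate


def write_wav(samples, target, sample_rate: int = SAMPLE_RATE) -> None:
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM.

    `target` may be a file path or a binary file-like object.
    """
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for model output: one second of a 440 Hz tone (not real speech).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
buffer = io.BytesIO()
write_wav(tone, buffer)
```

Writing the header with the wrong rate would not fail, but playback would sound sped up or slowed down, so it is worth keeping the rate in a single constant.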

Multiple voice profiles, including American and British accents with male and female voices
Quantization options from FP32 (326 MB) down to 4-bit (154 MB)
Synchronous and asynchronous inference
Style vector processing for voice characteristics
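As a rough sanity check on the file sizes listed above, weight storage can be estimated from the parameter count and the bits used per weight. This is a back-of-the-envelope sketch, not a description of how the files are actually built; the function name is hypothetical and 1 MB is assumed to mean 10^6 bytes.

```python
PARAMS = 82_000_000  # parameter count from the table above


def estimated_size_mb(num_params: int, bits_per_weight: int) -> float:
    """Estimated weight storage in megabytes (assuming 1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1_000_000


fp32_mb = estimated_size_mb(PARAMS, 32)  # 328.0
int4_mb = estimated_size_mb(PARAMS, 4)   # 41.0
```

The FP32 estimate of 328 MB is close to the listed 326 MB. The 4-bit estimate of 41 MB is well below the listed 154 MB, which suggests that not every tensor in that file is stored at 4 bits; that reading is an inference from the numbers rather than something the listing states.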

Core Capabilities

High-quality speech synthesis
28 voice profiles across different accents and genders
Efficient memory usage through various quantization options
Easy integration through both JavaScript and Python APIs
Customizable speech generation parameters including speed and style
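The speed and style parameters, together with the 512-token context limit, could be combined into a model input along the following lines. This is a sketch under several assumptions that should be checked against the exported ONNX graph: the input names `input_ids`, `style`, and `speed`, the use of id 0 as padding at both ends of the sequence, and a style vector of length 256. The `build_inputs` helper and the token ids in the usage note are hypothetical.

```python
MAX_CONTEXT = 512  # context length stated in Implementation Details
PAD_ID = 0         # assumption: id 0 pads both ends of the sequence
STYLE_DIM = 256    # assumption: length of one style vector


def build_inputs(token_ids, style_vector, speed=1.0):
    """Assemble a feed dictionary for one synthesis call.

    The keys are assumed input names, not confirmed ones.
    """
    budget = MAX_CONTEXT - 2  # leave room for the two pad tokens
    if len(token_ids) > budget:
        raise ValueError(f"{len(token_ids)} tokens exceed the budget of {budget}")
    if len(style_vector) != STYLE_DIM:
        raise ValueError(f"style vector must have {STYLE_DIM} values")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {
        "input_ids": [[PAD_ID, *token_ids, PAD_ID]],  # shape (1, n + 2)
        "style": [list(style_vector)],                # shape (1, STYLE_DIM)
        "speed": [float(speed)],                      # shape (1,)
    }
```

For example, `build_inputs([5, 6, 7], [0.0] * 256, speed=1.2)` yields a padded sequence of five ids. A runtime would typically expect these nested lists converted to typed arrays before inference, and longer texts would need to be split into chunks that fit the token budget.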

Frequently Asked Questions

Q: What makes this model unique?

The model maintains high-quality speech synthesis at a relatively small size (82M parameters) and is available at multiple quantization levels. This combination makes it suitable for both production and resource-constrained environments.

Q: What are the recommended use cases?

The model is ideal for applications requiring high-quality text-to-speech conversion, particularly where resource efficiency is important. Use cases include virtual assistants, content accessibility tools, and automated voice-over generation.