Kokoro-82M-v1.0-ONNX

Maintained by: onnx-community

Property         Value
Parameter Count  82 million
Model Type       Text-to-Speech (TTS)
Framework        ONNX
Model URL        https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX

What is Kokoro-82M-v1.0-ONNX?

Kokoro-82M-v1.0-ONNX is a text-to-speech model that delivers high-quality speech synthesis at a relatively compact size of 82 million parameters. It ships with multiple voice options and is available at several quantization levels, so file size and memory use can be matched to the deployment target.

Implementation Details

The model is distributed in ONNX format and can be run from both JavaScript and Python. It has a context length of 512 tokens and generates audio at a 24 kHz sample rate. Quantized variants range from FP32 down to 4-bit precision, trading numerical precision for a smaller file so the model can be fitted to the memory and performance limits of the target environment.
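The two numeric limits in this paragraph translate into simple checks before and after inference. The following is a minimal sketch using only the Python standard library, not the model's own API: `fits_context` tests a token sequence against the 512-token context length, and `write_wav` stores a mono waveform as 16-bit PCM at 24 kHz. All function names are hypothetical. The `reserved` parameter (for pipelines that set aside positions for padding tokens) and the assumption that the waveform arrives as floats in the range -1.0 to 1.0 are not stated on this page.

```python
import math
import os
import struct
import tempfile
import wave

CONTEXT_LENGTH = 512   # tokens, as stated on this page
SAMPLE_RATE = 24_000   # Hz, as stated on this page


def fits_context(token_ids, reserved=0):
    """Return True if the tokens plus any reserved slots fit in the context."""
    return len(token_ids) + reserved <= CONTEXT_LENGTH


def duration_seconds(num_samples):
    """Length of a clip in seconds at the model's sample rate."""
    return num_samples / SAMPLE_RATE


def write_wav(path, samples):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))


if __name__ == "__main__":
    # Half a second of a 440 Hz tone stands in for real model output.
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE // 2)]
    out_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    write_wav(out_path, tone)
    print(out_path, duration_seconds(len(tone)), "seconds")
```

Text that does not fit the context would need to be split into shorter segments and synthesized piece by piece; how best to split it depends on the tokenizer in use and is outside this sketch.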

  • Multiple voice profiles including American and British accents for both male and female voices
  • Various quantization options from FP32 (326MB) down to 4-bit (154MB)
  • Supports both synchronous and asynchronous inference
  • Includes style vector processing for voice characteristics
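The file sizes listed above can drive a simple choice of variant for a given memory budget. The sketch below is illustrative only: the variant labels `"fp32"` and `"4-bit"` and both function names are hypothetical, only the two sizes given on this page are included, and the sizes are assumed to be decimal megabytes. It also compares the listed sizes with a naive weights-only estimate (parameter count times bits per weight).

```python
PARAMETER_COUNT = 82_000_000  # from the property table on this page

# File sizes in MB as listed on this page. Intermediate variants are
# omitted because the page does not give their sizes.
LISTED_SIZES_MB = {"fp32": 326, "4-bit": 154}


def naive_size_mb(bits_per_weight, parameters=PARAMETER_COUNT):
    """Weights-only estimate: parameters x bits, in decimal megabytes."""
    return parameters * bits_per_weight / 8 / 1_000_000


def pick_variant(budget_mb, sizes=LISTED_SIZES_MB):
    """Return the largest listed variant that fits the budget, or None.

    Assumes a larger file means higher precision, which holds for the
    two variants listed here.
    """
    fitting = {name: mb for name, mb in sizes.items() if mb <= budget_mb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    print(naive_size_mb(32))   # 328.0
    print(naive_size_mb(4))    # 41.0
    print(pick_variant(200))   # 4-bit
    saving = 1 - LISTED_SIZES_MB["4-bit"] / LISTED_SIZES_MB["fp32"]
    print(f"{saving:.1%}")     # 52.8%
```

The naive FP32 estimate of 328 MB is close to the listed 326 MB, while the naive 4-bit estimate of 41 MB is well below the listed 154 MB. One plausible reading is that the 4-bit file keeps part of the model at higher precision, but that is an inference from the arithmetic, not something this page states.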

Core Capabilities

  • High-quality speech synthesis with multiple voice options
  • Support for 28 voice profiles spanning multiple accents and genders
  • Efficient memory usage through various quantization options
  • Easy integration through both JavaScript and Python APIs
  • Customizable speech generation parameters including speed and style
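Choosing among voice profiles by accent and gender can be sketched as a small filter. This example assumes a naming convention that this page does not spell out: voice IDs of the form `af_heart`, where the first letter encodes the accent (`a` for American, `b` for British) and the second the gender (`f` or `m`). The IDs used here are illustrative, and `parse_voice` and `filter_voices` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass

# Assumed prefix convention; not stated on this page.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


@dataclass(frozen=True)
class Voice:
    voice_id: str
    accent: str
    gender: str


def parse_voice(voice_id):
    """Split an ID such as 'af_heart' into accent and gender."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    try:
        return Voice(voice_id, ACCENTS[prefix[0]], GENDERS[prefix[1]])
    except KeyError:
        raise ValueError(f"unknown prefix in {voice_id!r}") from None


def filter_voices(voice_ids, accent=None, gender=None):
    """Return the IDs matching the requested accent and/or gender."""
    voices = [parse_voice(v) for v in voice_ids]
    return [
        v.voice_id
        for v in voices
        if (accent is None or v.accent == accent)
        and (gender is None or v.gender == gender)
    ]


if __name__ == "__main__":
    sample_ids = ["af_heart", "am_adam", "bf_emma", "bm_george"]
    print(filter_voices(sample_ids, accent="British"))
    print(filter_voices(sample_ids, gender="female"))
```

Keeping the parsing separate from the filtering means an ID that does not follow the assumed convention fails loudly with a `ValueError` instead of being silently dropped.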

Frequently Asked Questions

Q: What makes this model unique?

The model combines high-quality speech synthesis with a relatively small size (82M parameters) and a range of quantization options. That combination makes it suitable for both production deployments and resource-constrained environments.

Q: What are the recommended use cases?

The model is ideal for applications requiring high-quality text-to-speech conversion, particularly where resource efficiency is important. Use cases include virtual assistants, content accessibility tools, and automated voice-over generation.
