Kokoro-82M

Maintained by: hexgrad


Property          Value
Parameter Count   82 million
License           Apache
Architecture      StyleTTS 2 + ISTFTNet
Training Cost     $1000 (1000 A100 GPU hours)
Paper Reference   StyleTTS 2 Paper

What is Kokoro-82M?

Kokoro-82M is a lightweight, open-weight text-to-speech (TTS) model that synthesizes speech in 8 languages with 54 distinct voices. Despite having only 82 million parameters, it delivers quality comparable to larger models while remaining efficient and inexpensive to run. It uses a decoder-only architecture based on StyleTTS 2 and ISTFTNet.
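To make the size claim concrete, the following back-of-the-envelope sketch estimates how much memory the weights alone would occupy at a few common numeric precisions. It assumes exactly 82 million parameters (the nominal figure) and counts only the weights; activations, voice data, and runtime overhead are not included, so real memory use will be higher.

```python
# Rough weight-memory estimate for an 82M-parameter model.
# Assumption: the nominal 82 million parameter count is treated as exact.
PARAM_COUNT = 82_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_mb(param_count: int, dtype: str) -> float:
    """Return the weight storage size in megabytes (1 MB = 10**6 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1_000_000


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_footprint_mb(PARAM_COUNT, dtype):.0f} MB")
# float32: 328 MB
# float16: 164 MB
# int8: 82 MB
```

Even at full 32-bit precision the weights fit in a few hundred megabytes, which is consistent with the claim that the model suits resource-constrained deployments.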

Implementation Details

The model combines StyleTTS 2 and ISTFTNet in a decoder-only design, with no diffusion or encoder components. It was trained on hundreds of hours of permissive/non-copyrighted audio data, including public domain content and synthetic audio generated by closed TTS models. Training cost approximately $1000 for about 1000 GPU hours on A100 80GB GPUs, which works out to roughly $1 per GPU hour.

  • Supports 8 languages: English, Spanish, French, Hindi, Italian, Portuguese, Japanese, and Chinese
  • Implements 54 distinct voice profiles
  • Uses IPA phoneme labels for improved pronunciation accuracy
  • Trained exclusively on permissive audio data

Core Capabilities

  • Multi-language text-to-speech synthesis
  • Voice style transfer and control
  • Efficient inference with modest computational requirements
  • Apache-licensed, so it can be used in production deployments
  • Installable with pip for straightforward integration
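As an illustration of the pip-based integration, here is a minimal sketch that wraps synthesis in a helper function. It assumes the `kokoro` and `soundfile` packages are installed (for example with `pip install kokoro soundfile`), that `kokoro` exposes a `KPipeline` class taking a single-letter `lang_code`, that a voice named `af_heart` is available, and that audio is produced at 24 kHz. The language-code table, the `synthesize` helper, and the output file naming are illustrative assumptions, not part of the original page; check the package documentation before relying on them.

```python
# Assumed single-letter language codes for the `kokoro` package.
# These values are an assumption; verify them against the package docs.
LANG_CODES = {
    "english": "a",
    "spanish": "e",
    "french": "f",
    "hindi": "h",
    "italian": "i",
    "portuguese": "p",
    "japanese": "j",
    "chinese": "z",
}

SAMPLE_RATE = 24000  # assumed output sample rate in Hz


def lang_code_for(language: str) -> str:
    """Map a language name to its assumed pipeline code."""
    key = language.strip().lower()
    if key not in LANG_CODES:
        raise ValueError(f"Unsupported language: {language!r}")
    return LANG_CODES[key]


def synthesize(text, language="English", voice="af_heart", out_prefix="kokoro_out"):
    """Hypothetical helper: synthesize `text` and write one WAV file per chunk."""
    # Imported lazily so the helpers above work without these packages installed.
    from kokoro import KPipeline
    import soundfile as sf

    pipeline = KPipeline(lang_code=lang_code_for(language))
    paths = []
    # The pipeline is assumed to yield (graphemes, phonemes, audio) per chunk.
    for i, (_graphemes, _phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = f"{out_prefix}_{i}.wav"
        sf.write(path, audio, SAMPLE_RATE)
        paths.append(path)
    return paths
```

With the packages installed, calling `synthesize("Hello from Kokoro.")` would download the model weights on first use and return the list of WAV files it wrote. Non-English languages may need extra dependencies for phonemization.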

Frequently Asked Questions

Q: What makes this model unique?

Kokoro-82M stands out for its performance-to-size ratio: it supports multiple languages and voices with only 82M parameters. Its Apache license and low training cost make it attractive for both personal and production use.

Q: What are the recommended use cases?

The model suits applications ranging from personal projects to production systems. Its small size makes it a good fit for deployments with limited compute, and its multi-language support makes it usable in products that serve several locales.
