metavoice-1B-v0.1

Maintained by: metavoiceio

MetaVoice-1B-v0.1

Property        Value
Model Size      1.2B parameters
License         Apache 2.0
Language        English
Training Data   100K hours of speech

What is metavoice-1B-v0.1?

MetaVoice-1B-v0.1 is a text-to-speech (TTS) model designed to generate natural, emotionally expressive speech. It has 1.2 billion parameters, was trained on 100,000 hours of speech data, and supports voice cloning from small amounts of training or reference audio.

Implementation Details

The model uses a multi-stage architecture: a causal GPT model predicts EnCodec tokens, a non-causal transformer predicts the remaining token hierarchies, and multi-band diffusion generates the waveform. Inference uses KV-caching via Flash Decoding and supports batching.

  • Custom BPE tokenizer with 512 tokens
  • Two-hierarchy prediction system with flattened interleaving
  • Condition-free sampling for enhanced cloning
  • DeepFilterNet for artifact cleanup
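
To make "flattened interleaving" concrete, here is a minimal sketch of how two token hierarchies could be merged into one flat sequence and split apart again. It assumes the order is: first token of hierarchy 1, first token of hierarchy 2, second token of hierarchy 1, and so on. The function names and token values are hypothetical and are not taken from the MetaVoice codebase.

```python
from typing import List, Sequence, Tuple


def flatten_interleave(h1: Sequence[int], h2: Sequence[int]) -> List[int]:
    """Interleave two equal-length token hierarchies into one flat sequence.

    Assumed order: h1[0], h2[0], h1[1], h2[1], ...
    """
    if len(h1) != len(h2):
        raise ValueError("both hierarchies must have the same number of frames")
    flat: List[int] = []
    for first, second in zip(h1, h2):
        flat.extend((first, second))
    return flat


def unflatten(flat: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a flat interleaved sequence back into its two hierarchies."""
    if len(flat) % 2 != 0:
        raise ValueError("flat sequence must contain an even number of tokens")
    # Even positions belong to hierarchy 1, odd positions to hierarchy 2.
    return list(flat[0::2]), list(flat[1::2])


if __name__ == "__main__":
    h1 = [11, 12, 13]  # illustrative token ids for hierarchy 1
    h2 = [21, 22, 23]  # illustrative token ids for hierarchy 2
    flat = flatten_interleave(h1, h2)
    print(flat)  # [11, 21, 12, 22, 13, 23]
    assert unflatten(flat) == (h1, h2)
```

With this layout, a single causal model can emit both hierarchies as one token stream, at the cost of a sequence twice as long as the number of audio frames.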

Core Capabilities

  • Emotional speech synthesis with natural rhythm and tone
  • Voice cloning with as little as 1 minute of training data
  • Zero-shot cloning for American & British voices (30s reference)
  • Long-form synthesis support
  • Batch processing of varying text lengths
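
The listing does not say how texts of different lengths are batched. One common approach, shown here as a sketch under that assumption, is to pad each tokenized text to the longest length in the batch and keep a mask marking the real tokens. The `PAD_ID` value and the `pad_batch` helper are hypothetical and do not come from the model's code.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding token id, chosen only for illustration


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token sequences to a common length.

    Returns the padded batch and a mask with 1 for real tokens
    and 0 for padding positions.
    """
    if not sequences:
        return [], []
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


if __name__ == "__main__":
    batch, mask = pad_batch([[5, 6, 7], [8]])
    print(batch)  # [[5, 6, 7], [8, 0, 0]]
    print(mask)   # [[1, 1, 1], [1, 0, 0]]
```

The mask lets later stages ignore padded positions, so a short prompt is not affected by sharing a batch with a longer one.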

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for voice cloning from as little as 1 minute of training data and for zero-shot cloning of American and British voices from a 30-second reference. It is also described as preserving emotional tone without hallucinations, which matters for production use.

Q: What are the recommended use cases?

The model suits applications that need emotional text-to-speech, voice cloning, or long-form content generation. It fits especially well when a project must adapt to a new voice quickly with little training data.
