MARS5-TTS

Property	Value
License	AGPL-3.0
Model Type	Text-to-Speech
Architecture	Two-stage AR-NAR pipeline
Parameters	~1.2B (AR: 750M, NAR: 450M)

What is MARS5-TTS?

MARS5-TTS is a text-to-speech model developed by CAMB.AI. It uses a two-stage AR-NAR (autoregressive, then non-autoregressive) pipeline with a novel NAR component, and it can clone a voice and generate natural-sounding speech with controllable prosody from as little as 5 seconds of reference audio.

Implementation Details

The architecture has two components: an AR model with 750M parameters and a NAR model with 450M parameters, about 1.2B in total. It uses byte-pair encoding for both the text and the EnCodec audio codes, and inference requires at least 20 GB of GPU VRAM. The system operates on 24 kHz audio and supports both shallow and deep cloning.

Two-stage processing pipeline with AR and NAR components
Byte-pair encoding tokenization for text and EnCodec audio codes
Support for both fast (shallow) and high-quality (deep) cloning
Prosody control through punctuation and capitalization
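
The figures quoted above (750M + 450M parameters, 20 GB of VRAM, 24 kHz audio) can be turned into a small pre-flight check. The sketch below is illustrative only: `preflight` and `PreflightReport` are hypothetical names, not part of MARS5-TTS, and the caller is assumed to supply the available VRAM and the clip's sample rate from whatever tooling they already use.

```python
from dataclasses import dataclass

# Figures taken from the description above.
AR_PARAMS = 750_000_000
NAR_PARAMS = 450_000_000
MIN_VRAM_GB = 20
SAMPLE_RATE_HZ = 24_000


@dataclass
class PreflightReport:
    """Result of comparing a setup against the stated requirements (hypothetical helper)."""
    total_params: int
    vram_ok: bool
    sample_rate_ok: bool


def preflight(available_vram_gb: float, audio_sample_rate_hz: int) -> PreflightReport:
    """Compare a machine and a reference clip against the stated requirements."""
    return PreflightReport(
        total_params=AR_PARAMS + NAR_PARAMS,
        vram_ok=available_vram_gb >= MIN_VRAM_GB,
        sample_rate_ok=audio_sample_rate_hz == SAMPLE_RATE_HZ,
    )


# A 24 GB GPU passes the VRAM check, but a 44.1 kHz clip would need resampling first.
report = preflight(available_vram_gb=24.0, audio_sample_rate_hz=44_100)
print(report)
```

A failed `sample_rate_ok` suggests resampling the reference clip to 24 kHz before use; a failed `vram_ok` suggests a larger GPU is needed.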

Core Capabilities

Voice cloning from just 5 seconds of reference audio
Natural prosody generation for diverse scenarios including sports commentary and anime
Support for reference transcripts in deep cloning mode
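
As a rough sketch of how a caller might pick between the two cloning modes, the snippet below selects deep cloning when a reference transcript is available and falls back to shallow cloning otherwise. Both function names are hypothetical and are not the MARS5-TTS API. Treating the transcript as the deciding factor, and treating 5 seconds as a minimum clip length, are assumptions drawn from the capabilities listed above.

```python
from typing import Optional

MIN_REFERENCE_SECONDS = 5.0  # assumption: the "5 seconds" figure used as a lower bound


def reference_seconds(num_samples: int, sample_rate_hz: int = 24_000) -> float:
    """Duration of a mono reference clip in seconds."""
    return num_samples / sample_rate_hz


def choose_clone_mode(ref_transcript: Optional[str]) -> str:
    """Return "deep" when a non-empty transcript exists, else "shallow"."""
    if ref_transcript and ref_transcript.strip():
        return "deep"
    return "shallow"


clip_samples = 144_000  # 6 seconds of audio at 24 kHz
duration = reference_seconds(clip_samples)
mode = choose_clone_mode("The quick brown fox.")
long_enough = duration >= MIN_REFERENCE_SECONDS
print(duration, mode, long_enough)
```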

Frequently Asked Questions

Q: What makes this model unique?

MARS5-TTS stands out for generating high-quality speech with natural prosody from only a few seconds of reference audio. Its novel NAR component, together with prosody control through punctuation and capitalization in the input text, makes it versatile.
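
To illustrate steering prosody through the input text, here is a minimal sketch of two text-preparation helpers. `emphasize` and `pause_after` are hypothetical and only rewrite the string; they are not part of the model. That capital letters signal emphasis and commas signal pauses is an assumption about how the model responds to punctuation and capitalization.

```python
import re


def emphasize(text: str, word: str) -> str:
    """Upper-case every whole-word occurrence of `word` (case-insensitive match)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0).upper(), text)


def pause_after(text: str, word: str) -> str:
    """Insert a comma after the first occurrence of `word` unless punctuation already follows."""
    pattern = re.compile(rf"\b({re.escape(word)})\b(?![,.;:!?])", flags=re.IGNORECASE)
    return pattern.sub(r"\1,", text, count=1)


plain = "He takes the shot and scores"
styled = pause_after(emphasize(plain, "scores"), "shot")
print(styled)  # He takes the shot, and SCORES
```

The styled string would then be passed to the model as the text to synthesize, in place of the plain one.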

Q: What are the recommended use cases?

The model is well suited to applications that need high-quality voice cloning, including dubbing, content creation, and personalized speech synthesis. It is particularly effective where dynamic prosody matters, such as sports commentary or character voice acting.
