mms-tts-cat

Maintained By
facebook

MMS-TTS-CAT: Catalan Text-to-Speech Model

Property          Value
Parameter Count   36.3M
License           CC-BY-NC 4.0
Author            Facebook
Paper             MMS Paper
Model Type        Text-to-Speech (VITS)

What is mms-tts-cat?

MMS-TTS-CAT is the Catalan text-to-speech model from Facebook's Massively Multilingual Speech (MMS) project. It uses the VITS architecture to generate a speech waveform directly from Catalan text, as part of the project's effort to make speech technology available for more languages.

Implementation Details

The model is a conditional variational autoencoder (VAE) with three main components: a posterior encoder, a decoder, and a conditional prior. A Transformer-based text encoder combined with flow-based modules predicts spectrogram-based acoustic features, and a HiFi-GAN-style decoder converts them into a waveform.

  • Stochastic duration predictor for varied speech rhythm
  • Flow-based modules for improved expressiveness
  • End-to-end training with variational lower bound and adversarial losses
  • Non-deterministic output: the same text can produce different waveforms, so a fixed seed is required for reproducibility
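The seed fixing mentioned in the last bullet can be sketched as follows. This is a minimal sketch, not the authors' reference code: it assumes the checkpoint is published on the Hugging Face Hub as facebook/mms-tts-cat, that torch and transformers (v4.33+) are installed, and the helper names, the sample sentence, and the seed value 555 are illustrative choices.

```python
MODEL_ID = "facebook/mms-tts-cat"  # assumed Hub identifier (maintainer/model name)


def load_model(model_id=MODEL_ID):
    """Download (or load from cache) the VITS model and its tokenizer."""
    # Imported lazily so this module can be imported without the heavy deps.
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VitsModel.from_pretrained(model_id)
    return model, tokenizer


def synthesize(model, tokenizer, text, seed=555):
    """Return a 1-D waveform tensor for `text`, seeding the RNGs first."""
    import torch
    from transformers import set_seed

    inputs = tokenizer(text, return_tensors="pt")
    set_seed(seed)  # seeds Python, NumPy and PyTorch random generators
    with torch.no_grad():
        waveform = model(**inputs).waveform  # shape: (batch, samples)
    return waveform[0]


def same_seed_matches(text="Bon dia, com estàs?", seed=555):
    """Synthesize the same text twice with one seed and compare the results."""
    import torch

    model, tokenizer = load_model()
    first = synthesize(model, tokenizer, text, seed)
    second = synthesize(model, tokenizer, text, seed)
    return torch.equal(first, second)
```

Calling same_seed_matches() downloads the checkpoint on first use. Re-seeding immediately before each forward pass is the point of the sketch: seeding only once at program start would not make a second call repeat the first.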

Core Capabilities

  • High-quality Catalan speech synthesis from text input
  • Variable speech rhythm generation
  • Efficient inference using PyTorch
  • Integration with 🤗 Transformers library (v4.33+)
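As a rough illustration of the Transformers integration, the sketch below synthesizes a sentence and writes it to a WAV file using only the Python standard library for the audio output. It assumes the Hub identifier facebook/mms-tts-cat and an installed torch and transformers (v4.33+); the helper names pcm16_bytes, save_wav, and text_to_wav, the output path, and the sample sentence are hypothetical.

```python
import struct
import wave


def pcm16_bytes(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def save_wav(path, samples, sampling_rate):
    """Write mono 16-bit audio to `path`."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit
        f.setframerate(sampling_rate)
        f.writeframes(pcm16_bytes(samples))


def text_to_wav(text, path="catalan.wav", model_id="facebook/mms-tts-cat"):
    """Synthesize `text` with the assumed Hub checkpoint and save it."""
    # Imported lazily so the WAV helpers work without torch installed.
    import torch
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VitsModel.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform  # shape: (batch, samples)
    # The output sampling rate is read from the model configuration.
    save_wav(path, waveform[0].tolist(), model.config.sampling_rate)
    return path
```

A call such as text_to_wav("Bon dia, com estàs?") would download the checkpoint on first use and write catalan.wav to the working directory. Clipping before the integer conversion guards against samples slightly outside [-1.0, 1.0], which would otherwise overflow the 16-bit range.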

Frequently Asked Questions

Q: What makes this model unique?

It is the MMS model designed specifically for Catalan speech synthesis, within a project that aims to bring speech technology to many languages. Its VITS architecture produces natural-sounding speech, and the stochastic duration predictor lets the rhythm vary between utterances.

Q: What are the recommended use cases?

The model suits applications that need Catalan text-to-speech conversion, such as accessibility tools, educational software, and content localization. Because it is released under the CC-BY-NC 4.0 license, it is restricted to non-commercial use.
