# MERT-v1-95M
| Property | Value |
|---|---|
| Parameter Count | 95M |
| Architecture | Transformer (12 layers, 768 dimensions) |
| License | CC-BY-NC-4.0 |
| Paper | arXiv:2306.00107 |
| Sample Rate | 24 kHz |
| Feature Rate | 75 Hz |
## What is MERT-v1-95M?
MERT-v1-95M is a state-of-the-art music understanding model pre-trained with a masked language modeling (MLM) objective on 20,000 hours of audio data. It represents a significant advance in the m-a-p model family, raising the input sample rate to 24 kHz and the output feature rate to 75 Hz for richer feature extraction.
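As a rough orientation, the model can presumably be loaded through the standard Hugging Face `transformers` API. The sketch below assumes the Hub repository id `m-a-p/MERT-v1-95M` and that the checkpoint ships a custom model class requiring `trust_remote_code=True`; verify both against the model repository.

```python
from transformers import AutoModel, Wav2Vec2FeatureExtractor

# trust_remote_code=True is assumed to be required because the MERT repo
# ships a custom model class (assumption, not stated in this card).
model = AutoModel.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)
```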
## Implementation Details
The model employs a transformer encoder with 12 layers and 768-dimensional hidden states. It consumes audio at 24 kHz and outputs features at 75 Hz, a significant upgrade in resolution over previous versions. For pre-training targets it uses 8 codebooks from the EnCodec neural codec as pseudo-labels, and it performs MLM prediction with in-batch noise mixture (a feature-extraction sketch follows the list below).
- Transformer-based architecture with 95M parameters
- Pre-trained on 20,000 hours of music data
- Uses a 5-second context window during pre-training
- Implements an advanced MLM paradigm with in-batch noise mixture
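To make the 24 kHz input / 75 Hz output numbers concrete, here is a minimal extraction sketch, continuing from the loading example above. It assumes torchaudio is available for resampling; the file path and layer shapes are illustrative.

```python
import torch
import torchaudio

# Load a clip and resample to the 24 kHz the model expects
# ("example_clip.wav" is a placeholder path).
waveform, sr = torchaudio.load("example_clip.wav")
waveform = waveform.mean(dim=0)  # mix stereo down to mono
if sr != 24000:
    waveform = torchaudio.functional.resample(waveform, sr, 24000)

inputs = processor(waveform.numpy(), sampling_rate=24000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# At a 75 Hz feature rate, 5 seconds of audio yields roughly 375 frames
# of 768-dimensional features.
last_hidden = outputs.last_hidden_state  # shape: (1, time, 768)
```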
## Core Capabilities
- High-quality music audio feature extraction
- Support for music generation tasks
- Flexible feature output from different transformer layers (see the layer-selection sketch after this list)
- Efficient processing at a 75 Hz feature rate
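Because features can be taken from any transformer layer, downstream pipelines typically stack all hidden states and either pick one layer or learn a weighting over them. A sketch, continuing from the extraction example above (the layer index and uniform weights are illustrative):

```python
# outputs.hidden_states contains the embedding output plus all 12
# transformer layers: a tuple of 13 tensors shaped (1, time, 768).
all_layers = torch.stack(outputs.hidden_states).squeeze(1)  # (13, time, 768)

# Option 1: take a single intermediate layer (index 7 is illustrative).
single_layer = all_layers[7]  # (time, 768)

# Option 2: a learnable weighted average over layers, shown here with
# uniform weights as a stand-in for trained parameters.
weights = torch.softmax(torch.zeros(all_layers.shape[0]), dim=0)
weighted = (weights[:, None, None] * all_layers).sum(dim=0)  # (time, 768)
```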
## Frequently Asked Questions
Q: What makes this model unique?
MERT-v1-95M stands out for its use of EnCodec codebooks as pseudo-labels, its higher audio sample rate (24 kHz), and its large training corpus (20,000 hours), making it more robust and versatile than previous versions.
Q: What are the recommended use cases?
The model is ideal for music understanding tasks, audio classification, and feature extraction for downstream music processing applications. It is particularly effective when high-quality music representations are needed under limited computational resources; a minimal probe sketch follows.
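For such resource-constrained setups, a common pattern is to freeze MERT and train only a small probe on time-averaged features. A minimal sketch, with a hypothetical `MusicTagProbe` class and an arbitrary class count, reusing `last_hidden` from the extraction sketch:

```python
import torch
import torch.nn as nn

class MusicTagProbe(nn.Module):
    """Linear probe over frozen MERT features (illustrative, not from the paper)."""

    def __init__(self, feature_dim: int = 768, num_classes: int = 10):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, 768) from a chosen MERT layer
        pooled = features.mean(dim=1)  # average over the 75 Hz time axis
        return self.classifier(pooled)

probe = MusicTagProbe()
logits = probe(last_hidden)  # (1, num_classes)
```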