modernbert-embed-base

Maintained By
nomic-ai

ModernBERT Embed Base

PropertyValue
AuthorNomic AI
Dimensions768 (reducible to 256)
PaperarXiv:2402.01613
Model TypeEmbedding Model

What is modernbert-embed-base?

ModernBERT Embed Base is an advanced embedding model that builds upon the ModernBERT architecture, specifically designed for generating high-quality text embeddings. The model supports both 768 and 256 dimensions through Matryoshka Representation Learning, offering flexibility in deployment while maintaining strong performance across various tasks.

Implementation Details

The model employs a multi-stage training pipeline, starting with a long-context BERT model and utilizing both unsupervised contrastive learning and supervised fine-tuning. It requires specific prefixes ('search_query:' and 'search_document:') for optimal performance and can be easily integrated using popular frameworks like Sentence Transformers and Hugging Face Transformers.

  • Supports dimension reduction from 768 to 256 with minimal performance loss
  • Achieves state-of-the-art results across 56 benchmark tasks
  • Compatible with multiple implementation frameworks including Transformers.js

Core Capabilities

  • Classification Performance: 74.31% (768d) / 72.40% (256d)
  • Clustering Efficiency: 44.98% (768d) / 43.82% (256d)
  • Strong Semantic Textual Similarity: 81.78% (768d)
  • Effective Reranking: 56.42% (768d)

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to perform dimensionality reduction while maintaining performance, combined with its comprehensive training on both weakly-supervised and supervised datasets, sets it apart. The Matryoshka architecture allows for flexible deployment options without significant performance degradation.

Q: What are the recommended use cases?

The model excels in text embedding tasks such as semantic search, document classification, clustering, and similarity comparison. It's particularly effective for applications requiring efficient storage and computation while maintaining high accuracy.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.