ModernBERT-large

Maintained by answerdotai

Parameter Count: 395 million
Architecture: 28-layer Transformer encoder
Context Length: 8,192 tokens
Training Data: 2 trillion tokens (English + code)
License: Apache 2.0
Paper: arXiv:2412.13663

What is ModernBERT-large?

ModernBERT-large is a state-of-the-art bidirectional encoder that modernizes the original BERT architecture. It incorporates recent advances such as Rotary Positional Embeddings (RoPE) and alternating local-global attention, allowing it to process sequences far beyond classic BERT's 512-token limit efficiently.

Implementation Details

The model uses a pre-norm Transformer architecture with GeGLU activations and was trained with the StableAdamW optimizer under a trapezoidal learning-rate schedule. At inference time it relies on Flash Attention and sequence unpadding for high throughput. A minimal usage sketch follows the list below.

  • 28 transformer layers with modern architectural improvements
  • Native support for sequences up to 8,192 tokens
  • Trained on both text and code data for versatile applications
  • Implements efficient attention mechanisms for better performance
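
As a quick start, here is a minimal masked-language-modeling sketch. It assumes a recent transformers release (ModernBERT support landed in v4.48) and the Hugging Face checkpoint answerdotai/ModernBERT-large; treat it as an illustration rather than an official recipe.

```python
# Minimal fill-mask sketch; assumes transformers >= 4.48 (ModernBERT support).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the [MASK] position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))  # likely " Paris"
```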

Core Capabilities

  • Achieves 90.4 on the GLUE benchmark, second only to DeBERTa-v3-large
  • Strong code retrieval performance (59.5 on CodeSearchNet)
  • Excellent long-context retrieval (80.4 on MLDR out-of-domain)
  • Efficient processing of long documents for classification and semantic search (see the embedding sketch below)
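
To make the semantic-search use case concrete, the sketch below mean-pools the encoder's hidden states into document embeddings. This is a generic technique, not the evaluation setup behind the scores above; competitive retrieval quality usually requires fine-tuning (e.g. with sentence-transformers), so treat this as a starting point only.

```python
# Sketch: mean-pooled document embeddings for semantic search.
# The raw encoder is not retrieval-tuned; fine-tune for production quality.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "answerdotai/ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

def embed(texts):
    # The native 8,192-token window means long documents rarely need chunking.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=8192, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, L, H)
    mask = batch.attention_mask.unsqueeze(-1)              # (B, L, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens
    return F.normalize(pooled, dim=-1)

docs = embed(["ModernBERT natively handles sequences up to 8,192 tokens.",
              "The original BERT was released in 2018."])
query = embed(["Which model handles long documents?"])
print(query @ docs.T)  # cosine similarities; higher = more relevant
```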

Frequently Asked Questions

Q: What makes this model unique?

ModernBERT-large combines recent architectural innovations with extensive pretraining on diverse data sources, resulting in superior performance across various tasks while maintaining efficient processing of long sequences. Its integration of RoPE and Flash Attention makes it particularly well-suited for modern applications requiring long-context understanding.

Q: What are the recommended use cases?

The model excels in tasks requiring long document processing, including document retrieval, classification, and semantic search. It's particularly effective for hybrid applications involving both code and text, making it ideal for technical documentation search, code retrieval, and general natural language understanding tasks.
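
For the classification use case, a hedged starting point is shown below; num_labels=3 and the sample text are placeholders, and the freshly initialized classification head must be fine-tuned before its outputs mean anything.

```python
# Hypothetical classification setup: num_labels=3 is a placeholder, and the
# newly added head is randomly initialized until fine-tuned on task data.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "answerdotai/ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

# The 8,192-token window lets whole documents be classified without chunking.
batch = tokenizer(["A long technical document ..."], truncation=True,
                  max_length=8192, return_tensors="pt")
print(model(**batch).logits.shape)  # torch.Size([1, 3])
```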
