snowflake-arctic-embed-l

Maintained By: Snowflake

Snowflake Arctic Embed L

Property             Value
Parameter Count      334M
Embedding Dimension  1024
License              Apache 2.0
Paper                Technical Report
Base Model           e5-large-unsupervised

What is snowflake-arctic-embed-l?

Snowflake-arctic-embed-l is a text embedding model built for retrieval. It reports an MTEB Retrieval score (NDCG@10) of 55.98, which its maintainers present as surpassing both open-source and closed-source alternatives. It is built on the e5-large-unsupervised base model and is the largest variant in Snowflake's arctic-embed family.
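For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 can be computed for a single query. It assumes graded relevance labels listed in the order the system ranked the documents, a linear gain, and that the list covers every judged document for the query. The function names are illustrative and not taken from MTEB. Benchmark scores such as 55.98 are averages of this kind of per-query value, reported on a 0 to 100 scale.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels."""
    # Rank 0 is the top result; the discount for position p (1-based) is log2(p + 1).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents were judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical labels: the only relevant document was ranked second.
score = ndcg_at_k([0, 1, 0, 0])
```

A ranking that places every relevant document first scores 1.0, and the score drops as relevant documents move down the list.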

Implementation Details

The model is trained in multiple stages. Pretraining uses 400M samples drawn from a mix of public and proprietary web search data. The model is then fine-tuned on 1M curated triplets, each consisting of a query, a positive document, and a negative document. It produces 1024-dimensional embeddings and supports a context window of 512 tokens.

  • Multi-stage training with large-scale pretraining and focused fine-tuning
  • Hard negative mining to improve retrieval performance
  • Optimized for both accuracy and practical deployment
  • Compatible with popular frameworks including Sentence Transformers and Hugging Face
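The training steps above can be illustrated with a toy sketch. The source does not state the exact loss function, so this example assumes an InfoNCE-style contrastive loss over cosine similarities, which is a common choice for triplet-based embedding training. The temperature value, the function names, and the two-dimensional vectors are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def mine_hard_negatives(query, documents, positive_index, n=1):
    """Return the n non-positive documents most similar to the query."""
    q = normalize(query)
    scored = [
        (dot(q, normalize(doc)), i)
        for i, doc in enumerate(documents)
        if i != positive_index
    ]
    scored.sort(reverse=True)
    return [documents[i] for _, i in scored[:n]]


def contrastive_loss(query, positive, negatives, temperature=0.02):
    """InfoNCE-style loss for one query (temperature is an assumed value)."""
    q = normalize(query)
    candidates = [positive] + negatives
    scores = [dot(q, normalize(doc)) / temperature for doc in candidates]
    # Numerically stable log-sum-exp over all candidate scores.
    peak = max(scores)
    log_denominator = peak + math.log(sum(math.exp(s - peak) for s in scores))
    # Negative log-probability assigned to the positive document (index 0).
    return log_denominator - scores[0]


# Hypothetical toy vectors: index 0 is the positive document.
documents = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
query = [1.0, 0.0]
hard = mine_hard_negatives(query, documents, positive_index=0, n=1)
loss_hard = contrastive_loss(query, documents[0], hard)
loss_easy = contrastive_loss(query, documents[0], [documents[2]])
```

The hard negative sits close to the query, so it produces a larger loss than the easy negative. That larger training signal is the usual motivation for mining hard negatives.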

Core Capabilities

  • MTEB Retrieval score of 55.98 (NDCG@10)
  • Text embedding generation for search and retrieval tasks
  • Open alternative to closed-source embedding APIs
  • 512-token context window
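A minimal retrieval sketch using Sentence Transformers might look like the following. It assumes the model is published on the Hugging Face Hub as `Snowflake/snowflake-arctic-embed-l`, that its configuration defines a prompt named `query`, and that a version of `sentence-transformers` supporting `prompt_name` is installed. The helper names are illustrative. The library is imported lazily, so the pure-Python ranking helpers work without it.

```python
import math

MODEL_ID = "Snowflake/snowflake-arctic-embed-l"  # assumed Hugging Face Hub ID


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_embedding, document_embeddings):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_embedding, d) for d in document_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def embed_and_rank(query, documents):
    """Embed a query and documents with the model, then rank the documents."""
    # Lazy import: requires `pip install sentence-transformers` and a model download.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    model.max_seq_length = 512  # inputs beyond the context window are truncated
    # Assumes the model config ships a "query" prompt; documents get no prompt.
    query_embedding = model.encode(query, prompt_name="query")
    document_embeddings = model.encode(documents)
    return rank_documents(query_embedding.tolist(), document_embeddings.tolist())


# Ranking with hypothetical 2-dimensional embeddings (real ones have 1024 dimensions).
order = rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
```

Calling `embed_and_rank("what is snowflake?", ["doc one", "doc two"])` would return the document indices in ranked order, though the first call downloads the model weights.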

Frequently Asked Questions

Q: What makes this model unique?

The model pairs a reported MTEB Retrieval score of 55.98 (NDCG@10) with an Apache 2.0 license, which makes it a viable alternative to closed-source embedding services while remaining open to inspection and customization. The results are attributed to its multi-stage training pipeline and hard negative mining.

Q: What are the recommended use cases?

The model suits enterprise search, document retrieval, semantic similarity tasks, and other applications that need high-quality text embeddings. It is particularly well suited to production environments where retrieval accuracy is the priority.
