Solon-embeddings-large-0.1
| Property | Value |
|---|---|
| Parameter Count | 560M |
| License | MIT |
| Language | French |
| Tensor Type | F32 |
What is Solon-embeddings-large-0.1?
Solon-embeddings-large-0.1 is a state-of-the-art French-language embedding model developed by OrdalieTech. It outperforms other multilingual models such as cohere/embed-multilingual-v3 and OpenAI's ada-002 on French tasks, achieving a mean score of 0.749 on MTEB benchmarks.
Implementation Details
The model is optimized for French text and expects a specific query format: prefixing queries with "query: " improves retrieval performance (a minimal usage sketch follows the benchmark results below). It is built on the XLM-RoBERTa architecture and is available in both base and large variants.
- Achieves 92.7% Recall@500 on mMARCO-fr passage retrieval
- Superior performance on classification tasks (89.26% accuracy on MTOP Domain Classification)
- Strong semantic textual similarity (STS) performance, with a Spearman correlation of 83.31 on STS22
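As an illustration of the "query: " convention mentioned above, here is a minimal retrieval sketch. It assumes the model can be loaded through the sentence-transformers library under the Hugging Face identifier OrdalieTech/Solon-embeddings-large-0.1; the French example texts are placeholders.

```python
# Minimal retrieval sketch (assumption: the model loads via sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("OrdalieTech/Solon-embeddings-large-0.1")

# Documents are encoded as-is; only the query gets the "query: " prefix.
documents = [
    "Paris est la capitale de la France.",
    "Le Mont Blanc est le plus haut sommet des Alpes.",
]
query = "query: Quelle est la capitale de la France ?"

doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query embedding.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best = int(scores.argmax())
print(documents[best], float(scores[best]))
```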
Core Capabilities
- Text Classification
- Semantic Search and Retrieval
- Clustering
- Bitext Mining
- Semantic Textual Similarity
- Reranking
Frequently Asked Questions
Q: What makes this model unique?
The model's specialized focus on French language processing and its comprehensive evaluation across 9 French benchmarks set it apart. It consistently outperforms other multilingual models on French-specific tasks.
Q: What are the recommended use cases?
The model excels in semantic search, document classification, and similarity assessment tasks. It's particularly effective for French language applications requiring precise semantic understanding and retrieval capabilities.
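As a sketch of the document-classification use case, the snippet below feeds Solon embeddings into a simple scikit-learn classifier. The example texts, labels, and pipeline are assumptions for illustration, not part of the official model card.

```python
# Illustrative sketch: Solon embeddings as features for a French intent classifier.
# Texts, labels, and the scikit-learn pipeline are hypothetical placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

model = SentenceTransformer("OrdalieTech/Solon-embeddings-large-0.1")

texts = [
    "Réserve-moi un vol pour Lyon.",
    "Mets de la musique jazz.",
    "Quel temps fera-t-il demain ?",
]
labels = ["voyage", "musique", "meteo"]

# No "query: " prefix here; these are standalone documents, not retrieval queries.
X = model.encode(texts)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

print(clf.predict(model.encode(["Joue une chanson de Brassens."])))
```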