sentence-transformers-multilingual-e5-small

Maintained By
beademiguelperez

Multilingual-E5-small Text Embeddings

Property | Value
Parameter Count | 118M
License | MIT
Paper | Multilingual E5 Text Embeddings: A Technical Report
Languages Supported | 100+

What is sentence-transformers-multilingual-e5-small?

Multilingual-E5-small is a compact text embedding model for cross-lingual applications. It has 12 layers and produces 384-dimensional embeddings, supports over 100 languages, and is suited to semantic search, retrieval, and similarity tasks. Each input text is marked with a plain-text prefix, "query:" or "passage:", that tells the model which role the text plays.

Implementation Details

The model was trained in two stages: contrastive pre-training with weak supervision on large multilingual datasets (including mC4, CC News, and NLLB), followed by supervised fine-tuning on high-quality labeled datasets. The "query:" and "passage:" prefixes are ordinary text prepended to the input rather than special tokens, and the model performs best when each input carries the prefix that matches its use.

  • Initialized from microsoft/Multilingual-MiniLM-L12-H384
  • Trained on 1B+ text pairs for contrastive learning
  • Fine-tuned on datasets like MS MARCO, NQ, and multilingual resources
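
The contrastive stage can be illustrated with a small sketch. The source does not state the exact objective, so this assumes an InfoNCE-style loss with in-batch negatives, which is a common choice for this kind of training. The function names, the temperature of 0.01, and the toy vectors are illustrative assumptions, not details taken from this model card.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(
    query_vecs: List[Sequence[float]],
    passage_vecs: List[Sequence[float]],
    temperature: float = 0.01,  # assumed value, for illustration only
) -> float:
    """Mean InfoNCE loss over a batch.

    passage_vecs[i] is treated as the positive for query_vecs[i];
    every other passage in the batch serves as a negative.
    """
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [cosine(q, p) / temperature for p in passage_vecs]
        # Numerically stable log-sum-exp over all passages in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-dimensional "embeddings": each query matches the passage at the same index.
toy_queries = [[1.0, 0.0], [0.0, 1.0]]
toy_passages = [[1.0, 0.0], [0.0, 1.0]]

aligned_loss = info_nce_loss(toy_queries, toy_passages)
shuffled_loss = info_nce_loss(toy_queries, toy_passages[::-1])
```

The loss is low when each query is closest to its own passage and high when the pairs are mismatched, which is the signal that pulls paired texts together in embedding space.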

Core Capabilities

  • Cross-lingual semantic search and retrieval
  • Text similarity assessment across 100+ languages
  • Bitext mining and parallel text alignment
  • Document classification and clustering
  • Semantic textual similarity (STS) tasks
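
Most of these capabilities reduce to comparing embedding vectors. The following is a minimal sketch of similarity-based ranking, assuming embeddings have already been computed. The 3-dimensional toy vectors stand in for the model's 384-dimensional output, and `rank_passages` is a hypothetical helper, not part of any library.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: List[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (passage_index, cosine_similarity) pairs, best match first."""
    q = l2_normalize(query_vec)
    scored = []
    for idx, passage in enumerate(passage_vecs):
        p = l2_normalize(passage)
        # On unit vectors the dot product equals cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, p))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings.
toy_query = [0.9, 0.1, 0.0]
toy_passages = [
    [0.0, 1.0, 0.0],  # points in a different direction from the query
    [1.0, 0.0, 0.0],  # nearly parallel to the query
    [0.7, 0.7, 0.0],  # partially aligned with the query
]

ranking = rank_passages(toy_query, toy_passages, top_k=2)
```

Because the model maps texts in different languages into one vector space, the same ranking step applies whether the query and the passages share a language or not.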

Frequently Asked Questions

Q: What makes this model unique?

The model combines a compact size (118M parameters) with broad multilingual coverage. It uses prefix-based encoding to distinguish queries from passages and reports strong retrieval results on multilingual benchmarks such as Mr. TyDi.

Q: What are the recommended use cases?

The model is suited to cross-lingual information retrieval, semantic search, and text similarity tasks. For symmetric tasks such as semantic similarity, use the "query:" prefix on every input. For asymmetric tasks such as passage retrieval, use "query:" for the queries and "passage:" for the documents being searched.
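
The prefix rules can be sketched as two small helpers. Several things here are assumptions rather than details from this page: the model ID `intfloat/multilingual-e5-small` refers to the upstream E5 checkpoint (this page does not give a repository ID), the trailing space in `"query: "` and `"passage: "` follows the upstream E5 usage examples, and the helper names are hypothetical. The encoder is passed in as a callable so the prefix logic can be exercised without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed prefix strings, including the trailing space used in upstream E5 examples.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "

Encoder = Callable[[List[str]], Sequence]


def embed_for_retrieval(
    encode: Encoder, queries: List[str], passages: List[str]
) -> Tuple[Sequence, Sequence]:
    """Asymmetric task: queries and passages get different prefixes."""
    query_vecs = encode([QUERY_PREFIX + text for text in queries])
    passage_vecs = encode([PASSAGE_PREFIX + text for text in passages])
    return query_vecs, passage_vecs


def embed_for_similarity(encode: Encoder, texts: List[str]) -> Sequence:
    """Symmetric task: every input gets the query prefix."""
    return encode([QUERY_PREFIX + text for text in texts])


def load_encoder(model_id: str = "intfloat/multilingual-e5-small") -> Encoder:
    """Build an encoder with the sentence-transformers library.

    The default model_id is an assumption; substitute the repository
    that actually hosts the checkpoint you intend to use.
    """
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer(model_id)
    # Unit-length embeddings make the dot product equal to cosine similarity.
    return lambda texts: model.encode(texts, normalize_embeddings=True)
```

With the library installed, `embed_for_retrieval(load_encoder(), ["capital of France"], ["París es la capital de Francia."])` would return one query embedding and one passage embedding that can be compared by dot product.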
