e5-base

Maintained by: intfloat

E5-base Text Embedding Model

Parameters: 109M
Architecture: 12 layers, 768-dimensional embeddings
License: MIT
Paper: Text Embeddings by Weakly-Supervised Contrastive Pre-training

What is e5-base?

E5-base is a text embedding model developed through weakly-supervised contrastive pre-training. It is designed to produce high-quality semantic representations of text and excels at tasks such as semantic similarity, information retrieval, and text classification. Every input must be prefixed with "query:" or "passage:"; omitting the prefix degrades embedding quality.

Implementation Details

The model architecture consists of 12 transformer layers with an embedding dimension of 768. It uses average pooling over the last hidden states to generate embeddings, which are then normalized using L2 normalization. The model was trained using a contrastive learning approach with a low temperature of 0.01 for the InfoNCE loss.
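
As a minimal sketch, the prefix, average-pooling, and L2-normalization steps described above can be reproduced with the Hugging Face Transformers library; the example texts below are made up for illustration, and the model is assumed to be available on the Hub as intfloat/e5-base.

    import torch.nn.functional as F
    from torch import Tensor
    from transformers import AutoTokenizer, AutoModel

    def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
        # Zero out padding positions, then average over the sequence dimension
        masked = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
        return masked.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

    # Every input needs a "query:" or "passage:" prefix
    texts = [
        "query: how much protein should a female eat",
        "passage: The CDC's average protein requirement for women ages 19 to 70 is 46 grams per day.",
    ]

    tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base")
    model = AutoModel.from_pretrained("intfloat/e5-base")

    batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**batch)
    embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
    embeddings = F.normalize(embeddings, p=2, dim=1)  # L2 normalization

    # With unit-length vectors, cosine similarity is a plain dot product
    similarity = embeddings[0] @ embeddings[1]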

  • Supports both PyTorch and Sentence-Transformers frameworks
  • Maximum sequence length of 512 tokens
  • Optimized for English language content
  • Achieves strong performance on MTEB benchmark tasks
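
For context on the training objective mentioned above, the following is an illustrative sketch of InfoNCE with in-batch negatives and a 0.01 temperature; it is a simplified reconstruction, not the authors' actual training code.

    import torch
    import torch.nn.functional as F

    def info_nce_loss(query_embs, passage_embs, temperature=0.01):
        # Normalize so the dot product equals cosine similarity
        query_embs = F.normalize(query_embs, dim=-1)
        passage_embs = F.normalize(passage_embs, dim=-1)
        # (batch, batch) similarity matrix; other passages in the batch act as negatives
        logits = query_embs @ passage_embs.T / temperature
        # The matching passage for each query sits on the diagonal
        labels = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, labels)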

Core Capabilities

  • Text Retrieval and Semantic Search
  • Semantic Similarity Assessment
  • Document Classification
  • Clustering and Information Organization
  • Paraphrase Detection

Frequently Asked Questions

Q: What makes this model unique?

E5-base stands out for delivering strong performance across a wide range of tasks while keeping a relatively small parameter count of 109M. Its prefix-based input format ("query:" and "passage:") lets a single model serve both asymmetric retrieval and symmetric similarity use cases.

Q: What are the recommended use cases?

The model excels in semantic search, document retrieval, and similarity matching tasks. It's particularly well-suited for applications requiring symmetric (text-to-text comparison) and asymmetric (query-to-document matching) capabilities.
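
As a sketch of the asymmetric (query-to-document) case using the Sentence-Transformers framework; the prefixes are added by hand, and the query and passages below are made-up illustrations.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("intfloat/e5-base")

    query = "query: symptoms of vitamin D deficiency"
    passages = [
        "passage: Vitamin D deficiency can cause fatigue, bone pain, and muscle weakness.",
        "passage: The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    ]

    query_emb = model.encode(query, normalize_embeddings=True)
    passage_embs = model.encode(passages, normalize_embeddings=True)

    # Rank passages by cosine similarity; the relevant passage should score highest
    scores = util.cos_sim(query_emb, passage_embs)
    print(scores)

For symmetric text-to-text comparison (e.g. semantic similarity or paraphrase detection), the convention is to prefix both sides with "query:".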
