e5-base

Maintained by: intfloat

E5-base Text Embedding Model

Parameters: 109M
Architecture: 12 layers, 768-dimensional embeddings
License: MIT
Paper: Text Embeddings by Weakly-Supervised Contrastive Pre-training

What is e5-base?

E5-base is a text embedding model developed through weakly-supervised contrastive pre-training. It is designed to produce high-quality semantic representations of text and excels at tasks such as semantic similarity, information retrieval, and text classification. Every input must be prefixed with "query:" or "passage:"; omitting the prefix degrades embedding quality.

Implementation Details

The model architecture consists of 12 transformer layers with an embedding dimension of 768. It uses average pooling over the last hidden states to generate embeddings, which are then normalized using L2 normalization. The model was trained using a contrastive learning approach with a low temperature of 0.01 for the InfoNCE loss.
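
As a minimal sketch, the prefix, average-pooling, and L2-normalization steps described above can be reproduced with the Hugging Face Transformers library; the example texts below are made up for illustration, and the model is assumed to be available on the Hub as intfloat/e5-base.

    import torch.nn.functional as F
    from torch import Tensor
    from transformers import AutoTokenizer, AutoModel

    def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
        # Zero out padding positions, then average over the sequence dimension
        masked = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
        return masked.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

    # Every input needs a "query:" or "passage:" prefix
    texts = [
        "query: how much protein should a female eat",
        "passage: The CDC's average protein requirement for women ages 19 to 70 is 46 grams per day.",
    ]

    tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base")
    model = AutoModel.from_pretrained("intfloat/e5-base")

    batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**batch)
    embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
    embeddings = F.normalize(embeddings, p=2, dim=1)  # L2 normalization

    # With unit-length vectors, cosine similarity is a plain dot product
    similarity = embeddings[0] @ embeddings[1]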

  • Supports both PyTorch and Sentence-Transformers frameworks
  • Maximum sequence length of 512 tokens
  • Optimized for English language content
  • Achieves strong performance on MTEB benchmark tasks
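
For context on the training objective mentioned above, the following is an illustrative sketch of InfoNCE with in-batch negatives and a 0.01 temperature; it is a simplified reconstruction, not the authors' actual training code.

    import torch
    import torch.nn.functional as F

    def info_nce_loss(query_embs, passage_embs, temperature=0.01):
        # Normalize so the dot product equals cosine similarity
        query_embs = F.normalize(query_embs, dim=-1)
        passage_embs = F.normalize(passage_embs, dim=-1)
        # (batch, batch) similarity matrix; other passages in the batch act as negatives
        logits = query_embs @ passage_embs.T / temperature
        # The matching passage for each query sits on the diagonal
        labels = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, labels)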

Core Capabilities

  • Text Retrieval and Semantic Search
  • Semantic Similarity Assessment
  • Document Classification
  • Clustering and Information Organization
  • Paraphrase Detection

Frequently Asked Questions

Q: What makes this model unique?

E5-base stands out for delivering strong performance across a wide range of tasks while keeping a relatively small parameter count of 109M. Its prefix-based input format ("query:" and "passage:") lets a single model serve both asymmetric retrieval and symmetric similarity use cases.

Q: What are the recommended use cases?

The model excels in semantic search, document retrieval, and similarity matching tasks. It's particularly well-suited for applications requiring symmetric (text-to-text comparison) and asymmetric (query-to-document matching) capabilities.
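
As a sketch of the asymmetric (query-to-document) case using the Sentence-Transformers framework; the prefixes are added by hand, and the query and passages below are made-up illustrations.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("intfloat/e5-base")

    query = "query: symptoms of vitamin D deficiency"
    passages = [
        "passage: Vitamin D deficiency can cause fatigue, bone pain, and muscle weakness.",
        "passage: The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    ]

    query_emb = model.encode(query, normalize_embeddings=True)
    passage_embs = model.encode(passages, normalize_embeddings=True)

    # Rank passages by cosine similarity; the relevant passage should score highest
    scores = util.cos_sim(query_emb, passage_embs)
    print(scores)

For symmetric text-to-text comparison (e.g. semantic similarity or paraphrase detection), the convention is to prefix both sides with "query:".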
