E5-base-v2
| Property | Value |
|---|---|
| Parameter Count | 109M |
| Architecture | 12 layers with 768-dim embeddings |
| License | MIT |
| Paper | Text Embeddings by Weakly-Supervised Contrastive Pre-training |
What is e5-base-v2?
E5-base-v2 is a text embedding model trained with weakly-supervised contrastive pre-training. It is designed to produce high-quality embeddings for semantic similarity, information retrieval, and classification tasks. With 109M parameters across 12 transformer layers, it offers a practical balance between model size and performance.
Implementation Details
The model requires specific text prefixes ("query:" or "passage:") for optimal performance and can process sequences up to 512 tokens. It uses average pooling over the last hidden states to generate embeddings, which are then L2-normalized; a usage sketch follows the list below.
- Embedding dimension: 768
- Maximum sequence length: 512 tokens
- Architecture: 12-layer transformer
- Training method: Weakly-supervised contrastive learning
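The pipeline described above can be sketched with the Hugging Face transformers library. The Hub ID `intfloat/e5-base-v2` and the example query/passage pair are assumptions for illustration, not part of this card.

```python
# Minimal sketch: prefix inputs, average-pool the last hidden states over
# non-padding tokens, then L2-normalize the resulting embeddings.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states, attention_mask):
    # Zero out padding positions, then average over the sequence dimension.
    masked = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return masked.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base-v2")  # assumed Hub ID
model = AutoModel.from_pretrained("intfloat/e5-base-v2")

texts = [
    "query: how much protein should a female eat",      # illustrative query
    "passage: The CDC recommends 46 grams of protein per day for adult women.",
]
batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)   # L2 normalization
score = embeddings[:1] @ embeddings[1:].T          # cosine similarity of unit vectors
print(score)
```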
Core Capabilities
- Semantic text similarity scoring
- Information retrieval and passage ranking
- Text classification tasks
- Clustering and semantic search
- Integration with the Sentence Transformers library
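For the Sentence Transformers integration mentioned above, usage can look roughly like the sketch below; the Hub ID and example sentences are assumptions, and the "query:" prefix on both sides follows the symmetric-task convention described for E5 prefixes.

```python
# Sketch of Sentence Transformers usage for semantic similarity scoring.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")  # assumed Hub ID

sentences = [
    "query: A man is eating food.",
    "query: A man is eating a piece of bread.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(util.cos_sim(embeddings[0], embeddings[1]))
```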
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive features are its use of input prefixes ("query:" and "passage:") and its weakly-supervised contrastive pre-training, which together yield robust performance across a range of text similarity tasks.
Q: What are the recommended use cases?
The model excels in semantic search, passage retrieval, and text similarity tasks. It's particularly effective for asymmetric tasks like open QA and ad-hoc information retrieval when using the appropriate query/passage prefixes.
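A minimal retrieval sketch, assuming the `intfloat/e5-base-v2` Hub ID and made-up passages, shows the asymmetric "query:"/"passage:" prefix usage: one prefixed query is scored against prefixed candidate passages and the passages are ranked by cosine similarity.

```python
# Hypothetical ad-hoc retrieval example using query/passage prefixes.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")  # assumed Hub ID

query = "query: what is the capital of France"
passages = [
    "passage: Paris is the capital and most populous city of France.",
    "passage: Berlin is the capital of Germany.",
    "passage: The Eiffel Tower was completed in 1889.",
]

query_emb = model.encode(query, normalize_embeddings=True)
passage_embs = model.encode(passages, normalize_embeddings=True)

scores = util.cos_sim(query_emb, passage_embs)[0]
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {passage}")
```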