E5-base-v2
| Property | Value |
|---|---|
| Parameter Count | 109M |
| Architecture | 12 layers with 768-dim embeddings |
| License | MIT |
| Paper | Text Embeddings by Weakly-Supervised Contrastive Pre-training |
What is e5-base-v2?
E5-base-v2 is a text embedding model trained with weakly-supervised contrastive pre-training. It is designed to produce high-quality embeddings for semantic similarity, information retrieval, and classification tasks. With 109M parameters across 12 transformer layers, it offers a practical balance between model size and performance.
Implementation Details
The model requires specific text prefixes ("query:" or "passage:") for optimal performance and can process sequences up to 512 tokens. It uses average pooling over the last hidden states to generate embeddings, which are then L2-normalized; a usage sketch follows the list below.
- Embedding dimension: 768
- Maximum sequence length: 512 tokens
- Architecture: 12-layer transformer
- Training method: Weakly-supervised contrastive learning
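The pipeline described above can be sketched with the Hugging Face transformers library. The Hub ID `intfloat/e5-base-v2` and the example query/passage pair are assumptions for illustration, not part of this card.

```python
# Minimal sketch: prefix inputs, average-pool the last hidden states over
# non-padding tokens, then L2-normalize the resulting embeddings.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states, attention_mask):
    # Zero out padding positions, then average over the sequence dimension.
    masked = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return masked.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base-v2")  # assumed Hub ID
model = AutoModel.from_pretrained("intfloat/e5-base-v2")

texts = [
    "query: how much protein should a female eat",      # illustrative query
    "passage: The CDC recommends 46 grams of protein per day for adult women.",
]
batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)   # L2 normalization
score = embeddings[:1] @ embeddings[1:].T          # cosine similarity of unit vectors
print(score)
```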
Core Capabilities
- Semantic text similarity scoring
- Information retrieval and passage ranking
- Text classification tasks
- Clustering and semantic search
- Integration with the Sentence Transformers library
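For the Sentence Transformers integration mentioned above, usage can look roughly like the sketch below; the Hub ID and example sentences are assumptions, and the "query:" prefix on both sides follows the symmetric-task convention described for E5 prefixes.

```python
# Sketch of Sentence Transformers usage for semantic similarity scoring.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")  # assumed Hub ID

sentences = [
    "query: A man is eating food.",
    "query: A man is eating a piece of bread.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(util.cos_sim(embeddings[0], embeddings[1]))
```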
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive features are its use of input prefixes ("query:" and "passage:") and its weakly-supervised contrastive pre-training, which together yield robust performance across a range of text similarity tasks.
Q: What are the recommended use cases?
The model excels in semantic search, passage retrieval, and text similarity tasks. It's particularly effective for asymmetric tasks like open QA and ad-hoc information retrieval when using the appropriate query/passage prefixes.
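A minimal retrieval sketch, assuming the `intfloat/e5-base-v2` Hub ID and made-up passages, shows the asymmetric "query:"/"passage:" prefix usage: one prefixed query is scored against prefixed candidate passages and the passages are ranked by cosine similarity.

```python
# Hypothetical ad-hoc retrieval example using query/passage prefixes.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")  # assumed Hub ID

query = "query: what is the capital of France"
passages = [
    "passage: Paris is the capital and most populous city of France.",
    "passage: Berlin is the capital of Germany.",
    "passage: The Eiffel Tower was completed in 1889.",
]

query_emb = model.encode(query, normalize_embeddings=True)
passage_embs = model.encode(passages, normalize_embeddings=True)

scores = util.cos_sim(query_emb, passage_embs)[0]
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {passage}")
```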