E5-small-v2

Maintained by: intfloat

Property         Value
Parameter Count  33.4M
Architecture     12-layer Transformer
Paper            Text Embeddings by Weakly-Supervised Contrastive Pre-training
License          MIT

What is e5-small-v2?

E5-small-v2 is a compact yet capable text embedding model built for semantic similarity and information retrieval tasks. It is trained with weakly-supervised contrastive pre-training and produces 384-dimensional embeddings. Inputs must be prefixed with "query:" or "passage:" for best results, and the model handles sequences of up to 512 tokens.
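As a minimal sketch of the prefix convention, assuming the Hugging Face Hub ID intfloat/e5-small-v2 and the sentence-transformers package (the example texts are illustrative, not from this page):

```python
from sentence_transformers import SentenceTransformer, util

# Assumed Hub identifier for this model
model = SentenceTransformer("intfloat/e5-small-v2")

# E5 models expect an explicit "query:" or "passage:" prefix on every input
queries = ["query: how do solar panels generate electricity"]
passages = [
    "passage: Solar panels convert sunlight into electricity using photovoltaic cells.",
    "passage: The Pacific Ocean is the largest and deepest of Earth's oceans.",
]

query_emb = model.encode(queries, normalize_embeddings=True)      # shape (1, 384)
passage_emb = model.encode(passages, normalize_embeddings=True)   # shape (2, 384)

# With unit-length embeddings, cosine similarity reduces to a dot product
scores = util.cos_sim(query_emb, passage_emb)
print(scores)
```

Because the embeddings are normalized, the cosine-similarity scores are simply dot products, and the on-topic passage should receive the higher score.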

Implementation Details

The model is a 12-layer Transformer with 33.4M parameters. Sentence embeddings are obtained by average pooling over the last hidden states and are typically L2-normalized before computing cosine similarity. The model can be used with plain PyTorch (via Hugging Face Transformers) or with Sentence Transformers, making it easy to slot into different pipelines (a PyTorch sketch follows the list below).

  • Embedding dimension: 384
  • Maximum sequence length: 512 tokens
  • Requires input prefixes: "query:" or "passage:"
  • Supports multiple frameworks including PyTorch and Sentence Transformers
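The same workflow can be sketched with plain PyTorch and Hugging Face Transformers to make the average pooling over the last hidden states and the L2 normalization explicit (the model ID and example texts are assumptions, not taken from this page):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padded positions, then average over the sequence dimension
    hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-small-v2")
model = AutoModel.from_pretrained("intfloat/e5-small-v2")

texts = [
    "query: what is semantic search",
    "passage: Semantic search retrieves documents by meaning rather than exact keyword overlap.",
]

batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)   # 384-dimensional unit vectors

score = embeddings[0] @ embeddings[1]              # cosine similarity via dot product
print(score.item())
```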

Core Capabilities

  • Semantic text similarity assessment
  • Information retrieval and passage ranking
  • Text classification tasks
  • Clustering applications
  • Semantic analysis of English text (the model is English-only, not cross-lingual)

Frequently Asked Questions

Q: What makes this model unique?

The model's main strength is its efficiency: with only 33.4M parameters it maintains strong performance across a wide range of tasks. It also relies on a distinctive prefix-based input format and a weakly-supervised contrastive pre-training approach.

Q: What are the recommended use cases?

The model excels at semantic similarity, passage ranking, and information retrieval. It is particularly well suited to applications that need efficient text embeddings without sacrificing much accuracy; a minimal retrieval sketch follows.
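One possible passage-retrieval setup, again assuming sentence-transformers and the intfloat/e5-small-v2 Hub ID (the corpus texts are illustrative): embed the corpus once, then rank passages per query.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-small-v2")  # assumed Hub ID

corpus = [
    "passage: Einstein published the theory of general relativity in 1915.",
    "passage: Baking bread requires flour, water, yeast, and salt.",
    "passage: General relativity describes gravity as the curvature of spacetime.",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True)

query_emb = model.encode(["query: who developed general relativity"], normalize_embeddings=True)

# Rank corpus passages by cosine similarity and keep the top 2
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```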
