multi-qa-mpnet-base-dot-v1

Maintained by: sentence-transformers

Property              Value
Parameter Count       109M
Embedding Dimensions  768
Training Data         215M Q&A pairs
Architecture          MPNet-based
Author                sentence-transformers

What is multi-qa-mpnet-base-dot-v1?

multi-qa-mpnet-base-dot-v1 is a sentence embedding model designed for semantic search. Built on the MPNet architecture, it maps sentences and paragraphs to a 768-dimensional dense vector space, so texts can be compared by the similarity of their vectors. It was trained on 215 million question-answer pairs from diverse sources including WikiAnswers, Stack Exchange, and MS MARCO.

Implementation Details

The model uses CLS pooling and scores similarity with the dot product. Its embeddings are not normalized, so vector magnitude contributes to the score. It accepts sequences of up to 512 tokens, but it was trained on inputs of up to 250 word pieces and may perform worse on longer text. It can be loaded through either the sentence-transformers library or HuggingFace Transformers.
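
The pooling and scoring steps described above can be illustrated with a minimal pure-Python sketch. The token vectors below are made-up 3-dimensional stand-ins (the real model emits 768 dimensions per token), and the helper names cls_pool, dot_score, and cosine_score are hypothetical, chosen for this example only. The point is that with unnormalized embeddings, dot product and cosine similarity can rank documents differently.

```python
from math import sqrt

def cls_pool(token_embeddings):
    """Use the first token's vector (the CLS position) as the sentence embedding."""
    return token_embeddings[0]

def dot_score(a, b):
    """Plain dot product; sensitive to vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine_score(a, b):
    """Dot product of length-normalized vectors; ignores magnitude."""
    return dot_score(a, b) / (sqrt(dot_score(a, a)) * sqrt(dot_score(b, b)))

# Toy per-token embeddings (assumed values, not real model output).
query_tokens = [[1.0, 0.0, 1.0], [0.2, 0.9, 0.1]]
doc_a_tokens = [[2.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
doc_b_tokens = [[0.9, 0.1, 0.9], [0.0, 1.0, 0.0]]

query = cls_pool(query_tokens)
doc_a = cls_pool(doc_a_tokens)
doc_b = cls_pool(doc_b_tokens)

# Dot product clearly prefers doc_a because its vector is longer.
dot_a = dot_score(query, doc_a)
dot_b = dot_score(query, doc_b)

# Cosine sees both documents as pointing in almost the same direction.
cos_a = cosine_score(query, doc_a)
cos_b = cosine_score(query, doc_b)
```

Because the model was trained with dot-product scoring, an index built on its embeddings should generally use dot product (inner product) as the metric rather than cosine.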

  • Trained using MultipleNegativesRankingLoss
  • Employs CLS-pooling strategy
  • Optimized for dot-product similarity calculations
  • Supports both PyTorch and ONNX formats
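
As a rough illustration of the training objective named in the list above, the following sketch computes a MultipleNegativesRankingLoss-style value in pure Python. It is not the sentence-transformers implementation: the function name, the toy 2-dimensional embeddings, and the default scale of 1.0 are assumptions made for this example. The idea is that, within a batch, answer i is the positive for question i and every other answer acts as a negative.

```python
from math import exp, log

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multiple_negatives_ranking_loss(query_embs, answer_embs, scale=1.0):
    """Mean cross-entropy over in-batch dot-product scores.

    answer_embs[i] is treated as the positive for query_embs[i];
    all other answers in the batch serve as negatives.
    """
    losses = []
    for i, q in enumerate(query_embs):
        scores = [scale * dot(q, a) for a in answer_embs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_norm = m + log(sum(exp(s - m) for s in scores))
        losses.append(log_norm - scores[i])
    return sum(losses) / len(losses)

# Toy batch of two questions (assumed values, not real model output).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[3.0, 0.0], [0.0, 3.0]]    # each answer matches its question
shuffled = [[0.0, 3.0], [3.0, 0.0]]   # answers paired with the wrong question

loss_aligned = multiple_negatives_ranking_loss(queries, aligned)
loss_shuffled = multiple_negatives_ranking_loss(queries, shuffled)
```

The aligned batch yields a much smaller loss than the shuffled one, which is the signal that pulls each question toward its own answer and away from the other answers in the batch.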

Core Capabilities

  • Semantic search optimization
  • Question-answer similarity matching
  • Dense vector representation of text
  • Primarily English-language text
  • Efficient processing of short to medium-length texts

Frequently Asked Questions

Q: What makes this model unique?

Two things set the model apart: training on 215M diverse Q&A pairs and optimization for dot-product similarity, both of which suit it to semantic search. It pairs an MPNet encoder with CLS pooling to produce a single 768-dimensional embedding per input text.

Q: What are the recommended use cases?

The model excels in semantic search applications, question-answer matching, and document similarity tasks. It's particularly well-suited for applications requiring fast and accurate retrieval of relevant text passages based on semantic meaning rather than exact keyword matching.
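
A passage-retrieval loop of the kind described above might look like the following sketch. The passages, the 3-dimensional vectors, and the top_k helper are hypothetical and stand in for real model output. With the sentence-transformers library installed, the embeddings would instead come from something like SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1").encode(passages).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query_emb, corpus_embs, k=2):
    """Return (index, score) pairs for the k highest dot-product scores."""
    scored = [(i, dot(query_emb, emb)) for i, emb in enumerate(corpus_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

passages = [
    "London is the capital of the United Kingdom.",
    "Bread is baked from flour, water, and yeast.",
    "The United Kingdom's largest city is London.",
]

# Toy embeddings (assumed values); a real index would hold 768-dim vectors.
passage_embs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.7, 0.3, 0.1],
]
query_emb = [1.0, 0.0, 0.0]  # stands in for an encoded question about London

hits = top_k(query_emb, passage_embs, k=2)
best_passages = [passages[i] for i, _ in hits]
```

For large corpora, the exhaustive scan in top_k would typically be replaced by a vector index configured for inner-product search.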
