paraphrase-multilingual-mpnet-base-v2
| Property | Value |
|---|---|
| Parameter Count | 278M |
| License | Apache 2.0 |
| Paper | Sentence-BERT Paper |
| Supported Languages | 50+ languages |
| Output Dimensions | 768 |
What is paraphrase-multilingual-mpnet-base-v2?
This is a powerful multilingual sentence transformer model developed by the sentence-transformers team. It converts sentences and paragraphs from over 50 languages into fixed-size dense vector representations (embeddings) of 768 dimensions. The "mpnet" in its name refers to the English MPNet-based teacher model it was derived from; the multilingual model itself is built on the XLM-RoBERTa architecture and is specifically optimized for paraphrase detection and semantic similarity tasks across multiple languages.
Implementation Details
The model is a two-module pipeline: an XLMRobertaModel transformer followed by a pooling layer. It processes text with a maximum sequence length of 128 tokens and applies mean pooling over token embeddings to produce sentence embeddings. It can be loaded through either the sentence-transformers library or Hugging Face's transformers library, as sketched below.
- Architecture: XLMRobertaModel with mean pooling
- Input Processing: Supports sequences up to 128 tokens
- Output: 768-dimensional dense vectors
- Framework Support: PyTorch, TensorFlow, ONNX, OpenVINO
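A minimal usage sketch with the sentence-transformers library; the example sentences are illustrative, and the model id follows the Hugging Face Hub naming for this model:

```python
from sentence_transformers import SentenceTransformer

# Load the model (downloaded from the Hugging Face Hub on first use)
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")

sentences = [
    "This is an example sentence.",
    "Dies ist ein Beispielsatz.",  # German paraphrase of the first sentence
]

# Inputs longer than 128 tokens are truncated; each output vector has 768 dimensions
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
```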
Core Capabilities
- Multilingual sentence embedding generation
- Semantic similarity computation
- Cross-lingual text matching
- Document clustering
- Semantic search implementations
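As a sketch of cross-lingual matching and semantic search, the snippet below embeds a small hypothetical multilingual corpus and ranks it against an English query by cosine similarity (the corpus and query are invented for illustration):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")

# Hypothetical multilingual corpus and an English query
corpus = [
    "El gato duerme en el sofá.",       # Spanish: the cat sleeps on the sofa
    "Die Börse schloss heute höher.",   # German: the stock market closed higher today
    "La voiture est tombée en panne.",  # French: the car broke down
]
query = "A cat is sleeping on the couch."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every corpus sentence
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = scores.argmax().item()
print(corpus[best], float(scores[best]))
```

Because all languages share one embedding space, the Spanish sentence about the cat should score highest even though the query is in English.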
Frequently Asked Questions
Q: What makes this model unique?
The model's key strength lies in its multilingual capabilities: it supports over 50 languages while maintaining high-quality semantic representations. It is particularly effective for cross-lingual applications and has been downloaded more than 2 million times by the community.
Q: What are the recommended use cases?
The model excels in applications requiring semantic similarity matching across languages, including multilingual search engines, cross-lingual document clustering, and paraphrase detection. It's particularly suitable for production environments due to its support for multiple inference frameworks.
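For deployments that use Hugging Face transformers directly rather than sentence-transformers, the mean-pooling step described under Implementation Details has to be applied manually. A sketch of that pattern, assuming the standard AutoTokenizer/AutoModel interfaces:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["Semantic search works across languages.",
             "La recherche sémantique fonctionne entre les langues."]

# Tokenize with truncation at the model's 128-token limit
encoded = tokenizer(sentences, padding=True, truncation=True,
                    max_length=128, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded)

# Mean pooling over token embeddings, ignoring padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (output.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # torch.Size([2, 768])
```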