paraphrase-multilingual-mpnet-base-v2

Maintained By
sentence-transformers

Property             Value
Parameter Count      278M
License              Apache 2.0
Paper                Sentence-BERT Paper
Supported Languages  50+ languages
Output Dimensions    768

What is paraphrase-multilingual-mpnet-base-v2?

This is a multilingual sentence-transformer model maintained by the sentence-transformers team. It maps sentences and paragraphs in more than 50 languages to fixed-size, 768-dimensional dense vectors (embeddings). It is the multilingual counterpart of paraphrase-mpnet-base-v2, which is where the "mpnet" in the name comes from; the transformer it actually runs is an XLMRobertaModel (see Implementation Details). The model is tuned for paraphrase detection and semantic similarity across languages.
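As a minimal sketch of turning sentences into embeddings, the snippet below uses the sentence-transformers library. It assumes the package is installed and that the model is published on the Hugging Face Hub under the ID shown; the helper name `embed_sentences` is illustrative and not part of any library.

```python
# Assumed Hub ID for this model; adjust if you load it from a local path.
MODEL_ID = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def embed_sentences(sentences, model_id=MODEL_ID):
    """Encode a list of sentences into one embedding per sentence.

    Hypothetical helper: it only wraps SentenceTransformer.encode.
    """
    # Imported lazily so the module can be loaded without the dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)  # downloads weights on first use
    return model.encode(sentences)  # array with one row per input sentence
```

Calling `embed_sentences(["This is an example sentence", "Dies ist ein Beispielsatz"])` should return one 768-dimensional vector per sentence, in line with the output size listed above. In a real application the model would be loaded once and reused rather than re-created on every call.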

Implementation Details

The model is a two-module pipeline: an XLMRobertaModel transformer followed by a pooling layer. Input is limited to a maximum sequence length of 128 tokens, and the token embeddings are mean-pooled into a single sentence embedding. The model can be loaded with either the sentence-transformers library or Hugging Face's transformers library; with the latter, the pooling step has to be applied manually to the transformer output.

  • Architecture: XLMRobertaModel with mean pooling
  • Input Processing: Supports sequences up to 128 tokens
  • Output: 768-dimensional dense vectors
  • Framework Support: PyTorch, TensorFlow, ONNX, OpenVINO
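The mean-pooling step described above can be sketched in plain Python. This is an illustration of the technique on toy data, not the library's own implementation: `mean_pool` is a hypothetical helper, and the 2-dimensional vectors stand in for the model's 768-dimensional token embeddings.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask value 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # padding tokens must not influence the sentence vector
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in summed]


# Toy example: two real tokens and one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)  # [2.0, 3.0]
```

Masking matters because batched inputs are padded to a common length; averaging over padding positions would skew the embeddings of shorter sentences.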

Core Capabilities

  • Multilingual sentence embedding generation
  • Semantic similarity computation
  • Cross-lingual text matching
  • Document clustering
  • Semantic search implementations
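Most of these capabilities come down to comparing embeddings. The sketch below ranks a small corpus against a query by cosine similarity, which is a common choice for this kind of model; it assumes the embeddings have already been computed, and the function names and toy 2-dimensional vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, corpus_embeddings, top_k=3):
    """Return (corpus_index, score) pairs, highest similarity first."""
    scored = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(corpus_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real sentence embeddings.
query = [1.0, 0.0]
corpus = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
results = rank_by_similarity(query, corpus, top_k=2)
```

Because all languages share one embedding space, the same ranking works when the query and the corpus are in different languages. For large corpora, a vectorized or approximate nearest-neighbor search would replace this linear scan.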

Frequently Asked Questions

Q: What makes this model unique?

Its main strength is multilingual coverage: it supports more than 50 languages while maintaining high-quality semantic representations, which makes it well suited to cross-lingual applications. It has also been downloaded more than 2 million times by the community.

Q: What are the recommended use cases?

The model is well suited to applications that match text by meaning across languages, including multilingual search engines, cross-lingual document clustering, and paraphrase detection. Its support for several inference frameworks (PyTorch, TensorFlow, ONNX, OpenVINO) also makes it practical to deploy in production environments.
