xlm-r-100langs-bert-base-nli-stsb-mean-tokens
| Property | Value |
|---|---|
| Parameter Count | 278M |
| License | Apache 2.0 |
| Paper | Sentence-BERT Paper |
| Architecture | XLM-RoBERTa with Mean Pooling |
What is xlm-r-100langs-bert-base-nli-stsb-mean-tokens?
This is a deprecated sentence-transformers model that maps sentences and paragraphs to 768-dimensional dense vector embeddings. Built on the XLM-RoBERTa architecture, it supports 100 languages, but it is no longer recommended for production use because its embedding quality is low compared to newer alternatives.
Implementation Details
The model combines an XLM-RoBERTa base model with a mean pooling strategy: input text is processed by the transformer, and the token outputs are averaged to produce a fixed-size sentence embedding. It can be used with both the sentence-transformers library and HuggingFace Transformers, as sketched after the list below.
- 768-dimensional output embeddings
- Maximum sequence length of 128 tokens
- Implements mean pooling strategy
- Compatible with multiple deep learning frameworks (PyTorch, TensorFlow, ONNX)
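The snippet below is a minimal sketch of both usage paths, assuming the checkpoint is published under the HuggingFace model ID `sentence-transformers/xlm-r-100langs-bert-base-nli-stsb-mean-tokens`; the example sentences are purely illustrative.

```python
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModel, AutoTokenizer

# Assumed model ID; verify it matches the actual repository name.
MODEL_ID = "sentence-transformers/xlm-r-100langs-bert-base-nli-stsb-mean-tokens"
sentences = ["This is an example sentence.", "Ceci est une phrase d'exemple."]

# Option 1: sentence-transformers applies the stored pooling configuration for you.
st_model = SentenceTransformer(MODEL_ID)
embeddings = st_model.encode(sentences)  # shape: (2, 768)

# Option 2: plain HuggingFace Transformers with explicit mean pooling.
def mean_pooling(token_embeddings, attention_mask):
    # Average the token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

encoded = tokenizer(sentences, padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

# output.last_hidden_state: (batch, seq_len, 768) -> pooled to (batch, 768)
embeddings_hf = mean_pooling(output.last_hidden_state, encoded["attention_mask"])
```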
Core Capabilities
- Multilingual sentence embedding generation
- Semantic similarity comparison
- Text clustering
- Cross-lingual information retrieval
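The sketch below illustrates the similarity and cross-lingual retrieval use cases: it encodes a small multilingual corpus and ranks it against an English query by cosine similarity. The sentences are illustrative only, and the model ID is assumed as above.

```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

# Assumed model ID, as in the previous snippet.
model = SentenceTransformer("sentence-transformers/xlm-r-100langs-bert-base-nli-stsb-mean-tokens")

corpus = [
    "Ein Mann spielt Gitarre.",      # German: "A man is playing guitar."
    "Un chat dort sur le canapé.",   # French: "A cat is sleeping on the sofa."
    "Someone is cooking dinner.",
]
query = "A person plays a musical instrument."

# encode() with convert_to_tensor=True returns PyTorch tensors directly.
corpus_emb = model.encode(corpus, convert_to_tensor=True)   # (3, 768)
query_emb = model.encode([query], convert_to_tensor=True)   # (1, 768)

# Cosine similarity between the query and every corpus sentence.
scores = F.cosine_similarity(query_emb, corpus_emb)         # (3,)
best = int(scores.argmax())
print(f"Best match: {corpus[best]} (score={scores[best].item():.3f})")
```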
Frequently Asked Questions
Q: What makes this model unique?
This model was one of the early attempts at multilingual sentence embeddings covering 100 languages. However, it is now deprecated, and users should refer to the newer models recommended on SBERT.net.
Q: What are the recommended use cases?
The model was historically used for multilingual sentence similarity and clustering, but newer models such as paraphrase-multilingual-mpnet-base-v2 are recommended instead for better performance; a minimal swap is sketched below.
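In the sketch below, only the model ID changes and the encode() call stays the same; the replacement model also produces 768-dimensional embeddings. The model ID is assumed to live under the sentence-transformers organization on HuggingFace.

```python
from sentence_transformers import SentenceTransformer

# Drop-in replacement suggested by the deprecation notice.
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
embeddings = model.encode(["Hello world", "Hallo Welt"])  # shape: (2, 768)
```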