# BERT Multilingual Passage Reranking Model
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Paper | Research Paper |
| Training Dataset | MS MARCO |
| Languages Supported | 102 |
## What is bert-multilingual-passage-reranking-msmarco?
This is a specialized BERT model designed for passage reranking across multiple languages. Built on top of BERT's multilingual architecture, it adds a densely connected layer on top of the 768-dimensional [CLS] token embedding to score the relevance between a search query and a passage. The model can improve search results by up to 100% compared to traditional keyword-based methods.
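A minimal scoring sketch using the Hugging Face `transformers` API. The Hub path `amberoad/bert-multilingual-passage-reranking-msmarco` and the label order (index 1 = relevant) are assumptions; check the model card of the copy you use before relying on them:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed Hub path for this model; adjust if your copy lives elsewhere.
MODEL_ID = "amberoad/bert-multilingual-passage-reranking-msmarco"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

query = "how many people live in berlin"
passage = (
    "Berlin has a population of around 3.7 million registered inhabitants, "
    "making it the most populous city in Germany."
)

# Query and passage are packed into a single sequence pair, capped at 512 tokens.
inputs = tokenizer(query, passage, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2): [irrelevant, relevant] (assumed order)

print("raw logits:", logits.tolist())
print("relevance:", torch.softmax(logits, dim=-1)[0, 1].item())
```

The softmax collapses the two logits into a single relevance probability, which is often easier to threshold than the raw scores.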
## Implementation Details
The model was trained for 400,000 steps on a TPU v3-8, taking approximately 12 hours. Each query and passage are processed together within a 512-token limit, producing a relevance score between -10 and 10, with higher scores indicating a better match.
- Built on the multilingual BERT architecture
- Trained on the MS MARCO dataset with 400M query-passage pairs
- Inference time of approximately 300 ms per query
- Compatible with the NBoost library for direct Elasticsearch integration (see the batch-reranking sketch after this list)
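As a sketch of the reranking loop referenced above (the Hub path, batch handling, and `top_k` default are illustrative assumptions, not the library's own API), all candidates can be scored in one batch and sorted best-first:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "amberoad/bert-multilingual-passage-reranking-msmarco"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

def rerank(query: str, passages: list[str], top_k: int = 10) -> list[tuple[float, str]]:
    """Score every (query, passage) pair in one batch and return the best top_k."""
    inputs = tokenizer(
        [query] * len(passages),
        passages,
        truncation=True,
        max_length=512,
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits           # shape (n, 2)
    scores = torch.softmax(logits, dim=-1)[:, 1]  # P(relevant) per passage
    return sorted(zip(scores.tolist(), passages), reverse=True)[:top_k]
```

For large candidate pools, splitting the passages into smaller batches keeps memory bounded without changing the ranking.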
## Core Capabilities
- Multilingual support for 102 languages
- Passage reranking for search optimization
- Query-passage relevance scoring
- Elasticsearch results enhancement
- Cross-lingual search capability (illustrated in the sketch below)
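A short sketch of the cross-lingual case, reusing the same assumed Hub path: because query and passage share one multilingual vocabulary, they do not need to be in the same language:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "amberoad/bert-multilingual-passage-reranking-msmarco"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

# German query scored against an English passage.
query = "Wie funktionieren Solarzellen?"  # "How do solar cells work?"
passage = "Solar cells convert sunlight directly into electricity."

inputs = tokenizer(query, passage, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    prob_relevant = torch.softmax(model(**inputs).logits, dim=-1)[0, 1].item()
print(f"cross-lingual relevance: {prob_relevant:.4f}")
```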
## Frequently Asked Questions
Q: What makes this model unique?
Its ability to handle 102 languages while maintaining performance comparable to English-only models makes it exceptional. It shows particularly strong performance in German even though the training data is English-only.
Q: What are the recommended use cases?
This model is ideal for improving search engine results, especially in multilingual environments. It is particularly useful for reranking the top few dozen results from a first-stage retriever in enterprise search systems and content discovery platforms, as sketched below.
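A hedged sketch of that retrieve-then-rerank pattern with the official `elasticsearch` Python client (8.x assumed); the index name `articles`, the `body` field, and the candidate pool size of 50 are placeholder assumptions:

```python
import torch
from elasticsearch import Elasticsearch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "amberoad/bert-multilingual-passage-reranking-msmarco"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

es = Elasticsearch("http://localhost:9200")  # assumed local cluster
query = "effects of caffeine on sleep"

# Stage 1: cheap keyword retrieval pulls a candidate pool (top 50 here).
hits = es.search(index="articles", query={"match": {"body": query}}, size=50)["hits"]["hits"]
passages = [hit["_source"]["body"] for hit in hits]  # assumed document field

# Stage 2: the cross-encoder rescores and reorders the candidates.
inputs = tokenizer(
    [query] * len(passages),
    passages,
    truncation=True,
    max_length=512,
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    scores = torch.softmax(model(**inputs).logits, dim=-1)[:, 1]
reranked = sorted(zip(scores.tolist(), passages), reverse=True)[:10]
```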