# BERT Multilingual Passage Reranking Model
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Paper | Research Paper |
| Training Dataset | MS MARCO |
| Languages Supported | 102 |
## What is bert-multilingual-passage-reranking-msmarco?
This is a specialized BERT model designed for passage reranking across multiple languages. Built on top of BERT's multilingual architecture, it adds a densely connected layer on top of the 768-dimensional [CLS] token embedding to score the relevance between a search query and a passage. The model can improve search results by up to 100% compared to traditional keyword-based methods.
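A minimal scoring sketch using the Hugging Face `transformers` API. The Hub path `amberoad/bert-multilingual-passage-reranking-msmarco` and the label order (index 1 = relevant) are assumptions; check the model card of the copy you use before relying on them:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed Hub path for this model; adjust if your copy lives elsewhere.
MODEL_ID = "amberoad/bert-multilingual-passage-reranking-msmarco"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

query = "how many people live in berlin"
passage = (
    "Berlin has a population of around 3.7 million registered inhabitants, "
    "making it the most populous city in Germany."
)

# Query and passage are packed into a single sequence pair, capped at 512 tokens.
inputs = tokenizer(query, passage, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2): [irrelevant, relevant] (assumed order)

print("raw logits:", logits.tolist())
print("relevance:", torch.softmax(logits, dim=-1)[0, 1].item())
```

The softmax collapses the two logits into a single relevance probability, which is often easier to threshold than the raw scores.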
## Implementation Details
The model was trained for 400,000 steps on a TPU v3-8, taking approximately 12 hours. Each query and passage are processed together within a 512-token limit, producing a relevance score between -10 and 10, with higher scores indicating a better match.
- Built on the multilingual BERT architecture
- Trained on the MS MARCO dataset with 400M query-passage pairs
- Inference time of approximately 300 ms per query
- Compatible with the NBoost library for direct Elasticsearch integration (see the batch-reranking sketch after this list)
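As a sketch of the reranking loop referenced above (the Hub path, batch handling, and `top_k` default are illustrative assumptions, not the library's own API), all candidates can be scored in one batch and sorted best-first:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "amberoad/bert-multilingual-passage-reranking-msmarco"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

def rerank(query: str, passages: list[str], top_k: int = 10) -> list[tuple[float, str]]:
    """Score every (query, passage) pair in one batch and return the best top_k."""
    inputs = tokenizer(
        [query] * len(passages),
        passages,
        truncation=True,
        max_length=512,
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits           # shape (n, 2)
    scores = torch.softmax(logits, dim=-1)[:, 1]  # P(relevant) per passage
    return sorted(zip(scores.tolist(), passages), reverse=True)[:top_k]
```

For large candidate pools, splitting the passages into smaller batches keeps memory bounded without changing the ranking.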
## Core Capabilities
- Multilingual support for 102 languages
- Passage reranking for search optimization
- Query-passage relevance scoring
- Elasticsearch results enhancement
- Cross-lingual search capability (illustrated in the sketch below)
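A short sketch of the cross-lingual case, reusing the same assumed Hub path: because query and passage share one multilingual vocabulary, they do not need to be in the same language:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "amberoad/bert-multilingual-passage-reranking-msmarco"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

# German query scored against an English passage.
query = "Wie funktionieren Solarzellen?"  # "How do solar cells work?"
passage = "Solar cells convert sunlight directly into electricity."

inputs = tokenizer(query, passage, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    prob_relevant = torch.softmax(model(**inputs).logits, dim=-1)[0, 1].item()
print(f"cross-lingual relevance: {prob_relevant:.4f}")
```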
## Frequently Asked Questions
Q: What makes this model unique?
Its ability to handle 102 languages while maintaining performance comparable to English-only models makes it exceptional. It shows particularly strong performance in German even though the training data is English-only.
Q: What are the recommended use cases?
This model is ideal for improving search engine results, especially in multilingual environments. It is particularly useful for reranking the top few dozen results from a first-stage retriever in enterprise search systems and content discovery platforms, as sketched below.
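A hedged sketch of that retrieve-then-rerank pattern with the official `elasticsearch` Python client (8.x assumed); the index name `articles`, the `body` field, and the candidate pool size of 50 are placeholder assumptions:

```python
import torch
from elasticsearch import Elasticsearch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "amberoad/bert-multilingual-passage-reranking-msmarco"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

es = Elasticsearch("http://localhost:9200")  # assumed local cluster
query = "effects of caffeine on sleep"

# Stage 1: cheap keyword retrieval pulls a candidate pool (top 50 here).
hits = es.search(index="articles", query={"match": {"body": query}}, size=50)["hits"]["hits"]
passages = [hit["_source"]["body"] for hit in hits]  # assumed document field

# Stage 2: the cross-encoder rescores and reorders the candidates.
inputs = tokenizer(
    [query] * len(passages),
    passages,
    truncation=True,
    max_length=512,
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    scores = torch.softmax(model(**inputs).logits, dim=-1)[:, 1]
reranked = sorted(zip(scores.tolist(), passages), reverse=True)[:10]
```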