msmarco-bert-co-condensor
| Property | Value |
|---|---|
| Parameter Count | 109M |
| License | Apache 2.0 |
| Paper | Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval |
| Embedding Dimension | 768 |
What is msmarco-bert-co-condensor?
The msmarco-bert-co-condensor is a specialized transformer model designed for semantic search applications. It is a port of the Luyu/co-condenser-marco-retriever model to the sentence-transformers framework and maps sentences and paragraphs into a 768-dimensional dense vector space.
Implementation Details
Built on the BERT architecture, this model uses CLS token pooling and achieves competitive performance on the MS MARCO passage ranking benchmark, with an MRR@10 of 35.51 on the dev set. It can be integrated through either the sentence-transformers or Hugging Face Transformers libraries, as the sketches below illustrate.
- Maximum sequence length of 256 tokens
- Optimized for passage retrieval tasks
- Uses CLS token pooling: the hidden state of the [CLS] token serves as the text embedding
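
To make the pooling strategy concrete, here is a minimal sketch of encoding text with the plain Hugging Face Transformers API. It assumes the hub identifier sentence-transformers/msmarco-bert-co-condensor; the example sentence is invented for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hub identifier assumed from the model name; adjust if your copy differs.
MODEL_ID = "sentence-transformers/msmarco-bert-co-condensor"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = ["Dense retrieval maps text into a shared vector space."]

# Truncate inputs to the model's 256-token maximum sequence length.
inputs = tokenizer(
    sentences, padding=True, truncation=True, max_length=256, return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs)

# CLS pooling: take the hidden state of the first ([CLS]) token
# as the sentence embedding.
embeddings = outputs.last_hidden_state[:, 0]
print(embeddings.shape)  # torch.Size([1, 768])
```

With the sentence-transformers library, SentenceTransformer(MODEL_ID).encode(sentences) applies the same CLS pooling internally and returns the 768-dimensional vectors directly.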
Core Capabilities
- Dense passage retrieval with strong performance on TREC benchmarks
- Semantic similarity computation
- Document ranking and retrieval
- Bi-encoder text matching via embedding similarity (texts are encoded independently, unlike a cross-encoder)
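
As an illustration of the retrieval and ranking capabilities above, the following sketch encodes a query and a few passages with the sentence-transformers API and ranks the passages by dot-product score. The dot-product scoring choice (common for MS MARCO bi-encoders; util.cos_sim is the cosine alternative) and the example texts are assumptions for this sketch.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-bert-co-condensor")

query = "What is the capital of France?"
passages = [
    "Paris is the capital and most populous city of France.",
    "Berlin is the capital and largest city of Germany.",
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
]

# Encode the query and passages into the shared 768-dimensional space.
query_emb = model.encode(query, convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)

# Score every passage against the query; dot_score returns a 1 x N matrix.
scores = util.dot_score(query_emb, passage_embs)[0]

# Rank passages from most to least relevant.
for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:7.2f}  {passage}")
```

Because queries and passages are encoded independently, passage embeddings can be precomputed and indexed, which is what makes this bi-encoder design practical for large-scale search.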
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its unsupervised, corpus-aware pre-training approach and its strong performance across retrieval benchmarks, notably achieving 35.51 MRR@10 on the MS MARCO dev set without relying on additional document title information.
Q: What are the recommended use cases?
The model is particularly well-suited for semantic search applications, passage ranking, and information retrieval tasks. It's especially effective for scenarios requiring dense vector representations of text for similarity matching.