msmarco-bert-co-condensor
| Property | Value |
|---|---|
| Parameter Count | 109M |
| License | Apache 2.0 |
| Paper | Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval |
| Embedding Dimension | 768 |
What is msmarco-bert-co-condensor?
The msmarco-bert-co-condensor is a specialized transformer model designed for semantic search applications. It is a port of the Luyu/co-condenser-marco-retriever model to the sentence-transformers framework and maps sentences and paragraphs into a 768-dimensional dense vector space.
Implementation Details
Built on the BERT architecture, this model uses CLS token pooling and achieves competitive performance on the MS MARCO passage ranking benchmark, with an MRR@10 of 35.51 on the dev set. It can be integrated through either the sentence-transformers or Hugging Face Transformers libraries, as the sketches below illustrate.
- Maximum sequence length of 256 tokens
- Optimized for passage retrieval tasks
- Uses CLS token pooling: the hidden state of the [CLS] token serves as the text embedding
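
To make the pooling strategy concrete, here is a minimal sketch of encoding text with the plain Hugging Face Transformers API. It assumes the hub identifier sentence-transformers/msmarco-bert-co-condensor; the example sentence is invented for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hub identifier assumed from the model name; adjust if your copy differs.
MODEL_ID = "sentence-transformers/msmarco-bert-co-condensor"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = ["Dense retrieval maps text into a shared vector space."]

# Truncate inputs to the model's 256-token maximum sequence length.
inputs = tokenizer(
    sentences, padding=True, truncation=True, max_length=256, return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs)

# CLS pooling: take the hidden state of the first ([CLS]) token
# as the sentence embedding.
embeddings = outputs.last_hidden_state[:, 0]
print(embeddings.shape)  # torch.Size([1, 768])
```

With the sentence-transformers library, SentenceTransformer(MODEL_ID).encode(sentences) applies the same CLS pooling internally and returns the 768-dimensional vectors directly.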
Core Capabilities
- Dense passage retrieval with strong performance on TREC benchmarks
- Semantic similarity computation
- Document ranking and retrieval
- Bi-encoder text matching via embedding similarity (texts are encoded independently, unlike a cross-encoder)
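
As an illustration of the retrieval and ranking capabilities above, the following sketch encodes a query and a few passages with the sentence-transformers API and ranks the passages by dot-product score. The dot-product scoring choice (common for MS MARCO bi-encoders; util.cos_sim is the cosine alternative) and the example texts are assumptions for this sketch.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-bert-co-condensor")

query = "What is the capital of France?"
passages = [
    "Paris is the capital and most populous city of France.",
    "Berlin is the capital and largest city of Germany.",
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
]

# Encode the query and passages into the shared 768-dimensional space.
query_emb = model.encode(query, convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)

# Score every passage against the query; dot_score returns a 1 x N matrix.
scores = util.dot_score(query_emb, passage_embs)[0]

# Rank passages from most to least relevant.
for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:7.2f}  {passage}")
```

Because queries and passages are encoded independently, passage embeddings can be precomputed and indexed, which is what makes this bi-encoder design practical for large-scale search.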
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its unsupervised, corpus-aware pre-training approach and its strong performance across retrieval benchmarks, notably achieving 35.51 MRR@10 on the MS MARCO dev set without relying on additional document title information.
Q: What are the recommended use cases?
The model is particularly well-suited for semantic search applications, passage ranking, and information retrieval tasks. It's especially effective for scenarios requiring dense vector representations of text for similarity matching.