msmarco-bert-co-condensor

Maintained By
sentence-transformers

Property              Value
Parameter Count       109M
License               Apache 2.0
Paper                 Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
Embedding Dimension   768

What is msmarco-bert-co-condensor?

The msmarco-bert-co-condensor is a specialized transformer model designed for semantic search. It is a port of the Luyu/co-condenser-marco-retriever model to the sentence-transformers framework and maps sentences and paragraphs into a 768-dimensional dense vector space.

Implementation Details

Built on the BERT architecture, this model uses CLS-token pooling and achieves competitive performance on the MS MARCO benchmark, with an MRR@10 of 35.51 on the dev set. It integrates with either the sentence-transformers library or HuggingFace Transformers; a usage sketch follows the list below.

  • Maximum sequence length of 256 tokens
  • Optimized for passage retrieval tasks
  • Implements an efficient CLS-token pooling strategy (the [CLS] embedding serves as the text representation)
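
The snippet below is a minimal usage sketch with the sentence-transformers library; the query and passages are illustrative placeholders. Dot-product scoring is shown because the MS MARCO models in this family are trained for dot-product similarity.

```python
from sentence_transformers import SentenceTransformer, util

# Load the ported model from the Hugging Face Hub
model = SentenceTransformer("sentence-transformers/msmarco-bert-co-condensor")

query = "How many people live in London?"   # illustrative query
passages = [                                # illustrative passages
    "Around 9 million people live in London.",
    "London is known for its financial district.",
]

# Encode query and passages into 768-dimensional dense vectors
query_emb = model.encode(query)
passage_embs = model.encode(passages)

# Score each passage against the query; a higher score means a better match
scores = util.dot_score(query_emb, passage_embs)
print(scores)
```

Inputs longer than the 256-token maximum sequence length are truncated automatically during encoding.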

Core Capabilities

  • Dense passage retrieval with strong performance on TREC benchmarks
  • Semantic similarity computation
  • Document ranking and retrieval
  • Bi-encoder text matching, with query and passage encoded independently for efficient large-scale search
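
For users who prefer raw HuggingFace Transformers, the sketch below reproduces the CLS-token pooling described under Implementation Details; the `encode` helper name is illustrative, not part of either library.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/msmarco-bert-co-condensor")
model = AutoModel.from_pretrained("sentence-transformers/msmarco-bert-co-condensor")

def encode(texts):
    # Truncate to the model's 256-token maximum sequence length
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=256, return_tensors="pt")
    with torch.no_grad():
        output = model(**inputs)
    # CLS pooling: the hidden state of the first ([CLS]) token is the embedding
    return output.last_hidden_state[:, 0]

embeddings = encode(["dense retrieval with co-condenser"])
print(embeddings.shape)  # torch.Size([1, 768])
```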

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its unsupervised, corpus-aware pre-training approach and its strong performance on retrieval benchmarks, notably 35.51 MRR@10 on the MS MARCO dev set without requiring document titles as additional input.

Q: What are the recommended use cases?

The model is particularly well-suited for semantic search applications, passage ranking, and information retrieval tasks. It's especially effective for scenarios requiring dense vector representations of text for similarity matching.
