msmarco-bert-co-condensor

Maintained by: sentence-transformers

  • Parameter Count: 109M
  • License: Apache 2.0
  • Paper: Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
  • Embedding Dimension: 768

What is msmarco-bert-co-condensor?

msmarco-bert-co-condensor is a specialized transformer model designed for semantic search applications. It is a port of the Luyu/co-condenser-marco-retriever model to the sentence-transformers framework, and it maps sentences and paragraphs into a 768-dimensional dense vector space.
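A minimal usage sketch with the sentence-transformers library; the example sentences are made up, and the model identifier assumes the model is published on the Hub as sentence-transformers/msmarco-bert-co-condensor:

```python
from sentence_transformers import SentenceTransformer

# Hypothetical example inputs; any short sentences or passages work.
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer("sentence-transformers/msmarco-bert-co-condensor")
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768): one 768-dimensional vector per input
```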

Implementation Details

Built on the BERT architecture, this model uses CLS token pooling and achieves competitive performance on the MS MARCO passage ranking benchmark, with an MRR@10 of 35.51 on the dev set. It can be integrated through either the sentence-transformers library or Hugging Face Transformers directly (see the sketch after the list below).

  • Maximum sequence length of 256 tokens
  • Optimized for passage retrieval tasks
  • Implements efficient CLS token pooling strategy
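
Because the model uses CLS token pooling rather than the mean pooling common in other sentence-transformers models, plain Hugging Face Transformers usage needs a small pooling helper. A sketch along those lines; the cls_pooling helper and example sentence are illustrative, not part of the library:

```python
import torch
from transformers import AutoTokenizer, AutoModel

def cls_pooling(model_output):
    # CLS pooling: take the hidden state of the first ([CLS]) token.
    return model_output.last_hidden_state[:, 0]

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/msmarco-bert-co-condensor")
model = AutoModel.from_pretrained("sentence-transformers/msmarco-bert-co-condensor")

sentences = ["This is an example sentence"]
# Truncate to the model's maximum sequence length of 256 tokens.
encoded = tokenizer(sentences, padding=True, truncation=True,
                    max_length=256, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

embeddings = cls_pooling(output)  # shape: (1, 768)
```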

Core Capabilities

  • Dense passage retrieval with strong performance on TREC benchmarks
  • Semantic similarity computation
  • Document ranking and retrieval
  • Bi-encoder text matching via dense vector similarity
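
For retrieval, queries and passages are embedded into the same vector space and scored against each other. A sketch of that workflow, assuming dot-product scoring (typical for MS MARCO-trained dense retrievers) and made-up query/passage strings:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-bert-co-condensor")

query = "How big is London?"  # hypothetical query
passages = [
    "London has 9,787,426 inhabitants at the 2011 census.",
    "Paris is the capital of France.",
]

query_emb = model.encode(query)
passage_embs = model.encode(passages)

# Rank passages by dot-product similarity to the query.
scores = util.dot_score(query_emb, passage_embs)[0]
best = scores.argmax().item()
print(passages[best], scores[best].item())
```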

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its unsupervised, corpus-aware pre-training approach and its strong results across retrieval benchmarks, notably 35.51 MRR@10 on the MS MARCO dev set without requiring additional document title information.

Q: What are the recommended use cases?

The model is particularly well-suited for semantic search applications, passage ranking, and information retrieval tasks. It's especially effective for scenarios requiring dense vector representations of text for similarity matching.
