miCSE: Mutual Information Contrastive Sentence Embedding
| Property | Value |
|---|---|
| Parameter Count | 109M |
| License | Apache 2.0 |
| Paper | arXiv:2211.04928 |
| Benchmark Score | 78.13% (STS Average) |
What is miCSE?
miCSE is a sentence embedding model that applies mutual information-based contrastive learning to produce high-quality sentence representations, and it performs particularly well in few-shot settings. During contrastive training it aligns the attention patterns of different dropout-augmented views of the same sentence, which makes it data-efficient when training examples are scarce.
Implementation Details
The model is a 109M-parameter transformer encoder that uses attention mutual information (AMI) to enforce syntactic consistency across dropout-augmented views during training. At inference it maps input text to vector embeddings that capture semantic meaning, with the sentence representation taken from the [CLS] token embedding (a minimal usage sketch follows the list below).
- Trained on English Wikipedia sentences
- Supports variable-length inputs up to a configurable maximum token length
- Implements cosine similarity for sentence comparison
- Optimized for both full-shot and few-shot scenarios
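The following is a minimal sketch of this inference path using the Hugging Face transformers library: encode two sentences, take the [CLS] token embedding as the sentence representation, and compare them with cosine similarity. The model ID `sap-ai-research/miCSE` and the max length of 128 are assumptions; adjust them to match the actual checkpoint and configuration.

```python
# Minimal sketch: encode sentences with miCSE and compare them by cosine
# similarity. The model ID "sap-ai-research/miCSE" is an assumption; point
# it at wherever the checkpoint is actually hosted.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "sap-ai-research/miCSE"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = [
    "A group of kids is playing in a yard.",
    "Children are playing outside.",
]

# Tokenize with padding/truncation up to an assumed maximum token length.
batch = tokenizer(sentences, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Sentence representation = [CLS] token embedding (first position).
embeddings = outputs.last_hidden_state[:, 0, :]

# Cosine similarity between the two sentence embeddings.
similarity = torch.nn.functional.cosine_similarity(
    embeddings[0:1], embeddings[1:2]
).item()
print(f"cosine similarity: {similarity:.4f}")
```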
Core Capabilities
- Sentence similarity computation
- Text retrieval tasks
- Semantic clustering
- Few-shot learning applications
- Integration with SentenceTransformers framework
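A sketch of the SentenceTransformers integration is shown below, wrapping the checkpoint in a Transformer module plus a [CLS] pooling head and running a small semantic search. The model ID, max sequence length, and explicit CLS pooling are assumptions made for illustration, not a confirmed configuration.

```python
# Sketch of using miCSE through the SentenceTransformers API. The model ID
# and the CLS pooling choice below are assumptions chosen to match the
# [CLS]-based sentence representation described above.
from sentence_transformers import SentenceTransformer, models, util

word_embedding = models.Transformer("sap-ai-research/miCSE", max_seq_length=128)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(),
                         pooling_mode="cls")
model = SentenceTransformer(modules=[word_embedding, pooling])

queries = ["How do I reset my password?"]
corpus = [
    "Steps to change your account password.",
    "Weather forecast for the weekend.",
]

query_emb = model.encode(queries, convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# Retrieve the most similar corpus sentence for each query.
hits = util.semantic_search(query_emb, corpus_emb, top_k=1)
print(hits)
```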
Frequently Asked Questions
Q: What makes this model unique?
miCSE's distinctive feature is its effectiveness in low-resource scenarios, which comes from its mutual information-based approach to contrastive learning. By enforcing structural (attention-level) consistency across dropout-augmented views of the same sentence, it can be trained effectively with only a small amount of data.
Q: What are the recommended use cases?
The model is ideal for applications requiring semantic text similarity, including document retrieval, sentence clustering, and semantic search. It's particularly valuable in scenarios where training data is limited, making it suitable for specialized domain applications.
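To illustrate the clustering use case, here is a small sketch that groups semantically similar sentences with k-means over miCSE embeddings. The model ID, the two-cluster choice, and the toy sentences are all assumptions; note also that loading a plain checkpoint this way gives SentenceTransformers' default pooling, which may differ from the [CLS] pooling described earlier.

```python
# Illustrative clustering sketch: group similar sentences using miCSE
# embeddings and k-means. Model ID and cluster count are assumptions for
# this toy example; default pooling may differ from [CLS] pooling.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("sap-ai-research/miCSE")  # assumed model ID

sentences = [
    "The invoice was paid on Friday.",
    "Payment for the bill went through last week.",
    "The hiking trail closes at sunset.",
    "Trails in the park are closed after dark.",
]

embeddings = model.encode(sentences)  # one embedding vector per sentence
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)

for label, sentence in sorted(zip(labels, sentences)):
    print(label, sentence)
```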