# nb-sbert-base
| Property | Value |
|---|---|
| Parameter Count | 178M |
| License | Apache 2.0 |
| Architecture | BERT-based Sentence Transformer |
| Language | Norwegian/English |
## What is nb-sbert-base?
nb-sbert-base is a sentence transformer model for Norwegian language processing, with cross-lingual capabilities between Norwegian and English. Built on nb-bert-base, it maps sentences and paragraphs to a 768-dimensional dense vector space, enabling semantic similarity and related computations.
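Similarity over these dense vectors is typically measured with cosine similarity. A minimal numpy sketch (the vectors below are random stand-ins; real embeddings would come from the model's encoding step):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two dense embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for the model's 768-dimensional embeddings.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=768)
emb_b = emb_a + 0.1 * rng.normal(size=768)  # a "semantically close" vector
emb_c = rng.normal(size=768)                # an unrelated vector

# The perturbed vector scores higher than the unrelated one.
assert cosine_similarity(emb_a, emb_b) > cosine_similarity(emb_a, emb_c)
```

In high-dimensional spaces, independent random vectors are nearly orthogonal, which is why cosine similarity separates related from unrelated embeddings so cleanly.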
## Implementation Details
The model is built on the SentenceTransformers framework and was trained on a machine-translated version of the MNLI dataset. It uses mean pooling over token embeddings and achieves a Pearson correlation of 0.8275 on sentence-similarity evaluation.
- Trained using MultipleNegativesRankingLoss with a scale of 20.0
- Implements cosine similarity as the primary similarity function
- Trained with a batch size of 32
- Features automatic mean pooling of token embeddings
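The training objective above can be sketched as follows: for a batch of (anchor, positive) pairs, MultipleNegativesRankingLoss treats every other positive in the batch as a negative, scales the cosine-similarity matrix by 20.0, and applies cross-entropy with the true pair on the diagonal. A numpy illustration with random stand-in embeddings (not the actual training code):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """MultipleNegativesRankingLoss over a batch of (anchor, positive) pairs.
    Every other positive in the batch serves as an in-batch negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # scaled cosine similarities, shape (batch, batch)
    m = scores.max(axis=1, keepdims=True)
    log_softmax = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
    # The target for row i is column i: the true (anchor, positive) pair.
    return float(-np.mean(np.diag(log_softmax)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(32, 768))                    # stand-in anchor embeddings
matched = anchors + 0.05 * rng.normal(size=(32, 768))   # paired positives
print(mnr_loss(anchors, matched))  # near zero: the true pairs are easy to rank
```

The scale of 20.0 sharpens the softmax so that small cosine-similarity gaps between the true pair and in-batch negatives translate into a strong training signal.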
## Core Capabilities
- Semantic similarity computation between sentences
- Cross-lingual sentence matching (Norwegian-English)
- Keyword extraction using KeyBERT integration
- Topic modeling with BERTopic
- Vector-based similarity search
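At small corpus sizes, vector-based similarity search reduces to a brute-force cosine ranking. A sketch with stand-in vectors (real embeddings would come from encoding the corpus with the model):

```python
import numpy as np

def search(query_emb: np.ndarray, corpus_embs: np.ndarray, top_k: int = 3):
    """Brute-force cosine-similarity search; returns (index, score) pairs, best first."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in top]

# Stand-in 768-dimensional embeddings for a 100-document corpus.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 768))
query = corpus[42] + 0.1 * rng.normal(size=768)  # near-duplicate of document 42

hits = search(query, corpus)
assert hits[0][0] == 42  # the near-duplicate ranks first
```

For larger corpora the same idea is usually served by an approximate nearest-neighbor index rather than this exhaustive scan.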
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's ability to handle both Norwegian and English content while maintaining high performance in semantic similarity tasks makes it particularly valuable for Nordic NLP applications. Its versatility in supporting multiple downstream tasks like keyword extraction and topic modeling sets it apart.
**Q: What are the recommended use cases?**
The model excels in various applications including semantic search, document clustering, cross-lingual information retrieval, and automated keyword extraction. It's particularly useful for organizations working with Norwegian-English bilingual content or requiring sophisticated text analysis in Norwegian.