KR-SBERT-V40K-klueNLI-augSTS
| Property | Value |
|---|---|
| Author | snunlp |
| Embedding Dimension | 768 |
| Downloads | 27,989 |
| Language | Korean |
| Max Sequence Length | 128 |
What is KR-SBERT-V40K-klueNLI-augSTS?
KR-SBERT-V40K-klueNLI-augSTS is a Korean Sentence-BERT model designed for semantic similarity tasks. It maps sentences and paragraphs to a 768-dimensional dense vector space, which makes it suitable for clustering and semantic search. Its reported accuracy on document classification is 86.28%.
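As a rough illustration of mapping sentences to vectors and comparing them, the sketch below uses the sentence-transformers library. The Hub identifier `snunlp/KR-SBERT-V40K-klueNLI-augSTS` is an assumption derived from the author and model name listed above, and the helper names `encode_sentences` and `cosine_similarity` are hypothetical. The model is downloaded only when `encode_sentences` is actually called.

```python
import math


def encode_sentences(sentences, model_name="snunlp/KR-SBERT-V40K-klueNLI-augSTS"):
    """Return one embedding (a list of floats) per input sentence.

    Assumes the sentence-transformers package is installed and that the
    model is published on the Hugging Face Hub under `model_name`.
    """
    # Imported lazily so the pure-Python helper below works without it.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(sentences).tolist()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Example usage (downloads the model on first call):
#   vectors = encode_sentences(["오늘 날씨가 좋네요.", "오늘은 날씨가 맑습니다."])
#   score = cosine_similarity(vectors[0], vectors[1])
```

Each returned vector should have 768 entries, matching the embedding dimension in the table; a higher cosine score indicates that the two sentences are closer in meaning.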
Implementation Details
The model is built on the BERT architecture and follows the Sentence-BERT design: a transformer module followed by a pooling layer. It can be used through either the sentence-transformers library or the Hugging Face Transformers framework. The transformer module accepts at most 128 tokens per input, and the pooling layer averages the token embeddings (mean pooling) to produce a single sentence vector.
- Produces token-level (word) embeddings as well as sentence embeddings
- Uses mean pooling over token embeddings to generate sentence representations
- Supports batch processing, with an attention mask so that padding tokens are excluded from the pooled average
- Optimized for Korean language understanding
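The mean pooling step described above can be sketched in plain Python to show the arithmetic. This is a minimal illustration, not the library's implementation: the function names `mean_pool` and `mean_pool_batch` are hypothetical, and in practice the same computation runs on tensors returned by the transformer.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0.

    token_embeddings: list of token vectors, one per position
    attention_mask:   list of 1 (real token) or 0 (padding), same length
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(sum(attention_mask), 1)
    return [s / count for s in summed]


def mean_pool_batch(batch_embeddings, batch_masks):
    """Apply mean_pool to every sequence in a padded batch."""
    return [mean_pool(e, m) for e, m in zip(batch_embeddings, batch_masks)]
```

Because padded positions are masked out, a short sentence and the same sentence padded to the batch length pool to the same vector.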
Core Capabilities
- Sentence similarity computation
- Document classification (86.28% accuracy)
- Semantic search functionality
- Feature extraction for downstream NLP tasks
- Clustering of text data
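To make the semantic search capability concrete, here is a small sketch that ranks precomputed corpus embeddings against a query embedding by cosine similarity. It assumes the embeddings were already produced by the model; the function names `semantic_search` and `_normalize` are hypothetical, and a production system would more likely use a vectorized or indexed search.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def semantic_search(query_embedding, corpus_embeddings, top_k=3):
    """Return (corpus_index, cosine_score) pairs, best match first."""
    query = _normalize(query_embedding)
    scored = []
    for index, embedding in enumerate(corpus_embeddings):
        doc = _normalize(embedding)
        # The dot product of unit vectors equals their cosine similarity.
        score = sum(q * d for q, d in zip(query, doc))
        scored.append((index, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The returned indices point back into the original list of documents, so the caller can look up the matching text.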
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for being optimized specifically for Korean and for its reported 86.28% accuracy in document classification. It is trained on KLUE NLI data together with augmented STS data, a combination described as giving state-of-the-art performance among Korean SBERT models.
Q: What are the recommended use cases?
The model is well suited to semantic similarity tasks on Korean text, document classification, information retrieval, and text clustering. It is especially effective for applications that require a fine-grained understanding of Korean sentence meaning.
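For the document classification use case, one simple option is a nearest-centroid classifier over sentence embeddings, sketched below. This is an assumption chosen for illustration and is not necessarily how the reported 86.28% accuracy was obtained; the names `fit_centroids` and `predict` are hypothetical, and the embeddings are expected to come from the model.

```python
import math


def fit_centroids(embeddings, labels):
    """Average the embeddings of each label into one centroid per class."""
    sums, counts = {}, {}
    for vector, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(embedding, centroids):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: _cosine(embedding, centroids[label]))
```

A trained classifier such as logistic regression on the same embeddings is a common alternative when more labeled data is available.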