KR-SBERT-V40K-klueNLI-augSTS

Property	Value
Author	snunlp
Embedding Dimension	768
Downloads	27,989
Language	Korean
Max Sequence Length	128

What is KR-SBERT-V40K-klueNLI-augSTS?

KR-SBERT-V40K-klueNLI-augSTS is a Korean Sentence-BERT model designed for semantic similarity tasks. It maps sentences and paragraphs to a 768-dimensional dense vector space, which makes it suitable for clustering and semantic search. The model reports 86.28% accuracy on a document classification task.

Implementation Details

The model is built on the BERT architecture and follows the sentence-transformers design: a transformer module with a maximum sequence length of 128 tokens, followed by a pooling module that applies mean pooling to the token embeddings. It can be used through either the sentence-transformers library or the HuggingFace Transformers library.

Produces token-level (word) embeddings and pools them into sentence embeddings
Uses mean pooling strategy for generating sentence representations
Supports batch processing and attention masking
Optimized for Korean language understanding
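
The mean pooling step with attention masking can be illustrated with a minimal sketch in plain Python. It is not the model's actual code: the function name mean_pool and the toy 4-dimensional vectors are assumptions made for illustration (the real model produces 768-dimensional token embeddings), and it only shows the idea of averaging the token vectors that the attention mask marks as real tokens.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding (mask 0) is ignored."""
    if not token_embeddings:
        raise ValueError("at least one token embedding is required")
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")

    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value

    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy input (hypothetical values): three token vectors, the last one is padding.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, masked out below
]
mask = [1, 1, 0]

sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 2.0, 2.0, 2.0]
```

Because the padded position is masked out, sentences of different lengths in the same batch yield embeddings that do not depend on how much padding was added.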

Core Capabilities

Sentence similarity computation
Document classification (86.28% accuracy)
Semantic search functionality
Feature extraction for downstream NLP tasks
Clustering of text data
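
As a rough illustration of how sentence similarity and semantic search work on top of sentence embeddings, the sketch below ranks a small corpus by cosine similarity to a query. The helper names (cosine_similarity, rank_by_similarity) and the 3-dimensional toy vectors are assumptions for illustration; in practice the vectors would be the 768-dimensional embeddings produced by the model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], corpus: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (hypothetical values standing in for model output).
corpus_vectors = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_vector, corpus_vectors)
for index, score in ranking:
    print(index, round(score, 3))
```

Here the first corpus vector is identical to the query and ranks first, the second overlaps partially, and the third is orthogonal to the query and ranks last.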

Frequently Asked Questions

Q: What makes this model unique?

This model is tuned specifically for Korean. It combines training on KLUE NLI data with augmented STS training, and it is reported to achieve state-of-the-art results among Korean SBERT models, including 86.28% accuracy on document classification.

Q: What are the recommended use cases?

The model is well suited to semantic similarity tasks on Korean text, document classification, information retrieval, and text clustering. It is especially useful for applications that depend on fine-grained distinctions in Korean sentence meaning.
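
One simple way to use sentence embeddings for document classification is a nearest-centroid classifier, sketched below. This is an assumption chosen for illustration, not the method behind the reported 86.28% accuracy; the labels, the helper names (centroid, cosine, classify), and the 2-dimensional toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(vector: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is most similar to the given vector."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Toy labeled embeddings (hypothetical stand-ins for embedded training documents).
labeled = {
    "sports": [[0.9, 0.1], [0.8, 0.2]],
    "finance": [[0.1, 0.9], [0.2, 0.8]],
}
centroids = {label: centroid(vectors) for label, vectors in labeled.items()}

predicted = classify([0.7, 0.3], centroids)
print(predicted)  # sports
```

A trained classifier on top of the embeddings would usually be a stronger choice; the centroid approach is shown only because it needs no training loop.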