KoE5

Maintained By: nlpai-lab

Parameter Count: 560M
Base Model: intfloat/multilingual-e5-large
License: MIT
Languages: Korean, English
Training Dataset: ko-triplet-v1.0

What is KoE5?

KoE5 is a state-of-the-art text embedding model optimized specifically for Korean-language retrieval. Built on the multilingual-e5-large architecture, it was fine-tuned on ko-triplet-v1.0, a dataset of over 700,000 Korean (query, passage, hard negative) triples, to deliver superior performance in text retrieval applications.

Implementation Details

The model was trained with CachedMultipleNegativesRankingLoss, a batch size of 512, and a learning rate of 1e-05. It accepts input texts of up to 512 tokens and requires task-specific prefixes ("query: " or "passage: ") for optimal performance, as shown in the sketch after the list below.

  • Transformer-based architecture with 560M parameters
  • FP32 tensor type (full-precision weights)
  • Trained using sentence-transformers framework
  • Supports both Korean and English text processing
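
A minimal usage sketch with sentence-transformers, assuming the library is installed; the model ID nlpai-lab/KoE5 follows the maintainer and model name above, and the example sentences are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nlpai-lab/KoE5")

# E5-style models expect task prefixes: "query: " for the search query,
# "passage: " for the documents being ranked.
queries = ["query: 한국의 수도는 어디인가요?"]  # "What is the capital of Korea?"
passages = [
    "passage: 대한민국의 수도는 서울이다.",       # "The capital of Korea is Seoul."
    "passage: 부산은 대한민국의 항구 도시이다.",  # "Busan is a port city in Korea."
]

query_emb = model.encode(queries, normalize_embeddings=True)      # shape (1, 1024)
passage_emb = model.encode(passages, normalize_embeddings=True)   # shape (2, 1024)

# Cosine similarity between the query and each passage; higher = more relevant.
scores = util.cos_sim(query_emb, passage_emb)
print(scores)  # the first passage should score higher than the second
```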

Core Capabilities

  • Advanced text retrieval and similarity matching
  • Semantic search optimization
  • Cross-lingual embedding generation
  • Query-passage matching with high accuracy
  • Support for both symmetric and asymmetric tasks (see the sketch below)
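
As a sketch of the symmetric, cross-lingual case: following the upstream multilingual-e5 convention, symmetric tasks (semantic similarity, bitext mining, paraphrase retrieval) use the "query: " prefix on both sides. The sentence pair below is illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nlpai-lab/KoE5")

# Symmetric task: the same "query: " prefix on both inputs.
korean = "query: 고양이가 소파 위에서 자고 있다."  # "A cat is sleeping on the sofa."
english = "query: A cat is sleeping on the sofa."

emb = model.encode([korean, english], normalize_embeddings=True)
score = util.cos_sim(emb[0], emb[1]).item()
print(score)  # higher scores indicate closer meaning across languages
```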

Frequently Asked Questions

Q: What makes this model unique?

KoE5 stands out for its specialized optimization for Korean text retrieval, outperforming most multilingual embedding models on Korean benchmarks. It is among the best publicly available Korean retrieval models, delivering state-of-the-art results on various benchmarks.

Q: What are the recommended use cases?

The model excels at passage retrieval for open-domain QA, ad-hoc information retrieval, semantic similarity, bitext mining, and paraphrase retrieval. It can also serve as a feature extractor for classification and clustering tasks, as sketched below.
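
An illustrative sketch of the feature-extraction use, assuming scikit-learn is installed; the sample sentences are hypothetical. Per the upstream multilingual-e5 convention, texts used as features for clustering or classification take the "query: " prefix:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("nlpai-lab/KoE5")

# "query: " prefix on every text when embeddings are used as features.
sentences = [
    "query: 주식 시장이 급등했다.",         # "The stock market surged."
    "query: 코스피 지수가 크게 올랐다.",    # "The KOSPI index rose sharply."
    "query: 오늘은 날씨가 맑고 따뜻하다.",  # "The weather is clear and warm today."
    "query: 내일은 비가 올 예정이다.",      # "Rain is expected tomorrow."
]

embeddings = model.encode(sentences, normalize_embeddings=True)

# Cluster the embeddings into two groups (finance vs. weather).
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(embeddings)
print(labels)  # sentences on the same topic should share a cluster id
```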
