# KoE5
| Property | Value |
|---|---|
| Parameter Count | 560M |
| Base Model | intfloat/multilingual-e5-large |
| License | MIT |
| Languages | Korean, English |
| Training Dataset | ko-triplet-v1.0 |
## What is KoE5?
KoE5 is a state-of-the-art text embedding model optimized specifically for Korean-language retrieval tasks. Built on the multilingual-e5-large architecture, it has been fine-tuned on ko-triplet-v1.0, a dataset of over 700,000 Korean query-document pairs, to deliver superior performance in text retrieval applications.
## Implementation Details
The model was trained with CachedMultipleNegativesRankingLoss, a batch size of 512, and a learning rate of 1e-05. It processes input texts of up to 512 tokens and requires E5-style prefixes ("query: " or "passage: ") for optimal performance; a usage sketch follows the list below.
- Transformer-based architecture with 560M parameters
- F32 tensor type for precise computations
- Trained using sentence-transformers framework
- Supports both Korean and English text processing
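The snippet below is a minimal usage sketch with the sentence-transformers framework, illustrating the required prefixes and the 512-token input limit. The Hugging Face model ID `nlpai-lab/KoE5` and the example sentences are assumptions; substitute the actual repository name or a local path if yours differs.

```python
# Minimal sketch; the model ID "nlpai-lab/KoE5" is an assumption --
# replace it with the actual repository name or a local path.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nlpai-lab/KoE5")

# E5-style prefixes are required: "query: " for queries, "passage: " for documents.
query = "query: 한국의 수도는 어디인가요?"  # "What is the capital of Korea?"
passages = [
    "passage: 서울은 대한민국의 수도이다.",       # "Seoul is the capital of South Korea."
    "passage: 부산은 대한민국 제2의 도시이다.",   # "Busan is Korea's second-largest city."
]

# Inputs longer than 512 tokens are truncated by the model.
embeddings = model.encode([query] + passages)

# Cosine similarity between the query and each passage
# (model.similarity requires sentence-transformers v3+).
scores = model.similarity(embeddings[:1], embeddings[1:])
print(scores)
```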
## Core Capabilities
- Advanced text retrieval and similarity matching
- Semantic search optimization
- Cross-lingual embedding generation
- Query-passage matching with high accuracy (see the retrieval sketch after this list)
- Support for both symmetric and asymmetric tasks
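As a concrete illustration of query-passage matching, the hedged sketch below runs a small semantic search with `sentence_transformers.util.semantic_search`. The corpus, the English query (which exercises the cross-lingual capability), and the model ID are illustrative assumptions, not fixtures from the model card.

```python
# Semantic-search sketch; the model ID and example texts are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nlpai-lab/KoE5")

corpus = [
    "passage: 한강은 서울을 가로질러 흐른다.",     # "The Han River flows through Seoul."
    "passage: 김치는 한국의 전통 발효 음식이다.",  # "Kimchi is a traditional Korean fermented food."
    "passage: 제주도는 한국 최남단의 섬이다.",     # "Jeju is Korea's southernmost island."
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# An English query against Korean passages demonstrates cross-lingual retrieval.
query_embedding = model.encode(
    "query: Which river runs through Seoul?", convert_to_tensor=True
)

# Returns the top-k passages ranked by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```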
## Frequently Asked Questions
Q: What makes this model unique?
KoE5 stands out for its specialized optimization for Korean text retrieval, outperforming most multilingual embedding models. It is one of the best publicly available Korean retrieval models, delivering state-of-the-art results across a range of benchmarks.
Q: What are the recommended use cases?
The model excels at passage retrieval for open-domain QA, ad-hoc information retrieval, semantic similarity tasks, bitext mining, and paraphrase retrieval. It can also serve as a feature extractor for classification and clustering, as sketched below.
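To show the feature-extraction use case, here is a hedged sketch that clusters KoE5 embeddings with scikit-learn's KMeans. The sample sentences, the cluster count, and the model ID are assumptions for illustration; the "query: " prefix on every input follows the usual E5 convention for symmetric tasks.

```python
# Clustering sketch; model ID, sentences, and cluster count are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("nlpai-lab/KoE5")

# For symmetric tasks such as clustering, the E5 convention is to use
# the "query: " prefix on every input.
sentences = [
    "query: 오늘 날씨가 정말 좋다.",        # "The weather is really nice today."
    "query: 하늘이 맑고 화창하다.",          # "The sky is clear and sunny."
    "query: 주식 시장이 급락했다.",          # "The stock market plunged."
    "query: 코스피 지수가 크게 하락했다.",   # "The KOSPI index fell sharply."
]
embeddings = model.encode(sentences)

# Group the sentences into two clusters based on embedding similarity;
# the two weather sentences and the two market sentences should separate.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)
print(labels)
```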