KoE5

Maintained By: nlpai-lab

Parameter Count: 560M
Base Model: intfloat/multilingual-e5-large
License: MIT
Languages: Korean, English
Training Dataset: ko-triplet-v1.0

What is KoE5?

KoE5 is a state-of-the-art text embedding model optimized specifically for Korean-language retrieval. Built on the multilingual-e5-large architecture, it was fine-tuned on ko-triplet-v1.0, a dataset of over 700,000 Korean (query, passage, hard negative) triples, to deliver superior performance in text retrieval applications.

Implementation Details

The model was trained with CachedMultipleNegativesRankingLoss, a batch size of 512, and a learning rate of 1e-05. It accepts input texts of up to 512 tokens and requires task-specific prefixes ("query: " or "passage: ") for optimal performance, as shown in the sketch after the list below.

  • Transformer-based architecture with 560M parameters
  • FP32 tensor type (full-precision weights)
  • Trained using sentence-transformers framework
  • Supports both Korean and English text processing
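
A minimal usage sketch with sentence-transformers, assuming the library is installed; the model ID nlpai-lab/KoE5 follows the maintainer and model name above, and the example sentences are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nlpai-lab/KoE5")

# E5-style models expect task prefixes: "query: " for the search query,
# "passage: " for the documents being ranked.
queries = ["query: 한국의 수도는 어디인가요?"]  # "What is the capital of Korea?"
passages = [
    "passage: 대한민국의 수도는 서울이다.",       # "The capital of Korea is Seoul."
    "passage: 부산은 대한민국의 항구 도시이다.",  # "Busan is a port city in Korea."
]

query_emb = model.encode(queries, normalize_embeddings=True)      # shape (1, 1024)
passage_emb = model.encode(passages, normalize_embeddings=True)   # shape (2, 1024)

# Cosine similarity between the query and each passage; higher = more relevant.
scores = util.cos_sim(query_emb, passage_emb)
print(scores)  # the first passage should score higher than the second
```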

Core Capabilities

  • Advanced text retrieval and similarity matching
  • Semantic search optimization
  • Cross-lingual embedding generation
  • Query-passage matching with high accuracy
  • Support for both symmetric and asymmetric tasks (see the sketch below)
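
As a sketch of the symmetric, cross-lingual case: following the upstream multilingual-e5 convention, symmetric tasks (semantic similarity, bitext mining, paraphrase retrieval) use the "query: " prefix on both sides. The sentence pair below is illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nlpai-lab/KoE5")

# Symmetric task: the same "query: " prefix on both inputs.
korean = "query: 고양이가 소파 위에서 자고 있다."  # "A cat is sleeping on the sofa."
english = "query: A cat is sleeping on the sofa."

emb = model.encode([korean, english], normalize_embeddings=True)
score = util.cos_sim(emb[0], emb[1]).item()
print(score)  # higher scores indicate closer meaning across languages
```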

Frequently Asked Questions

Q: What makes this model unique?

KoE5 stands out for its specialized optimization for Korean text retrieval, outperforming most multilingual embedding models on Korean benchmarks. It is among the best publicly available Korean retrieval models, delivering state-of-the-art results on various benchmarks.

Q: What are the recommended use cases?

The model excels at passage retrieval for open-domain QA, ad-hoc information retrieval, semantic similarity, bitext mining, and paraphrase retrieval. It can also serve as a feature extractor for classification and clustering tasks, as sketched below.
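
An illustrative sketch of the feature-extraction use, assuming scikit-learn is installed; the sample sentences are hypothetical. Per the upstream multilingual-e5 convention, texts used as features for clustering or classification take the "query: " prefix:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("nlpai-lab/KoE5")

# "query: " prefix on every text when embeddings are used as features.
sentences = [
    "query: 주식 시장이 급등했다.",         # "The stock market surged."
    "query: 코스피 지수가 크게 올랐다.",    # "The KOSPI index rose sharply."
    "query: 오늘은 날씨가 맑고 따뜻하다.",  # "The weather is clear and warm today."
    "query: 내일은 비가 올 예정이다.",      # "Rain is expected tomorrow."
]

embeddings = model.encode(sentences, normalize_embeddings=True)

# Cluster the embeddings into two groups (finance vs. weather).
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(embeddings)
print(labels)  # sentences on the same topic should share a cluster id
```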
