BGE-m3-ko

Maintained by dragonkue

| Property | Value |
|---|---|
| Parameter Count | 568M |
| Model Type | Sentence Transformer |
| License | Apache 2.0 |
| Base Model | BAAI/bge-m3 |
| Paper | BGE M3-Embedding Paper |

What is BGE-m3-ko?

BGE-m3-ko is a specialized Korean-optimized version of the BGE-M3 multilingual embedding model. It's designed to generate high-quality text embeddings for Korean and English content, with particular strength in handling longer text sequences. The model leverages the XLM-RoBERTa architecture and has been fine-tuned specifically for Korean language understanding while maintaining multilingual capabilities.

Implementation Details

The model implements a transformer-based architecture with 568M parameters, utilizing a maximum sequence length of 8192 tokens and producing 1024-dimensional embeddings. It employs both cosine similarity and dot product metrics for text similarity calculations, with demonstrated strong performance on Korean-specific benchmarks.

  • Maximum sequence length: 8192 tokens
  • Output embedding dimension: 1024
  • Optimized batch size: 32768
  • Learning rate: 3e-05 with linear scheduler
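Since the card lists both cosine similarity and dot product as supported metrics, a minimal stdlib-only sketch can show how the two relate. The toy 4-dimensional vectors below stand in for the model's 1024-dimensional embeddings; the key point is that on L2-normalized vectors the two metrics coincide.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine similarity = dot product divided by the product of norms."""
    return dot(a, b) / (norm(a) * norm(b))

# Toy vectors standing in for 1024-dim embeddings from the model.
query = [0.2, 0.8, 0.1, 0.3]
doc = [0.25, 0.7, 0.05, 0.4]

# After L2 normalization, dot product equals cosine similarity,
# which is why both metrics can be used interchangeably on
# normalized embeddings.
q_n = [x / norm(query) for x in query]
d_n = [x / norm(doc) for x in doc]
assert abs(dot(q_n, d_n) - cosine_similarity(query, doc)) < 1e-9
```

In practice, whether the raw dot product matches cosine similarity depends on whether the embeddings are normalized before comparison.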

Core Capabilities

  • Achieves 74.56% top-1 accuracy on Korean embedding benchmarks
  • Excellent performance on long-form text similarity tasks
  • Supports both Korean and English text processing
  • Optimized for various retrieval metrics (NDCG, MRR, MAP)
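To make the retrieval metrics above concrete, here is a short, self-contained sketch of one of them, Mean Reciprocal Rank (MRR): for each query, score 1/rank of the first relevant result, then average across queries. The document IDs and rankings are made up for illustration.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1/rank of the first relevant result; 0.0 if it never appears."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(query_results):
    """Average reciprocal rank over (ranked list, relevant id) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in query_results) / len(query_results)

# Two hypothetical queries: relevant doc retrieved at rank 1 and rank 2.
queries = [
    (["d3", "d1", "d7"], "d3"),  # reciprocal rank = 1.0
    (["d5", "d2", "d9"], "d2"),  # reciprocal rank = 0.5
]
print(mean_reciprocal_rank(queries))  # → 0.75
```

NDCG and MAP follow the same pattern of per-query scores averaged over a benchmark, with different per-query formulas.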

Frequently Asked Questions

Q: What makes this model unique?

The model's unique strength lies in its optimization for Korean language processing while maintaining multilingual capabilities. It shows particular strength in handling longer text sequences, making it especially valuable for document similarity and retrieval tasks in Korean-language contexts.

Q: What are the recommended use cases?

The model is particularly well-suited for: semantic search in Korean content, document similarity analysis, text classification tasks, and multilingual information retrieval applications. It performs especially well with longer text sequences, making it ideal for document-level analysis.
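A semantic-search pipeline built on such a model reduces to: embed the corpus once, embed each query, and rank documents by similarity. The sketch below uses tiny hand-made 3-dimensional vectors in place of real model outputs (which in practice would come from encoding text with the model, e.g. via the sentence-transformers library); the ranking logic is the same either way.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Toy corpus: in a real pipeline each vector would be the model's
# 1024-dim embedding of a Korean or English document.
corpus = [
    ("doc_a", [0.9, 0.1, 0.0]),
    ("doc_b", [0.1, 0.9, 0.0]),
    ("doc_c", [0.7, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]
print(top_k(query, corpus))  # doc_a ranks first, doc_c second
```

For corpora of any real size, the linear scan would be replaced by an approximate-nearest-neighbor index, but the embed-then-rank structure is unchanged.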
