ruri-large-v2

Maintained by: cl-nagoya


Property              Value
Parameter Count       337M
Model Type            Sentence Transformer
Output Dimensions     1024
Max Sequence Length   512 tokens
License               Apache 2.0
Paper                 arXiv:2409.07737

What is ruri-large-v2?

Ruri-large-v2 is a Japanese text embedding model developed by cl-nagoya. It is the v2 release in the Ruri model series, which is designed specifically for Japanese language understanding. The model reports a 74.55% average score on the JMTEB benchmark, an improvement over its predecessors and over competing models.

Implementation Details

Ruri-large-v2 is built on the sentence-transformers framework and pairs a BERT-based transformer with a mean-pooling layer. Every input needs a text prefix: "クエリ: " for queries and "文章: " for passages. The model processes sequences of up to 512 tokens and outputs 1024-dimensional embeddings.
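
The prefixing step can be sketched as follows. This is a minimal illustration, not the maintainers' reference code: it assumes the model is published under the Hugging Face identifier `cl-nagoya/ruri-large-v2` and is loaded through the `SentenceTransformer` class. The helper names `as_query`, `as_passage`, and `embed` are hypothetical and exist only for this example.

```python
from typing import List

QUERY_PREFIX = "クエリ: "    # prefix expected on search queries
PASSAGE_PREFIX = "文章: "    # prefix expected on passages/documents


def as_query(text: str) -> str:
    """Prepend the query prefix unless it is already present."""
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text


def as_passage(text: str) -> str:
    """Prepend the passage prefix unless it is already present."""
    return text if text.startswith(PASSAGE_PREFIX) else PASSAGE_PREFIX + text


def embed(queries: List[str], passages: List[str],
          model_id: str = "cl-nagoya/ruri-large-v2"):
    """Encode prefixed queries and passages (model_id is an assumption)."""
    # Imported lazily so the prefix helpers work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    query_vecs = model.encode([as_query(q) for q in queries])
    passage_vecs = model.encode([as_passage(p) for p in passages])
    return query_vecs, passage_vecs
```

Keeping the prefixes in one place makes it harder to embed a query with the passage prefix by mistake, which would likely degrade retrieval quality.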

  • Pooling layer that averages token embeddings (mean pooling)
  • Cosine similarity-based text comparison
  • Optimized for Japanese language processing
  • Tokenization depends on the fugashi and sentencepiece packages; little other preprocessing is needed
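
To make the pooling and comparison steps concrete, here is a plain-Python sketch of mean pooling over token vectors and cosine similarity between two embeddings. It illustrates the general technique only; the function names are hypothetical, and the real model performs these steps on 1024-dimensional tensors inside sentence-transformers.

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average token vectors, skipping padding positions (mask == 0)."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy example: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
sentence_vec = mean_pool(tokens, attention_mask=[1, 1, 0])
```

Masking matters because padded positions would otherwise pull the average toward whatever values the padding tokens happen to produce.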

Core Capabilities

  • Retrieval: 76.34% on the JMTEB retrieval benchmark
  • Semantic textual similarity: 83.17% on STS tasks
  • Classification: 77.18% accuracy
  • Reranking: 93.21%

Frequently Asked Questions

Q: What makes this model unique?

Ruri-large-v2 stands out for its focus on Japanese and for its benchmark results across multiple NLP tasks. It does particularly well on retrieval and semantic similarity, where it outperforms many multilingual alternatives.

Q: What are the recommended use cases?

The model is ideal for Japanese text embedding tasks including semantic search, document similarity analysis, text classification, and information retrieval systems. It's particularly well-suited for applications requiring high-precision text matching and understanding in Japanese.
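
For the semantic search use case, ranking usually reduces to scoring every passage embedding against the query embedding and keeping the best matches. The sketch below shows that step in plain Python on toy 2-dimensional vectors; `normalize` and `top_k` are hypothetical helper names, and in practice the vectors would be the model's 1024-dimensional outputs.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec: Sequence[float],
          passage_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (passage index, cosine score) pairs for the k best matches."""
    q = normalize(query_vec)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(p))))
        for i, p in enumerate(passage_vecs)
    ]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy embeddings standing in for real model output.
passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k([1.0, 0.1], passages, k=2)
```

A linear scan like this is adequate for small collections; larger corpora typically call for an approximate nearest-neighbor index instead.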
