Ruri-large-v2
Property | Value |
---|---|
Parameter Count | 337M |
Model Type | Sentence Transformer |
Output Dimensions | 1024 |
Max Sequence Length | 512 tokens |
License | Apache 2.0 |
Paper | arXiv:2409.07737 |
What is ruri-large-v2?
Ruri-large-v2 is a Japanese text embedding model developed by cl-nagoya. It is the v2 release of the large model in the Ruri series, which is designed specifically for Japanese language understanding. It reaches an average score of 74.55 on the JMTEB benchmark, improving on its predecessors and on the competing models it is compared against.
Implementation Details
Ruri-large-v2 is built on the sentence-transformers framework and pairs a BERT-based Transformer encoder with a mean-pooling layer. Every input must start with a task prefix: "クエリ: " ("query: ") for queries and "文章: " ("passage: ") for passages. The model processes sequences of up to 512 tokens and outputs 1024-dimensional embeddings.
- Pooling layer that averages the token embeddings (mean-tokens strategy)
- Cosine similarity as the function for comparing texts
- Designed and trained for Japanese text
- Needs the fugashi and sentencepiece packages for tokenization; beyond the prefixes, no other preprocessing is required
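To make the prefix, pooling, and similarity steps concrete, here is a minimal NumPy sketch of what they compute. It assumes you already have token-level hidden states and an attention mask from the encoder. The helper names (`add_prefix`, `mean_pool`, `cosine_similarity`) are hypothetical, and the toy vectors are 2-dimensional rather than 1024. In practice, sentence-transformers applies the pooling for you inside `SentenceTransformer.encode`.

```python
import numpy as np

# Prefixes Ruri-large-v2 expects at the start of every input text.
QUERY_PREFIX = "クエリ: "
PASSAGE_PREFIX = "文章: "


def add_prefix(texts, kind):
    """Prepend the query or passage prefix to each text (hypothetical helper)."""
    prefix = {"query": QUERY_PREFIX, "passage": PASSAGE_PREFIX}[kind]
    return [prefix + t for t in texts]


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) float array
    attention_mask:   (batch, seq_len) array of 0/1
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against empty sequences
    return summed / counts


def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a (n, dim) and b (m, dim)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


# Toy example: the third token is padding, so it is excluded from the average.
tokens = np.array([[[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 0.]]
```

Masking before averaging matters: without it, padding tokens would pull every embedding toward the padding vectors and make similarity depend on batch composition.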
Core Capabilities
- Retrieval: 76.34 on the JMTEB retrieval tasks
- Semantic textual similarity: 83.17 on the JMTEB STS tasks
- Classification: 77.18 on the JMTEB classification tasks
- Reranking: 93.21 on the JMTEB reranking tasks
Frequently Asked Questions
Q: What makes this model unique?
Ruri-large-v2 is built specifically for Japanese and scores well across all the JMTEB task categories listed above. It is particularly strong in retrieval and semantic similarity, where it outperforms many multilingual alternatives.
Q: What are the recommended use cases?
The model is ideal for Japanese text embedding tasks including semantic search, document similarity analysis, text classification, and information retrieval systems. It's particularly well-suited for applications requiring high-precision text matching and understanding in Japanese.
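For the semantic-search use case, a minimal sketch could look like the following. It assumes a sentence-transformers-style model object whose `encode` method accepts `normalize_embeddings=True` (a real parameter of `SentenceTransformer.encode`) and returns a NumPy array. `semantic_search` is a hypothetical helper, not part of any library.

```python
import numpy as np

QUERY_PREFIX = "クエリ: "
PASSAGE_PREFIX = "文章: "


def semantic_search(model, query, passages, top_k=3):
    """Return the top_k passages ranked by cosine similarity to the query.

    `model` is any object with a sentence-transformers-style
    encode(texts, normalize_embeddings=True) method (assumption).
    """
    # Queries and passages receive different prefixes.
    q = np.asarray(model.encode([QUERY_PREFIX + query], normalize_embeddings=True))
    p = np.asarray(model.encode([PASSAGE_PREFIX + t for t in passages], normalize_embeddings=True))
    # With unit-length vectors, the dot product equals cosine similarity.
    scores = p @ q[0]
    order = np.argsort(-scores)[:top_k]
    return [(passages[i], float(scores[i])) for i in order]
```

To run it against the real model, you could load it with `SentenceTransformer("cl-nagoya/ruri-large-v2")` from the `sentence_transformers` package (assuming that Hugging Face Hub ID) and pass the result as `model`. Normalizing the embeddings lets a plain dot product stand in for the model's cosine similarity.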