# Sarashina2.2-3B
| Property | Value |
|---|---|
| Parameter Count | 3 billion |
| Training Data | 10 trillion tokens |
| License | MIT |
| Author | SB Intuitions |
| Model URL | https://huggingface.co/sbintuitions/sarashina2.2-3b |
## What is Sarashina2.2-3B?
Sarashina2.2-3B is a language model developed by SB Intuitions with approximately 3 billion parameters. Aimed at Japanese-English language processing, it was trained through a three-phase approach on a diverse dataset of 10 trillion tokens.
## Implementation Details
The model utilizes a three-phase training process: initial training on Japanese, English, and code data from web corpora, followed by synthetic data training for mathematical and coding tasks, and finally, fine-tuning on specific application tasks. The model demonstrates exceptional performance across various Japanese language tasks, outperforming larger models in specific areas.
- Three-phase training methodology
- Trained on 10 trillion tokens including Japanese, English, and code data
- Optimized for both general language understanding and specialized tasks
- Implements bfloat16 precision for efficient computation
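The bfloat16 format mentioned above keeps float32's 8 exponent bits but only 7 mantissa bits, so each weight takes half the memory while retaining float32's dynamic range. A minimal standard-library sketch of the conversion (simple truncation for illustration; frameworks such as PyTorch actually round to nearest even):

```python
import struct

def bfloat16_round_trip(x: float) -> float:
    """Truncate a float32 to bfloat16 (its top 16 bits), then widen it back.

    Truncation drops the low 16 mantissa bits (rounding toward zero),
    showing how much precision the 7-bit mantissa discards.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    bits16 = bits32 >> 16          # keep sign, 8 exponent bits, 7 mantissa bits
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value

print(bfloat16_round_trip(3.14159))  # 3.140625: only ~2-3 decimal digits survive
print(bfloat16_round_trip(1.0e38))   # stays finite; float16 would overflow here
```

Because the exponent field matches float32's, very large and very small weights survive the cast, which is why bfloat16 is preferred over float16 for training and inference at this scale.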
## Core Capabilities
- Strong performance in Japanese QA tasks (NIILC: 63.0)
- Advanced mathematical reasoning (MGSM-ja: 63.6)
- Superior coding capabilities (JHumanEval: 39.0)
- Comprehensive Japanese language understanding (JMMLU: 52.7)
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinctive feature is its ability to outperform larger models (including Sarashina2-70B) in specific Japanese tasks, despite its smaller parameter count. Its three-phase training approach enables superior performance in specialized areas like math and coding.
**Q: What are the recommended use cases?**
The model is particularly suited for Japanese language processing tasks, mathematical reasoning, and code generation. Note, however, that this is a pre-trained base model without instruction tuning; it may require additional fine-tuning for specific applications.
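Because this is a base model, prompts work best as text to be completed rather than as instructions. A sketch of loading the checkpoint in bfloat16 with Hugging Face `transformers` (the model ID comes from the card above; the sampling parameters and the `complete` helper are illustrative assumptions, not official recommendations):

```python
MODEL_ID = "sbintuitions/sarashina2.2-3b"  # from the model card above

def generation_kwargs(max_new_tokens: int = 128) -> dict:
    """Illustrative sampling settings; tune these for your application."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.95,
    }

def complete(prompt: str) -> str:
    """Download the weights and continue `prompt` completion-style."""
    # Imported lazily so the sketch can be read without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the precision noted above
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, **generation_kwargs())
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example (requires a network connection and a GPU or ample RAM):
# print(complete("日本で一番高い山は"))
```

For instruction-following behavior, additional fine-tuning (or an instruct variant, if one is published) would be needed on top of this base checkpoint.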