
Sarashina2.2-3B

Property         Value
Parameter Count  3 billion
Training Data    10 trillion tokens
License          MIT
Author           SB Intuitions
Model URL        https://huggingface.co/sbintuitions/sarashina2.2-3b

What is sarashina2.2-3b?

Sarashina2.2-3B is a language model developed by SB Intuitions with approximately 3 billion parameters. Aimed at Japanese-English language processing, it was trained through a three-phase approach on a diverse dataset of 10 trillion tokens.

Implementation Details

The model utilizes a three-phase training process: initial training on Japanese, English, and code data from web corpora, followed by synthetic data training for mathematical and coding tasks, and finally, fine-tuning on specific application tasks. The model demonstrates exceptional performance across various Japanese language tasks, outperforming larger models in specific areas.

  • Three-phase training methodology
  • Trained on 10 trillion tokens including Japanese, English, and code data
  • Optimized for both general language understanding and specialized tasks
  • Implements bfloat16 precision for efficient computation
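As a rough illustration of the bfloat16 point above, the weight memory of a 3-billion-parameter model can be estimated from bytes per parameter (bfloat16 uses 2 bytes, float32 uses 4). The figures below are back-of-the-envelope estimates for weights only, not measured totals:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

PARAMS = 3e9  # ~3 billion parameters, per the model card

fp32_gb = weight_memory_gb(PARAMS, 4)  # float32: 4 bytes/param
bf16_gb = weight_memory_gb(PARAMS, 2)  # bfloat16: 2 bytes/param

print(f"fp32 weights: ~{fp32_gb:.0f} GB")  # ~12 GB
print(f"bf16 weights: ~{bf16_gb:.0f} GB")  # ~6 GB
```

Halving the per-parameter footprint is what makes a 3B model comfortable to run on a single consumer GPU.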

Core Capabilities

  • Strong performance in Japanese QA tasks (NIILC: 63.0)
  • Advanced mathematical reasoning (MGSM-ja: 63.6)
  • Superior coding capabilities (JHumanEval: 39.0)
  • Comprehensive Japanese language understanding (JMMLU: 52.7)

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to outperform larger models (including Sarashina2-70B) in specific Japanese tasks, despite its smaller parameter count. Its three-phase training approach enables superior performance in specialized areas like math and coding.

Q: What are the recommended use cases?

The model is particularly suited for Japanese language processing tasks, mathematical reasoning, and code generation. Note, however, that this is a pre-trained base model without instruction tuning, so it may require additional fine-tuning for specific applications.
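A minimal sketch of loading the model with the standard Hugging Face Transformers API is shown below. The sampling parameters and the prompt are illustrative choices, and because this is a base model (no instruction tuning), the prompt is a plain text prefix to be continued rather than a chat-style instruction:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sbintuitions/sarashina2.2-3b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the precision noted above
    device_map="auto",
)

# Base model: supply a text prefix to continue, not an instruction.
prompt = "日本で一番高い山は"  # "The tallest mountain in Japan is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For instruction-following behavior, fine-tune on task-specific data first; the raw base model will simply continue text in the style of its pre-training corpus.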
