
Sarashina2.2-3B

Property         Value
Parameter Count  3 billion
Training Data    10 trillion tokens
License          MIT
Author           SB Intuitions
Model URL        https://huggingface.co/sbintuitions/sarashina2.2-3b

What is sarashina2.2-3b?

Sarashina2.2-3B is a language model developed by SB Intuitions with approximately 3 billion parameters. Aimed at Japanese-English language processing, it was trained through a three-phase approach on a diverse dataset of 10 trillion tokens.

Implementation Details

The model utilizes a three-phase training process: initial training on Japanese, English, and code data from web corpora, followed by synthetic data training for mathematical and coding tasks, and finally, fine-tuning on specific application tasks. The model demonstrates exceptional performance across various Japanese language tasks, outperforming larger models in specific areas.

  • Three-phase training methodology
  • Trained on 10 trillion tokens including Japanese, English, and code data
  • Optimized for both general language understanding and specialized tasks
  • Implements bfloat16 precision for efficient computation
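As a rough illustration of the bfloat16 point above, the weight memory of a 3-billion-parameter model can be estimated from bytes per parameter (bfloat16 uses 2 bytes, float32 uses 4). The figures below are back-of-the-envelope estimates for weights only, not measured totals:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

PARAMS = 3e9  # ~3 billion parameters, per the model card

fp32_gb = weight_memory_gb(PARAMS, 4)  # float32: 4 bytes/param
bf16_gb = weight_memory_gb(PARAMS, 2)  # bfloat16: 2 bytes/param

print(f"fp32 weights: ~{fp32_gb:.0f} GB")  # ~12 GB
print(f"bf16 weights: ~{bf16_gb:.0f} GB")  # ~6 GB
```

Halving the per-parameter footprint is what makes a 3B model comfortable to run on a single consumer GPU.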

Core Capabilities

  • Strong performance in Japanese QA tasks (NIILC: 63.0)
  • Advanced mathematical reasoning (MGSM-ja: 63.6)
  • Superior coding capabilities (JHumanEval: 39.0)
  • Comprehensive Japanese language understanding (JMMLU: 52.7)

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to outperform larger models (including Sarashina2-70B) in specific Japanese tasks, despite its smaller parameter count. Its three-phase training approach enables superior performance in specialized areas like math and coding.

Q: What are the recommended use cases?

The model is particularly suited for Japanese language processing tasks, mathematical reasoning, and code generation. Note, however, that this is a pre-trained base model without instruction tuning, so it may require additional fine-tuning for specific applications.
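A minimal sketch of loading the model with the standard Hugging Face Transformers API is shown below. The sampling parameters and the prompt are illustrative choices, and because this is a base model (no instruction tuning), the prompt is a plain text prefix to be continued rather than a chat-style instruction:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sbintuitions/sarashina2.2-3b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the precision noted above
    device_map="auto",
)

# Base model: supply a text prefix to continue, not an instruction.
prompt = "日本で一番高い山は"  # "The tallest mountain in Japan is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For instruction-following behavior, fine-tune on task-specific data first; the raw base model will simply continue text in the style of its pre-training corpus.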
