Gemma2 9B CPT Sahabat-AI v1 Base
| Property | Value |
|---|---|
| Parameter Count | 10.2B |
| Languages | English, Indonesian, Javanese, Sundanese |
| Context Length | 8192 tokens |
| License | Gemma Community License |
| Training Data | 50B tokens |
What is gemma2-9b-cpt-sahabatai-v1-base?
Sahabat-AI v1 base is a continued pre-trained (CPT) language model built on the Gemma2 9B architecture and optimized for Indonesian and regional languages. Co-initiated by GoTo Group and Indosat Ooredoo Hutchison, this model represents a significant advance in multilingual AI capabilities for Southeast Asian languages.
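Because the checkpoint follows the standard Gemma2 architecture, it can be loaded with the Hugging Face transformers library. The sketch below is a minimal example; the repository id `GoToCompany/gemma2-9b-cpt-sahabatai-v1-base` is an assumption here and should be verified against the model's actual Hugging Face listing.

```python
# Minimal loading sketch. The repo id below is an assumption; verify it on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "GoToCompany/gemma2-9b-cpt-sahabatai-v1-base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 precision used in training
    device_map="auto",           # requires `accelerate`; shards across available devices
)
```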
Implementation Details
The model was trained with MosaicML Composer on 32 NVIDIA H100 80GB GPUs over 7 days, using bfloat16 precision and a decoupled AdamW optimizer with weight-stable-decay learning-rate scheduling. Training used a learning rate of 1.0e-5 and a global batch size of 256; an illustrative configuration sketch follows the list below.
- Achieves a state-of-the-art overall score of 64.123% on regional language benchmarks
- Trained on a diverse dataset including Dolma RefinedWeb, Stack V2, and specialized regional language corpora
- Uses the Gemma-2-9B tokenizer
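For illustration, a Composer setup matching the reported hyperparameters might look like the sketch below. This is an approximation under assumptions, not the actual training script: the dataloader is a toy placeholder (the real run streamed a ~50B-token corpus), and the learning-rate scheduler is omitted because Composer ships no stock scheduler under the exact "weight stable decay" name used here.

```python
# Illustrative Composer CPT setup (hypothetical; dataloader is a placeholder).
import torch
from torch.utils.data import DataLoader, Dataset
from composer import Trainer
from composer.models import HuggingFaceModel
from composer.optim import DecoupledAdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "google/gemma-2-9b"  # continued pre-training starts from base Gemma2 9B weights

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
hf_model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)

class ToyCorpus(Dataset):
    """Placeholder for the multilingual CPT corpus."""
    def __init__(self, texts, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
    def __len__(self):
        return self.enc["input_ids"].size(0)
    def __getitem__(self, i):
        ids = self.enc["input_ids"][i]
        # Causal LM objective: labels mirror the inputs (real runs would mask padding).
        return {"input_ids": ids,
                "attention_mask": self.enc["attention_mask"][i],
                "labels": ids.clone()}

train_loader = DataLoader(ToyCorpus(["Contoh teks bahasa Indonesia."]), batch_size=1)

trainer = Trainer(
    model=HuggingFaceModel(hf_model, tokenizer=tokenizer),
    train_dataloader=train_loader,
    optimizers=DecoupledAdamW(hf_model.parameters(), lr=1.0e-5),  # reported learning rate
    max_duration="10ba",   # placeholder; the actual run lasted ~7 days on 32 H100s
    precision="amp_bf16",  # bfloat16 mixed precision, as reported
)
trainer.fit()
```

In practice the reported global batch size of 256 would be reached through per-device batch sizes and gradient accumulation across the 32 GPUs rather than a single dataloader batch.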
Core Capabilities
- Exceptional performance on Indonesian (60.040%), Javanese (69.882%), and Sundanese (62.446%) language benchmarks
- Strong multilingual understanding and generation capabilities
- Maintains competitive performance on English tasks, with a 19.62% average score
- Supports a context length of up to 8192 tokens (see the generation sketch below)
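As a base model with no instruction tuning, it is prompted with plain text continuation rather than a chat template. A minimal generation sketch, reusing the assumed repository id from the loading example above:

```python
# Plain-text continuation with a base (non-instruction-tuned) model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "GoToCompany/gemma2-9b-cpt-sahabatai-v1-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Jakarta adalah ibu kota Indonesia yang"  # "Jakarta is Indonesia's capital, which"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=128,  # prompt plus generation must fit within the 8192-token context
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```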
Frequently Asked Questions
Q: What makes this model unique?
The model's primary strength is its performance across Indonesian and regional languages: it significantly outperforms comparable models on Javanese and Sundanese tasks while maintaining strong capabilities in Indonesian and English.
Q: What are the recommended use cases?
The model is particularly well-suited to applications requiring deep understanding of Indonesian, Javanese, or Sundanese, including text analysis, content generation, and other language processing tasks in these languages. As a base model, however, it requires additional safety fine-tuning before production use.