Gemma2 9B CPT Sahabat-AI v1 Base
| Property | Value |
|---|---|
| Parameter Count | 10.2B |
| Languages | English, Indonesian, Javanese, Sundanese |
| Context Length | 8192 tokens |
| License | Gemma Community License |
| Training Data | 50B tokens |
What is gemma2-9b-cpt-sahabatai-v1-base?
Sahabat-AI v1 base is a continued pre-trained (CPT) language model built on the Gemma2 9B architecture and optimized for Indonesian and regional languages. Co-initiated by GoTo Group and Indosat Ooredoo Hutchison, this model represents a significant advance in multilingual AI capabilities for Southeast Asian languages.
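Because the checkpoint follows the standard Gemma2 architecture, it can be loaded with the Hugging Face transformers library. The sketch below is a minimal example; the repository id `GoToCompany/gemma2-9b-cpt-sahabatai-v1-base` is an assumption here and should be verified against the model's actual Hugging Face listing.

```python
# Minimal loading sketch. The repo id below is an assumption; verify it on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "GoToCompany/gemma2-9b-cpt-sahabatai-v1-base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 precision used in training
    device_map="auto",           # requires `accelerate`; shards across available devices
)
```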
Implementation Details
The model was trained with MosaicML Composer on 32 NVIDIA H100 80GB GPUs over 7 days, using bfloat16 precision and a decoupled AdamW optimizer with weight-stable-decay learning-rate scheduling. Training used a learning rate of 1.0e-5 and a global batch size of 256; an illustrative configuration sketch follows the list below.
- Achieves a state-of-the-art overall score of 64.123% on regional language benchmarks
- Trained on a diverse dataset including Dolma RefinedWeb, Stack V2, and specialized regional language corpora
- Uses the Gemma-2-9B tokenizer
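For illustration, a Composer setup matching the reported hyperparameters might look like the sketch below. This is an approximation under assumptions, not the actual training script: the dataloader is a toy placeholder (the real run streamed a ~50B-token corpus), and the learning-rate scheduler is omitted because Composer ships no stock scheduler under the exact "weight stable decay" name used here.

```python
# Illustrative Composer CPT setup (hypothetical; dataloader is a placeholder).
import torch
from torch.utils.data import DataLoader, Dataset
from composer import Trainer
from composer.models import HuggingFaceModel
from composer.optim import DecoupledAdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "google/gemma-2-9b"  # continued pre-training starts from base Gemma2 9B weights

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
hf_model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)

class ToyCorpus(Dataset):
    """Placeholder for the multilingual CPT corpus."""
    def __init__(self, texts, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
    def __len__(self):
        return self.enc["input_ids"].size(0)
    def __getitem__(self, i):
        ids = self.enc["input_ids"][i]
        # Causal LM objective: labels mirror the inputs (real runs would mask padding).
        return {"input_ids": ids,
                "attention_mask": self.enc["attention_mask"][i],
                "labels": ids.clone()}

train_loader = DataLoader(ToyCorpus(["Contoh teks bahasa Indonesia."]), batch_size=1)

trainer = Trainer(
    model=HuggingFaceModel(hf_model, tokenizer=tokenizer),
    train_dataloader=train_loader,
    optimizers=DecoupledAdamW(hf_model.parameters(), lr=1.0e-5),  # reported learning rate
    max_duration="10ba",   # placeholder; the actual run lasted ~7 days on 32 H100s
    precision="amp_bf16",  # bfloat16 mixed precision, as reported
)
trainer.fit()
```

In practice the reported global batch size of 256 would be reached through per-device batch sizes and gradient accumulation across the 32 GPUs rather than a single dataloader batch.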
Core Capabilities
- Exceptional performance on Indonesian (60.040%), Javanese (69.882%), and Sundanese (62.446%) language benchmarks
- Strong multilingual understanding and generation capabilities
- Maintains competitive performance on English tasks, with a 19.62% average score
- Supports a context length of up to 8192 tokens (see the generation sketch below)
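As a base model with no instruction tuning, it is prompted with plain text continuation rather than a chat template. A minimal generation sketch, reusing the assumed repository id from the loading example above:

```python
# Plain-text continuation with a base (non-instruction-tuned) model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "GoToCompany/gemma2-9b-cpt-sahabatai-v1-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Jakarta adalah ibu kota Indonesia yang"  # "Jakarta is Indonesia's capital, which"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=128,  # prompt plus generation must fit within the 8192-token context
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```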
Frequently Asked Questions
Q: What makes this model unique?
The model's primary strength is its performance across Indonesian and regional languages: it significantly outperforms comparable models on Javanese and Sundanese tasks while maintaining strong capabilities in Indonesian and English.
Q: What are the recommended use cases?
The model is particularly well-suited to applications requiring deep understanding of Indonesian, Javanese, or Sundanese, including text analysis, content generation, and other language processing tasks in these languages. As a base model, however, it requires additional safety fine-tuning before production use.