gemma2-9b-cpt-sahabatai-v1-base

Maintained By: GoToCompany

Gemma2 9B CPT Sahabat-AI v1 Base

Parameter Count: 10.2B
Languages: English, Indonesian, Javanese, Sundanese
Context Length: 8192 tokens
License: Gemma Community License
Training Data: 50B tokens

What is gemma2-9b-cpt-sahabatai-v1-base?

Sahabat-AI v1 base is a continued pre-trained language model built on the Gemma2 9B architecture, specifically optimized for Indonesian and regional languages. Co-initiated by GoTo Group and Indosat Ooredoo Hutchison, this model represents a significant advancement in multilingual AI capabilities for Southeast Asian languages.
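Since this is a standard Gemma2-architecture checkpoint, it can be loaded through the Hugging Face transformers API. The following is a minimal inference sketch, assuming the checkpoint is published on the Hub as GoToCompany/gemma2-9b-cpt-sahabatai-v1-base; the Indonesian prompt is illustrative.

```python
# Minimal inference sketch (assumes the checkpoint is available on the
# Hugging Face Hub as GoToCompany/gemma2-9b-cpt-sahabatai-v1-base).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GoToCompany/gemma2-9b-cpt-sahabatai-v1-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bfloat16
    device_map="auto",
)

# A base model continues text rather than following instructions,
# so prompt it with a passage to complete (Indonesian example).
prompt = "Jakarta adalah ibu kota Indonesia yang"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```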

Implementation Details

The model was trained with MosaicML Composer on 32 NVIDIA H100 80GB GPUs over 7 days. Training ran in bfloat16 precision with a decoupled AdamW optimizer on a weight-stable-decay schedule, a learning rate of 1.0e-5, and a global batch size of 256; a hedged configuration sketch follows the list below.

  • Achieves a state-of-the-art average score of 64.123% across regional-language tasks
  • Trained on a diverse dataset including Dolma Refined Web, Stack V2, and specialized regional language corpora
  • Tokenizes text with the Gemma-2-9B tokenizer
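The training code itself has not been published, so the sketch below only illustrates how the reported hyperparameters (bfloat16, decoupled AdamW, learning rate 1.0e-5) map onto the MosaicML Composer API. The toy dataset, batch size, and training duration are placeholder assumptions, not the actual recipe.

```python
# Illustrative Composer setup for the reported hyperparameters.
# The dataset, duration, and batch size below are placeholders.
import torch
from torch.utils.data import DataLoader, Dataset
from composer import Trainer
from composer.models import HuggingFaceModel
from composer.optim import DecoupledAdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-2-9b"  # continued pre-training starts from Gemma2 9B
tokenizer = AutoTokenizer.from_pretrained(base_id)
hf_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

class ToyCorpus(Dataset):
    """Placeholder standing in for the 50B-token multilingual corpus."""
    def __init__(self, texts, max_len=128):
        enc = tokenizer(texts, padding="max_length", truncation=True,
                        max_length=max_len, return_tensors="pt")
        self.ids, self.mask = enc["input_ids"], enc["attention_mask"]
    def __len__(self):
        return self.ids.size(0)
    def __getitem__(self, i):
        return {"input_ids": self.ids[i], "attention_mask": self.mask[i],
                "labels": self.ids[i]}  # causal-LM loss over the inputs

train_loader = DataLoader(
    ToyCorpus(["Contoh teks bahasa Indonesia."] * 8),
    batch_size=2,  # the real run used a global batch size of 256 across 32 GPUs
)

trainer = Trainer(
    model=HuggingFaceModel(hf_model, tokenizer=tokenizer),
    train_dataloader=train_loader,
    optimizers=DecoupledAdamW(hf_model.parameters(), lr=1.0e-5),  # lr from the card
    precision="amp_bf16",  # bfloat16 mixed precision, as reported
    max_duration="1ep",    # placeholder; the real run took 7 days
)
trainer.fit()
```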

Core Capabilities

  • Exceptional performance on Indonesian (60.040%), Javanese (69.882%), and Sundanese (62.446%) language tasks
  • Strong multilingual understanding and generation capabilities
  • Maintains competitive performance on English tasks, with a 19.62% average score
  • Supports a context length of up to 8192 tokens, inherited from Gemma2 (see the snippet below)
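The 8192-token window can be confirmed from the checkpoint's configuration, and the inherited Gemma-2-9B tokenizer can be inspected directly. The snippet below assumes the same repository ID as above; the Javanese sample sentence is illustrative.

```python
# Inspect the context window and tokenizer (repo ID assumed as above).
from transformers import AutoConfig, AutoTokenizer

model_id = "GoToCompany/gemma2-9b-cpt-sahabatai-v1-base"
config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)  # expected: 8192

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Javanese sample: "Good morning, how are you?"
print(tokenizer.tokenize("Sugeng enjing, piye kabare?"))
```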

Frequently Asked Questions

Q: What makes this model unique?

The model's primary strength lies in its exceptional performance across Indonesian and regional languages, significantly outperforming other models in Javanese and Sundanese language tasks while maintaining strong capabilities in Indonesian and English.

Q: What are the recommended use cases?

The model is particularly well suited to applications that require deep understanding of Indonesian, Javanese, or Sundanese, such as text analysis, content generation, and other language-processing tasks. However, as a base model it has had no instruction or safety tuning, so it requires additional fine-tuning before production use.
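Downstream use therefore typically starts from supervised fine-tuning. Below is a minimal single-step sketch in plain PyTorch; the toy instruction pair and hyperparameters are illustrative assumptions, not a production recipe (in practice one would use a full trainer, a curated dataset, and parameter-efficient methods such as LoRA).

```python
# Minimal supervised fine-tuning step on a toy instruction pair
# (illustrative only; repo ID assumed as above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GoToCompany/gemma2-9b-cpt-sahabatai-v1-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.train()

# Toy Indonesian instruction/response pair (placeholder data).
text = "Pertanyaan: Apa ibu kota Indonesia?\nJawaban: Ibu kota Indonesia adalah Jakarta."
batch = tokenizer(text, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
outputs = model(**batch, labels=batch["input_ids"])  # causal-LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(outputs.loss))
```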
