gemma2-9b-cpt-sahabatai-v1-instruct

Maintained By
GoToCompany

Gemma2 9B CPT Sahabat-AI v1 Instruct

PropertyValue
Parameter Count9.24B
Model TypeDecoder
LanguagesEnglish, Indonesian, Javanese, Sundanese
LicenseGemma Community License
Context Length8192 tokens

What is gemma2-9b-cpt-sahabatai-v1-instruct?

Sahabat-AI v1 Instruct is an advanced multilingual language model specifically designed for Indonesian languages and dialects. Co-initiated by GoTo Group and Indosat Ooredoo Hutchison, this model has been fine-tuned on an extensive dataset of 448,000 Indonesian instruction pairs, along with 96,000 Javanese and 98,000 Sundanese instruction pairs, plus 129,000 English instruction pairs.

Implementation Details

Built on the Gemma2 architecture, this model leverages advanced decoder technology and has been fine-tuned through a combination of full parameter tuning and on-policy alignment. The training process involved 4 hours of fine-tuning and 2 hours of alignment on 8x H100-80GB GPUs.

  • Context length of 8192 tokens
  • BF16 tensor type optimization
  • Comprehensive multilingual capabilities
  • State-of-the-art performance on regional language benchmarks

Core Capabilities

  • Achieves 61.169% overall score on SEA HELM benchmark
  • 62.6% performance on IndoMMLU evaluation
  • Superior performance in Indonesian (64.154%), Javanese (64.439%), and Sundanese (54.913%) tasks
  • Maintains strong English language capabilities with 33.67% average score on standard benchmarks

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized optimization for Indonesian languages and dialects, while maintaining strong performance across multiple languages. It's particularly notable for achieving state-of-the-art results on regional language benchmarks like SEA HELM and IndoMMLU.

Q: What are the recommended use cases?

The model is ideal for applications requiring multilingual capabilities in Southeast Asian languages, particularly Indonesian, Javanese, and Sundanese. It's suitable for tasks like question answering, sentiment analysis, translation, and abstractive summarization in these languages.

The first platform built for prompt engineering