Gemma-2-9B-Chinese-Chat

Maintained By
shenzhi-wang


  • Parameter Count: 9.24B
  • Base Model: google/gemma-2-9b-it
  • Context Length: 8K tokens
  • License: Gemma License
  • Training Framework: LLaMA-Factory
  • Paper: ORPO Paper

What is Gemma-2-9B-Chinese-Chat?

Gemma-2-9B-Chinese-Chat is the first instruction-tuned language model built upon Google's gemma-2-9b-it and optimized specifically for Chinese and English users. The model was fine-tuned with ORPO (Odds Ratio Preference Optimization), a reference-free, monolithic preference-optimization method, on over 100K preference pairs, making it particularly effective for bilingual applications.

Implementation Details

The model uses flash-attn-2 in place of the eager attention that Gemma-2 defaults to, improving attention throughput. It was trained for 3 epochs with a peak learning rate of 3e-6 on a cosine schedule, a warmup ratio of 0.1, and a global batch size of 128. The implementation uses the paged_adamw_32bit optimizer with full-parameter fine-tuning.
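The stated schedule (peak learning rate 3e-6, cosine decay, warmup ratio 0.1) can be sketched as follows. This is an illustrative reconstruction, not the project's actual training code; the function name `lr_at` and its arguments are assumptions.

```python
import math

def lr_at(step, total_steps, peak_lr=3e-6, warmup_ratio=0.1):
    """Linear warmup to peak_lr over the first warmup_ratio of training,
    then cosine decay from peak_lr down to zero (illustrative sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup phase: ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay phase: progress goes 0 -> 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With 1,000 total steps, the rate climbs linearly for the first 100 steps, peaks at 3e-6, and decays smoothly to zero by the end.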

  • Context length: 8K tokens
  • ORPO beta (λ): 0.05
  • Training framework: LLaMA-Factory
  • Optimization: Full parameter fine-tuning
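The ORPO objective referenced above combines the standard NLL loss on the chosen response with a λ-weighted odds-ratio penalty. A toy sketch, using the λ = 0.05 from the list; the function names and scalar inputs here are illustrative, not the model's actual training code:

```python
import math

def log_sigmoid(x):
    # -log(1 + e^{-x}), i.e. log of the sigmoid of x
    return -math.log1p(math.exp(-x))

def orpo_loss(avg_logp_chosen, avg_logp_rejected, nll_chosen, lam=0.05):
    """ORPO sketch: NLL on the chosen response plus lambda times the
    odds-ratio term, computed from average per-token log-probs."""
    def log_odds(logp):
        p = math.exp(logp)              # average probability of the response
        return math.log(p / (1.0 - p))  # odds = p / (1 - p)
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # Penalty shrinks as the chosen response becomes more likely
    # than the rejected one, and grows when the order is reversed.
    return nll_chosen + lam * (-log_sigmoid(ratio))
```

Because no reference model appears anywhere in the loss, ORPO avoids the frozen reference copy that DPO-style training requires.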

Core Capabilities

  • Improved Chinese-English response consistency
  • Enhanced roleplay capabilities
  • Advanced tool-using functionality
  • Improved mathematical reasoning
  • Reduced language mixing in responses

Frequently Asked Questions

Q: What makes this model unique?

This is the first instruction-tuned model based on Gemma-2-9b-it specifically optimized for Chinese and English users, featuring significant improvements in reducing Chinese-English mixing and enhanced capabilities in roleplay and tool usage.

Q: What are the recommended use cases?

The model excels in bilingual applications, roleplay scenarios, tool-using tasks, and mathematical reasoning. It's particularly suitable for applications requiring consistent language handling between Chinese and English.
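For bilingual chat use, prompts follow Gemma's turn format ("user"/"model" roles wrapped in `<start_of_turn>`/`<end_of_turn>` markers). A minimal formatting sketch; in practice `tokenizer.apply_chat_template` handles this, and the helper name below is an assumption:

```python
def format_gemma_chat(messages):
    """Render a list of {"role", "content"} dicts into Gemma's turn
    format. Gemma uses "user" and "model" roles; the tokenizer
    normally prepends <bos> itself, so it is omitted here."""
    prompt = ""
    for msg in messages:
        # Gemma names the assistant role "model".
        role = "model" if msg["role"] == "assistant" else "user"
        prompt += f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n"
    # Open a model turn so generation continues as the assistant.
    return prompt + "<start_of_turn>model\n"
```

For example, a single user message "你好" yields `<start_of_turn>user\n你好<end_of_turn>\n<start_of_turn>model\n` as the generation prompt.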
