Gemma-2-9B-Chinese-Chat
| Property | Value |
|---|---|
| Parameter Count | 9.24B |
| Base Model | google/gemma-2-9b-it |
| Context Length | 8K tokens |
| License | Gemma License |
| Training Framework | LLaMA-Factory |
| Paper | ORPO Paper |
What is Gemma-2-9B-Chinese-Chat?
Gemma-2-9B-Chinese-Chat is the first instruction-tuned language model built upon Google's Gemma-2-9b-it and optimized specifically for Chinese and English users. It was fine-tuned with ORPO (Odds Ratio Preference Optimization, a reference-model-free monolithic preference optimization method) on more than 100K preference pairs, making it particularly effective for bilingual applications.
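To make the ORPO objective concrete, here is a minimal numerical sketch of its loss, following the paper's formulation: the supervised NLL on the chosen response plus a λ-weighted odds-ratio term. The sequence-level probabilities and function names below are illustrative, not taken from the actual training code.

```python
import math

def log_odds(p):
    # odds(p) = p / (1 - p), for a sequence-level probability p in (0, 1)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(p_chosen, p_rejected, nll_chosen, lam=0.05):
    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected))
    ratio = log_odds(p_chosen) - log_odds(p_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))
    # Total loss = SFT negative log-likelihood on the chosen response + lam * L_OR
    return nll_chosen + lam * l_or
```

Because λ is small (0.05 here, matching this model's training setup), the SFT term dominates while the odds-ratio term gently pushes the model to prefer chosen over rejected responses.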
Implementation Details
The model uses FlashAttention-2 (flash-attn-2) in place of Gemma-2's default eager attention, significantly improving throughput. Training ran for 3 epochs with a peak learning rate of 3e-6 under a cosine schedule, a warmup ratio of 0.1, and a global batch size of 128, using the paged_adamw_32bit optimizer with full-parameter fine-tuning.
- Context length: 8K tokens
- ORPO λ (preference-loss weight): 0.05
- Training framework: LLaMA-Factory
- Optimization: Full parameter fine-tuning
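The learning-rate schedule described above (peak 3e-6, warmup ratio 0.1, cosine decay) can be sketched as follows. This is an illustrative reimplementation, not the LLaMA-Factory scheduler itself:

```python
import math

def lr_at(step, total_steps, peak_lr=3e-6, warmup_ratio=0.1):
    """Illustrative warmup + cosine schedule (not the LLaMA-Factory code)."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear warmup from near zero up to the peak learning rate
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

The warmup phase stabilizes early full-parameter updates; the cosine tail then anneals the rate smoothly toward zero by the end of training.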
Core Capabilities
- Improved Chinese-English response consistency
- Enhanced roleplay capabilities
- Advanced tool-using functionality
- Improved mathematical reasoning
- Reduced language mixing in responses
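For reference, the Gemma-2 turn format that the base model expects looks like the sketch below. In practice you would use the tokenizer's `apply_chat_template`; this hand-rolled builder is only illustrative of the prompt structure:

```python
def format_gemma_chat(messages):
    # Gemma-2 turn format: <start_of_turn>{role}\n{content}<end_of_turn>\n
    # Roles are "user" and "model"; the prompt ends with an open model turn.
    prompt = "<bos>"
    for m in messages:
        prompt += f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n"
    prompt += "<start_of_turn>model\n"
    return prompt

prompt = format_gemma_chat([{"role": "user", "content": "你好，请介绍一下你自己。"}])
```

Feeding a correctly formatted prompt matters for the bilingual consistency the model was tuned for, since malformed turns can degrade instruction following.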
Frequently Asked Questions
Q: What makes this model unique?
This is the first instruction-tuned model based on Gemma-2-9b-it optimized specifically for Chinese and English users. It significantly reduces Chinese-English mixing in responses and offers enhanced roleplay and tool-use capabilities.
Q: What are the recommended use cases?
The model excels in bilingual applications, roleplay scenarios, tool-using tasks, and mathematical reasoning. It's particularly suitable for applications requiring consistent language handling between Chinese and English.