Gemma-2-9B-Chinese-Chat
| Property | Value |
|---|---|
| Parameter Count | 9.24B |
| Base Model | google/gemma-2-9b-it |
| Context Length | 8K tokens |
| License | Gemma License |
| Training Framework | LLaMA-Factory |
| Paper | ORPO Paper |
What is Gemma-2-9B-Chinese-Chat?
Gemma-2-9B-Chinese-Chat is the first instruction-tuned language model built upon Google's Gemma-2-9b-it and optimized specifically for Chinese and English users. It was fine-tuned with ORPO (Odds Ratio Preference Optimization, a reference-model-free monolithic preference optimization method) on more than 100K preference pairs, making it particularly effective for bilingual applications.
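To make the ORPO objective concrete, here is a minimal numerical sketch of its loss, following the paper's formulation: the supervised NLL on the chosen response plus a λ-weighted odds-ratio term. The sequence-level probabilities and function names below are illustrative, not taken from the actual training code.

```python
import math

def log_odds(p):
    # odds(p) = p / (1 - p), for a sequence-level probability p in (0, 1)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(p_chosen, p_rejected, nll_chosen, lam=0.05):
    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected))
    ratio = log_odds(p_chosen) - log_odds(p_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))
    # Total loss = SFT negative log-likelihood on the chosen response + lam * L_OR
    return nll_chosen + lam * l_or
```

Because λ is small (0.05 here, matching this model's training setup), the SFT term dominates while the odds-ratio term gently pushes the model to prefer chosen over rejected responses.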
Implementation Details
The model uses FlashAttention-2 (flash-attn-2) in place of Gemma-2's default eager attention, significantly improving throughput. Training ran for 3 epochs with a peak learning rate of 3e-6 under a cosine schedule, a warmup ratio of 0.1, and a global batch size of 128, using the paged_adamw_32bit optimizer with full-parameter fine-tuning.
- Context length: 8K tokens
- ORPO λ (preference-loss weight): 0.05
- Training framework: LLaMA-Factory
- Optimization: Full parameter fine-tuning
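The learning-rate schedule described above (peak 3e-6, warmup ratio 0.1, cosine decay) can be sketched as follows. This is an illustrative reimplementation, not the LLaMA-Factory scheduler itself:

```python
import math

def lr_at(step, total_steps, peak_lr=3e-6, warmup_ratio=0.1):
    """Illustrative warmup + cosine schedule (not the LLaMA-Factory code)."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear warmup from near zero up to the peak learning rate
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

The warmup phase stabilizes early full-parameter updates; the cosine tail then anneals the rate smoothly toward zero by the end of training.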
Core Capabilities
- Improved Chinese-English response consistency
- Enhanced roleplay capabilities
- Advanced tool-using functionality
- Improved mathematical reasoning
- Reduced language mixing in responses
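For reference, the Gemma-2 turn format that the base model expects looks like the sketch below. In practice you would use the tokenizer's `apply_chat_template`; this hand-rolled builder is only illustrative of the prompt structure:

```python
def format_gemma_chat(messages):
    # Gemma-2 turn format: <start_of_turn>{role}\n{content}<end_of_turn>\n
    # Roles are "user" and "model"; the prompt ends with an open model turn.
    prompt = "<bos>"
    for m in messages:
        prompt += f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n"
    prompt += "<start_of_turn>model\n"
    return prompt

prompt = format_gemma_chat([{"role": "user", "content": "你好，请介绍一下你自己。"}])
```

Feeding a correctly formatted prompt matters for the bilingual consistency the model was tuned for, since malformed turns can degrade instruction following.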
Frequently Asked Questions
Q: What makes this model unique?
This is the first instruction-tuned model based on Gemma-2-9b-it optimized specifically for Chinese and English users. It significantly reduces Chinese-English mixing in responses and offers enhanced roleplay and tool-use capabilities.
Q: What are the recommended use cases?
The model excels in bilingual applications, roleplay scenarios, tool-using tasks, and mathematical reasoning. It's particularly suitable for applications requiring consistent language handling between Chinese and English.