Llama3.1-8B-Chinese-Chat
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Model Type | Instruction-tuned LLM |
| Base Model | Meta-Llama-3.1-8B-Instruct |
| License | Llama-3.1 |
| Context Length | 128K tokens (reported) |
| Training Framework | LLaMA-Factory |
What is Llama3.1-8B-Chinese-Chat?
Llama3.1-8B-Chinese-Chat is the first model fine-tuned specifically for Chinese and English users on top of Meta's Llama-3.1-8B-Instruct. Developed by a team led by Shenzhi Wang and Yaowei Zheng, it is trained with ORPO (Odds Ratio Preference Optimization), a reference-model-free, monolithic preference-optimization algorithm, to strengthen its bilingual instruction-following abilities.
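For reference, the objective from the original ORPO paper pairs a standard supervised term on the chosen response with an odds-ratio penalty that pushes the model away from the rejected one; the weight $\lambda$ corresponds to the beta value listed under Implementation Details:

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\right],
\qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$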
Implementation Details
The model was trained with full-parameter fine-tuning over 3 epochs, using a 3e-6 learning rate with cosine scheduling, a 0.1 warmup ratio, and an 8192-token cutoff length. The optimizer is paged_adamw_32bit with a global batch size of 128.
- BF16 precision throughout training and release
- GGUF builds available for efficient local deployment
- Trained on more than 100K preference pairs
- ORPO with a beta value of 0.05 (see the loss sketch after this list)
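The team's actual training code runs inside LLaMA-Factory and is not reproduced here; the following is a minimal PyTorch sketch of the ORPO objective above, assuming the log-probabilities are already length-normalized as in the ORPO paper (the function name and signature are illustrative):

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps: torch.Tensor,
              rejected_logps: torch.Tensor,
              beta: float = 0.05) -> torch.Tensor:
    """Sketch of the ORPO objective.

    chosen_logps / rejected_logps: per-token-averaged log-probabilities of the
    chosen and rejected responses under the policy model, shape (batch,).
    beta: weight of the odds-ratio term (0.05 for this model, per the card).
    """
    # log-odds of each response: log(p / (1 - p)), computed in log space
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # odds-ratio term: -log sigmoid(log-odds(chosen) - log-odds(rejected))
    ratio_term = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # supervised term: negative log-likelihood of the chosen response
    # (the paper uses the token-level NLL; the averaged log-prob is reused here
    # purely to keep the sketch short)
    sft_term = -chosen_logps

    return (sft_term + beta * ratio_term).mean()
```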
Core Capabilities
- Advanced roleplay functionality
- Robust function calling capabilities
- Enhanced mathematical reasoning
- Bilingual proficiency in Chinese and English (see the usage sketch after this list)
- Extended context handling up to 128K tokens
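A minimal inference sketch with Hugging Face Transformers, assuming the model is published under the shenzhi-wang/Llama3.1-8B-Chinese-Chat repository (repo id, prompt, and sampling settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shenzhi-wang/Llama3.1-8B-Chinese-Chat"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model is released in BF16
    device_map="auto",
)

messages = [
    {"role": "user", "content": "用中文简单介绍一下你自己。"},  # "Briefly introduce yourself in Chinese."
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```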
Frequently Asked Questions
Q: What makes this model unique?
This is the first Llama-3.1 model specifically optimized for Chinese and English users, featuring enhanced capabilities in roleplay, function calling, and mathematical reasoning through ORPO fine-tuning.
Q: What are the recommended use cases?
The model excels in bilingual applications, particularly for tasks requiring natural language understanding in both Chinese and English contexts, including conversation, mathematical problem-solving, and role-playing scenarios.
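For local deployment of the GGUF builds mentioned under Implementation Details, a llama.cpp-based runtime such as llama-cpp-python can serve the model; the file name and settings below are illustrative rather than taken from the release:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./llama3.1-8b-chinese-chat-q4_k_m.gguf",  # hypothetical local GGUF file
    n_ctx=8192,       # context window to allocate (the base model reports up to 128K)
    n_gpu_layers=-1,  # offload all layers to GPU when one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "用中文写一首关于秋天的短诗。"}],  # "Write a short poem about autumn in Chinese."
    max_tokens=256,
    temperature=0.6,
)
print(response["choices"][0]["message"]["content"])
```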