Llama3-8B-Chinese-Chat-GGUF-8bit
Property | Value |
---|---|
Parameter Count | 8.03B |
Context Length | 8K tokens |
License | Llama-3 License |
Base Model | Meta-Llama-3-8B-Instruct |
Paper | ORPO Paper |
What is Llama3-8B-Chinese-Chat-GGUF-8bit?
Llama3-8B-Chinese-Chat-GGUF-8bit is an 8-bit quantized version of a Llama 3 model optimized for Chinese and English conversation. Built on Meta's Llama-3-8B-Instruct, it was fine-tuned with ORPO (Odds Ratio Preference Optimization, a reference-free monolithic preference optimization method) on approximately 100K preference pairs, which makes it particularly effective for Chinese-language interaction while retaining English capability.
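To make the ORPO objective concrete, here is a minimal sketch of the per-example loss as described in the ORPO paper: a supervised term on the preferred response plus a weighted odds-ratio term that pushes the preferred response's odds above the rejected one's. The function names and the `lam` value are illustrative assumptions; the weight actually used for this model is not stated here, and real training code works on batched token log-probabilities rather than scalars.

```python
import math

def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)), where p = exp(avg_logp) is the
    length-normalized likelihood of a response (avg_logp must be < 0)."""
    p = math.exp(avg_logp)
    return avg_logp - math.log1p(-p)

def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one preference pair.

    lam is a placeholder weight for the odds-ratio term (an assumption,
    not a value taken from this model's training configuration).
    """
    # Supervised term: negative log-likelihood of the preferred response.
    sft_term = -avg_logp_chosen
    # Odds-ratio term: -log(sigmoid(log odds ratio)) = log(1 + exp(-x)).
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * or_term
```

No reference model appears anywhere in the computation, which is what "reference-free" means in practice: only the policy's own likelihoods for the chosen and rejected responses are needed.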
Implementation Details
The model was trained with the LLaMA-Factory framework using a learning rate of 5e-6, a cosine learning-rate scheduler, a warmup ratio of 0.1, and a context length of 8192 tokens. Training was full-parameter fine-tuning with the paged_adamw_32bit optimizer and a global batch size of 128.
- 8-bit quantization for efficient deployment
- Trained on ~100K preference pairs
- Uses ORPO methodology for optimization
- Supports 8K context length
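The learning-rate settings above can be sketched as a small schedule function. This assumes the common "linear warmup, then cosine decay to zero" shape; the exact scheduler behavior in the original training run is not confirmed here, and `total_steps` is a hypothetical value chosen only for illustration.

```python
import math

PEAK_LR = 5e-6       # learning rate stated above
WARMUP_RATIO = 0.1   # warmup ratio stated above

def lr_at(step: int, total_steps: int,
          peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at(s, total))
```

With a global batch size of 128, each optimizer step in this sketch would correspond to 128 preference pairs, however that total is split across devices and gradient-accumulation steps.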
Core Capabilities
- Enhanced Chinese language understanding and generation
- Advanced roleplay capabilities
- Improved function calling abilities
- Strong mathematical reasoning
- Bilingual support (Chinese and English)
- Safety-aware responses
Frequently Asked Questions
Q: What makes this model unique?
The model is optimized specifically for Chinese while retaining English capability, a result of ORPO fine-tuning on roughly 100K preference pairs. The 8-bit quantization reduces the memory needed for deployment compared with 16-bit weights, typically with only a small loss in output quality.
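As a rough illustration of the deployment saving, the sketch below estimates weight storage for 8.03B parameters. It assumes the 8-bit file uses the GGUF Q8_0 layout (blocks of 32 int8 weights plus one 16-bit scale, about 8.5 bits per weight); the exact quantization type of this release is not stated here, and the estimate ignores the KV cache, activations, and any tensors stored at a different precision.

```python
PARAMS = 8.03e9        # parameter count from the table above

# Assumed Q8_0 block layout: 32 int8 weights + one fp16 scale.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 32 + 2

def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

q8_bits = BLOCK_BYTES * 8 / BLOCK_WEIGHTS   # effective bits per weight
fp16_gb = weights_gb(PARAMS, 16)
q8_gb = weights_gb(PARAMS, q8_bits)

if __name__ == "__main__":
    print(f"fp16 weights: ~{fp16_gb:.2f} GB")
    print(f"8-bit weights: ~{q8_gb:.2f} GB at {q8_bits} bits/weight")
```

Under these assumptions the weights shrink to a little over half their 16-bit size, which is the main reason an 8-bit build fits on hardware that cannot hold the full-precision model.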
Q: What are the recommended use cases?
The model is best suited to Chinese-English bilingual conversation, roleplay scenarios, mathematical problem-solving, and function-calling tasks. It is a particularly good fit for applications that need natural Chinese-language interaction alongside English support.