Llama3-8B-Chinese-Chat

Maintained By
shenzhi-wang

Llama3-8B-Chinese-Chat

PropertyValue
Parameter Count8.03B
Context Length8K tokens
Base ModelMeta-Llama-3-8B-Instruct
LicenseLlama3 License
Training FrameworkLLaMA-Factory

What is Llama3-8B-Chinese-Chat?

Llama3-8B-Chinese-Chat is an advanced bilingual language model specifically fine-tuned for Chinese and English interactions. Built upon Meta's Llama-3-8B-Instruct model, it has been optimized using ORPO (Reference-free Monolithic Preference Optimization) on approximately 100K preference pairs, making it particularly effective for Chinese-language tasks while maintaining strong English capabilities.

Implementation Details

The model was trained using full parameter fine-tuning with specific hyperparameters including a learning rate of 3e-6, cosine scheduler, and a context length of 8192 tokens. The training process utilized the ORPO methodology with a beta value of 0.05 and a global batch size of 128.

  • Trained using paged_adamw_32bit optimizer
  • 2 epochs of training with 0.1 warmup ratio
  • BF16 precision for optimal performance
  • Implements flash attention for efficient processing

Core Capabilities

  • Advanced bilingual dialogue generation
  • Enhanced roleplay capabilities
  • Sophisticated function calling
  • Improved mathematical reasoning
  • Context-aware responses in both Chinese and English
  • Reduced tendency to mix languages in responses

Frequently Asked Questions

Q: What makes this model unique?

This model represents the first Llama3-based model specifically optimized for Chinese-English bilingual interactions using ORPO methodology. It significantly reduces issues with language mixing and improves upon the base model's capabilities in roleplay, function calling, and mathematical reasoning.

Q: What are the recommended use cases?

The model excels in bilingual conversations, creative writing, mathematical problem-solving, and roleplay scenarios. It's particularly well-suited for applications requiring natural Chinese language generation while maintaining English capabilities.

The first platform built for prompt engineering