Mixtral_34Bx2_MoE_60B
| Property | Value |
|---|---|
| Parameter Count | 60.8B |
| License | Apache 2.0 |
| Tensor Type | BF16 |
| Architecture | Mixture of Experts (MoE) |
What is Mixtral_34Bx2_MoE_60B?
Mixtral_34Bx2_MoE_60B is a Mixture of Experts (MoE) model that combines two 34B-parameter models into a single 60.8B-parameter language model. On the Open LLM Leaderboard it scored 45.38% accuracy on IFEval and 41.21% on BBH. The model supports both English and Chinese, making it well suited to multilingual applications.
Implementation Details
The model is built from two base models: jondurbin/bagel-dpo-34b-v0.2 and SUSTech/SUS-Chat-34B. It uses BF16 precision and can be deployed in either GPU or CPU environments with appropriate configuration. Inference settings such as repetition penalty and maximum generated tokens are adjustable, and the model includes built-in safety measures. A minimal loading and generation sketch follows the list below.
- Supports both GPU and CPU deployment
- Implements repetition penalty for better output quality
- Configurable max token generation
- Built-in tokenizer with customizable system prompts
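The sketch below illustrates one way to load the model and generate text with the settings listed above. It is a minimal example, assuming the weights are published on the Hugging Face Hub (the repo id "cloudyu/Mixtral_34Bx2_MoE_60B" is an assumption; substitute the actual id) and load through the standard transformers AutoModelForCausalLM API. The repetition penalty and token-limit values are illustrative placeholders, not recommendations from the model authors.

```python
# Minimal inference sketch for Mixtral_34Bx2_MoE_60B.
# Assumptions: the repo id below is hypothetical, and `accelerate` is installed
# so that device_map="auto" can place the model on available GPUs (or CPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cloudyu/Mixtral_34Bx2_MoE_60B"  # hypothetical repo id; replace as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 tensors
    device_map="auto",           # spread across GPUs, or fall back to CPU
)

prompt = "Explain the Mixture of Experts architecture in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Repetition penalty and max_new_tokens mirror the configurable settings noted above.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    repetition_penalty=1.1,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

On CPU-only machines, generation will be much slower, and torch.bfloat16 can be swapped for torch.float32 if the hardware lacks BF16 support.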
Core Capabilities
- Strong performance across multiple benchmarks (76.66 average score on the previous Open LLM Leaderboard)
- Excellent results on HellaSwag (85.25%) and Winogrande (84.85%)
- Capable of handling complex reasoning tasks
- Bilingual support for English and Chinese
- Efficient text generation with customizable parameters
Frequently Asked Questions
Q: What makes this model unique?
This model's unique strength lies in its Mixture of Experts architecture, combining two powerful 34B models to create a more capable system. It achieves strong performance across various benchmarks while maintaining multilingual capabilities.
Q: What are the recommended use cases?
The model excels in text generation, reasoning tasks, and multilingual applications. It's particularly well-suited for complex problem-solving, creative writing, and applications requiring both English and Chinese language processing.