Mixtral_34Bx2_MoE_60B
| Property | Value |
|---|---|
| Parameter Count | 60.8B |
| License | Apache 2.0 |
| Tensor Type | BF16 |
| Architecture | Mixture of Experts (MoE) |
What is Mixtral_34Bx2_MoE_60B?
Mixtral_34Bx2_MoE_60B is a Mixture of Experts (MoE) model that combines two 34B-parameter models into a single 60.8B-parameter language model. On the Open LLM Leaderboard it scored 45.38% accuracy on IFEval and 41.21% on BBH. The model supports both English and Chinese, making it well suited to multilingual applications.
Implementation Details
The model is built from two base models: jondurbin/bagel-dpo-34b-v0.2 and SUSTech/SUS-Chat-34B. It uses BF16 precision and can be deployed in either GPU or CPU environments with appropriate configuration. Inference settings such as repetition penalty and maximum generated tokens are adjustable, and the model includes built-in safety measures. A minimal loading and generation sketch follows the list below.
- Supports both GPU and CPU deployment
- Implements repetition penalty for better output quality
- Configurable max token generation
- Built-in tokenizer with customizable system prompts
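The sketch below illustrates one way to load the model and generate text with the settings listed above. It is a minimal example, assuming the weights are published on the Hugging Face Hub (the repo id "cloudyu/Mixtral_34Bx2_MoE_60B" is an assumption; substitute the actual id) and load through the standard transformers AutoModelForCausalLM API. The repetition penalty and token-limit values are illustrative placeholders, not recommendations from the model authors.

```python
# Minimal inference sketch for Mixtral_34Bx2_MoE_60B.
# Assumptions: the repo id below is hypothetical, and `accelerate` is installed
# so that device_map="auto" can place the model on available GPUs (or CPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cloudyu/Mixtral_34Bx2_MoE_60B"  # hypothetical repo id; replace as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 tensors
    device_map="auto",           # spread across GPUs, or fall back to CPU
)

prompt = "Explain the Mixture of Experts architecture in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Repetition penalty and max_new_tokens mirror the configurable settings noted above.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    repetition_penalty=1.1,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

On CPU-only machines, generation will be much slower, and torch.bfloat16 can be swapped for torch.float32 if the hardware lacks BF16 support.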
Core Capabilities
- Strong performance across multiple benchmarks (76.66 average score on the previous Open LLM Leaderboard)
- Excellent results on HellaSwag (85.25%) and Winogrande (84.85%)
- Capable of handling complex reasoning tasks
- Bilingual support for English and Chinese
- Efficient text generation with customizable parameters
Frequently Asked Questions
Q: What makes this model unique?
This model's unique strength lies in its Mixture of Experts architecture, combining two powerful 34B models to create a more capable system. It achieves strong performance across various benchmarks while maintaining multilingual capabilities.
Q: What are the recommended use cases?
The model excels in text generation, reasoning tasks, and multilingual applications. It's particularly well-suited for complex problem-solving, creative writing, and applications requiring both English and Chinese language processing.