Mixtral_34Bx2_MoE_60B

Maintained By
cloudyu


Parameter Count: 60.8B
License: Apache 2.0
Tensor Type: BF16
Architecture: Mixture of Experts (MoE)

What is Mixtral_34Bx2_MoE_60B?

Mixtral_34Bx2_MoE_60B is a Mixture of Experts (MoE) model that combines two 34B models into a single 60.8B-parameter language model. On the Open LLM Leaderboard it scores 45.38% on IFEval and 41.21% on BBH. The model supports both English and Chinese, making it suitable for multilingual applications.

Implementation Details

The model is built from two base models: jondurbin/bagel-dpo-34b-v0.2 and SUSTech/SUS-Chat-34B. It uses BF16 precision and can be deployed on both GPU and CPU with appropriate configuration (see the loading sketch after the list below). The model supports a range of inference settings and includes built-in safety measures.

  • Supports both GPU and CPU deployment
  • Implements repetition penalty for better output quality
  • Configurable max token generation
  • Built-in tokenizer with customizable system prompts

Core Capabilities

  • Strong performance on multiple benchmarks (76.66 average score on the previous version of the Open LLM Leaderboard)
  • Excellent results on HellaSwag (85.25%) and Winogrande (84.85%)
  • Capable of handling complex reasoning tasks
  • Bilingual support for English and Chinese
  • Efficient text generation with customizable parameters

Frequently Asked Questions

Q: What makes this model unique?

This model's unique strength lies in its Mixture of Experts architecture, combining two powerful 34B models to create a more capable system. It achieves strong performance across various benchmarks while maintaining multilingual capabilities.
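To make the architecture concrete, here is a short, self-contained PyTorch sketch of how a two-expert MoE layer routes tokens through its experts. It illustrates the general technique only; it is not the model's actual implementation, and the class and dimension names are hypothetical.

```python
# Illustrative two-expert MoE layer: a router scores each token, and the
# output is a weighted combination of the two experts' outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoExpertMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # Each "expert" is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(2)
        )
        # The router produces a score per expert for every token.
        self.router = nn.Linear(d_model, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        weights = F.softmax(self.router(x), dim=-1)                      # (batch, seq, 2)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, seq, d_model, 2)
        # Weighted sum of expert outputs per token.
        return (expert_outs * weights.unsqueeze(2)).sum(dim=-1)

# Quick shape check
layer = TwoExpertMoE(d_model=64, d_ff=256)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```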

Q: What are the recommended use cases?

The model excels in text generation, reasoning tasks, and multilingual applications. It's particularly well-suited for complex problem-solving, creative writing, and applications requiring both English and Chinese language processing.
