Yi-34Bx2-MoE-60B

Maintained By: cloudyu

  • Parameter Count: 60.8B
  • Model Type: Mixture of Experts (MoE)
  • License: Apache 2.0
  • Tensor Type: BF16
  • Average Score: 76.72 (Open LLM Leaderboard)

What is Yi-34Bx2-MoE-60B?

Yi-34Bx2-MoE-60B is a Mixture of Experts model that combines two Yi-34B models under a Mixtral-style architecture, designed for both English and Chinese language processing. It achieved the highest score on the Open LLM Leaderboard as of January 2024.

Implementation Details

The model is built from two 34B expert models combined in a Mixtral-style architecture. It supports both GPU and CPU deployment, with optimized configurations for different hardware setups, including 4-bit quantization for efficient inference; a minimal loading sketch follows the list below.

  • Supports both BF16 and 4-bit quantization
  • Compatible with Hugging Face Transformers library
  • Includes built-in repetition penalty for better text generation
  • Available in both standard and GGUF formats
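
A minimal loading sketch along those lines, assuming the Hugging Face repo id cloudyu/Yi-34Bx2-MoE-60B and the standard Transformers and bitsandbytes APIs; adjust dtype, device map, and quantization settings to your hardware:

```python
# Minimal loading sketch; the repo id and settings below are assumptions
# drawn from this card, not an official reference configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "cloudyu/Yi-34Bx2-MoE-60B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Option 1: load the BF16 weights directly (needs substantial GPU memory).
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Option 2: 4-bit quantized load via bitsandbytes for smaller GPU setups.
# quant_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_compute_dtype=torch.bfloat16,
# )
# model = AutoModelForCausalLM.from_pretrained(
#     repo_id, quantization_config=quant_config, device_map="auto"
# )
```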

Core Capabilities

  • MMLU (5-shot): 77.47%
  • HellaSwag (10-shot): 85.23%
  • AI2 Reasoning Challenge (25-shot): 71.08%
  • GSM8K (5-shot): 75.51%
  • Bilingual support for English and Chinese
  • Strong performance in reasoning and truthfulness tasks

Frequently Asked Questions

Q: What makes this model unique?

This model combines two Yi-34B experts in a Mixtral-style MoE configuration, achieving state-of-the-art performance across multiple benchmarks at the time of release while maintaining efficient computation through its expert-based architecture.

Q: What are the recommended use cases?

The model excels in multilingual applications, reasoning tasks, and general text generation. It's particularly well-suited for applications requiring strong performance in both English and Chinese, with robust capabilities in reasoning, truthfulness assessment, and complex problem-solving.
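
As an illustration of such use, here is a short generation sketch building on the loading example above; the prompt, sampling settings, and repetition penalty value are illustrative assumptions rather than settings published with the model:

```python
# Illustrative generation call; reuses `model` and `tokenizer` from the loading sketch.
# Prompt translation: "Briefly explain in Chinese what a Mixture of Experts (MoE) model is."
prompt = "用中文简要解释什么是混合专家(MoE)模型。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.1,  # the card notes a repetition penalty improves generation
)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```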
