Yi-34Bx2-MoE-60B
| Property | Value |
|---|---|
| Parameter Count | 60.8B |
| Model Type | Mixture of Experts (MoE) |
| License | Apache 2.0 |
| Tensor Type | BF16 |
| Average Score | 76.72 (Open LLM Leaderboard) |
What is Yi-34Bx2-MoE-60B?
Yi-34Bx2-MoE-60B is a Mixture of Experts (MoE) model that combines the Yi and Mixtral architectures, designed for both English and Chinese language processing. It achieved the highest average score on the Open LLM Leaderboard as of January 2024.
Implementation Details
The model combines two Yi-34B expert models in a Mixtral-style architecture. It supports both GPU and CPU deployment, with optimized configurations for different hardware setups, including 4-bit quantization for efficient inference (a minimal loading sketch follows the feature list below).
- Supports both BF16 and 4-bit quantization
- Compatible with Hugging Face Transformers library
- Includes built-in repetition penalty for better text generation
- Available in both standard and GGUF formats
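The snippet below is a minimal loading and generation sketch using the Hugging Face Transformers and bitsandbytes libraries. The repo id `cloudyu/Yi-34Bx2-MoE-60B`, the prompt, and the `repetition_penalty` value are illustrative assumptions rather than settings taken from the official model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical repo id -- substitute the actual Hugging Face path for this model.
MODEL_ID = "cloudyu/Yi-34Bx2-MoE-60B"

# 4-bit quantization keeps the ~60B-parameter MoE within a tighter GPU memory budget.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs, or offload to CPU
)

prompt = "Explain the difference between a dense transformer and a mixture-of-experts model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# A repetition penalty (value assumed here) reflects the card's note about
# penalizing repetition during generation.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For the BF16 weights, the same call works with `torch_dtype=torch.bfloat16` in place of the quantization config; the GGUF builds are intended for llama.cpp-compatible runtimes rather than Transformers.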
Core Capabilities
- MMLU (5-Shot): 77.47%
- HellaSwag (10-Shot): 85.23%
- AI2 Reasoning Challenge (25-Shot): 71.08%
- GSM8K (5-Shot): 75.51%
- Bilingual support for English and Chinese
- Strong performance in reasoning and truthfulness tasks (an evaluation sketch for reproducing such scores follows below)
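As a rough guide, scores in this style are commonly produced with EleutherAI's lm-evaluation-harness. The sketch below assumes harness v0.4+, the hypothetical repo id `cloudyu/Yi-34Bx2-MoE-60B`, and one task per call so the few-shot count matches the figure above; it is not the leaderboard's exact evaluation pipeline.

```python
# Hedged sketch: reproducing a single leaderboard-style score with
# EleutherAI's lm-evaluation-harness (pip install lm_eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cloudyu/Yi-34Bx2-MoE-60B,dtype=bfloat16",  # assumed repo id
    tasks=["gsm8k"],   # one task per call so the few-shot setting matches
    num_fewshot=5,     # GSM8K is reported 5-shot above
    batch_size=1,      # adjust for your hardware
)
print(results["results"]["gsm8k"])
```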
Frequently Asked Questions
Q: What makes this model unique?
This model uniquely combines Yi and Mixtral architectures in a MoE configuration, achieving state-of-the-art performance across multiple benchmarks while maintaining efficient computation through its expert-based architecture.
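Because the model follows the Mixtral configuration format, the routing parameters can be read from the config alone. The sketch below is an illustration under assumptions: the repo id is hypothetical, and the expected values in the comments reflect a typical two-expert merge rather than documented settings.

```python
# Hedged sketch: inspecting the Mixtral-style MoE routing settings from the
# model config without downloading the full weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("cloudyu/Yi-34Bx2-MoE-60B")  # assumed repo id
print(config.model_type)           # expected: "mixtral" for a Mixtral-style MoE
print(config.num_local_experts)    # number of experts per MoE layer (2 for a 34Bx2 merge)
print(config.num_experts_per_tok)  # experts routed per token (typically 2)
```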
Q: What are the recommended use cases?
The model excels in multilingual applications, reasoning tasks, and general text generation. It's particularly well-suited for applications requiring strong performance in both English and Chinese, with robust capabilities in reasoning, truthfulness assessment, and complex problem-solving.