Yi-34Bx2-MoE-60B
| Property | Value |
|---|---|
| Parameter Count | 60.8B |
| Model Type | Mixture of Experts (MoE) |
| License | Apache 2.0 |
| Tensor Type | BF16 |
| Average Score | 76.72 (Open LLM Leaderboard) |
What is Yi-34Bx2-MoE-60B?
Yi-34Bx2-MoE-60B is a Mixture of Experts (MoE) model that combines the Yi and Mixtral architectures, designed for both English and Chinese language processing. It achieved the highest average score on the Open LLM Leaderboard as of January 2024.
Implementation Details
The model combines two Yi-34B expert models in a Mixtral-style architecture. It supports both GPU and CPU deployment, with optimized configurations for different hardware setups, including 4-bit quantization for efficient inference (a minimal loading sketch follows the feature list below).
- Supports both BF16 and 4-bit quantization
- Compatible with Hugging Face Transformers library
- Includes built-in repetition penalty for better text generation
- Available in both standard and GGUF formats
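The snippet below is a minimal loading and generation sketch using the Hugging Face Transformers and bitsandbytes libraries. The repo id `cloudyu/Yi-34Bx2-MoE-60B`, the prompt, and the `repetition_penalty` value are illustrative assumptions rather than settings taken from the official model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical repo id -- substitute the actual Hugging Face path for this model.
MODEL_ID = "cloudyu/Yi-34Bx2-MoE-60B"

# 4-bit quantization keeps the ~60B-parameter MoE within a tighter GPU memory budget.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs, or offload to CPU
)

prompt = "Explain the difference between a dense transformer and a mixture-of-experts model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# A repetition penalty (value assumed here) reflects the card's note about
# penalizing repetition during generation.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For the BF16 weights, the same call works with `torch_dtype=torch.bfloat16` in place of the quantization config; the GGUF builds are intended for llama.cpp-compatible runtimes rather than Transformers.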
Core Capabilities
- MMLU (5-Shot): 77.47%
- HellaSwag (10-Shot): 85.23%
- AI2 Reasoning Challenge (25-Shot): 71.08%
- GSM8K (5-Shot): 75.51%
- Bilingual support for English and Chinese
- Strong performance in reasoning and truthfulness tasks (an evaluation sketch for reproducing such scores follows below)
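As a rough guide, scores in this style are commonly produced with EleutherAI's lm-evaluation-harness. The sketch below assumes harness v0.4+, the hypothetical repo id `cloudyu/Yi-34Bx2-MoE-60B`, and one task per call so the few-shot count matches the figure above; it is not the leaderboard's exact evaluation pipeline.

```python
# Hedged sketch: reproducing a single leaderboard-style score with
# EleutherAI's lm-evaluation-harness (pip install lm_eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cloudyu/Yi-34Bx2-MoE-60B,dtype=bfloat16",  # assumed repo id
    tasks=["gsm8k"],   # one task per call so the few-shot setting matches
    num_fewshot=5,     # GSM8K is reported 5-shot above
    batch_size=1,      # adjust for your hardware
)
print(results["results"]["gsm8k"])
```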
Frequently Asked Questions
Q: What makes this model unique?
This model uniquely combines Yi and Mixtral architectures in a MoE configuration, achieving state-of-the-art performance across multiple benchmarks while maintaining efficient computation through its expert-based architecture.
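Because the model follows the Mixtral configuration format, the routing parameters can be read from the config alone. The sketch below is an illustration under assumptions: the repo id is hypothetical, and the expected values in the comments reflect a typical two-expert merge rather than documented settings.

```python
# Hedged sketch: inspecting the Mixtral-style MoE routing settings from the
# model config without downloading the full weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("cloudyu/Yi-34Bx2-MoE-60B")  # assumed repo id
print(config.model_type)           # expected: "mixtral" for a Mixtral-style MoE
print(config.num_local_experts)    # number of experts per MoE layer (2 for a 34Bx2 merge)
print(config.num_experts_per_tok)  # experts routed per token (typically 2)
```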
Q: What are the recommended use cases?
The model excels in multilingual applications, reasoning tasks, and general text generation. It's particularly well-suited for applications requiring strong performance in both English and Chinese, with robust capabilities in reasoning, truthfulness assessment, and complex problem-solving.