DiscoLM-mixtral-8x7b-v2
| Property | Value |
|---|---|
| Parameter Count | 46.7B |
| Model Type | Mixtral MoE |
| License | Apache 2.0 |
| Tensor Type | FP16 |
What is DiscoLM-mixtral-8x7b-v2?
DiscoLM-mixtral-8x7b-v2 is an experimental Mixture of Experts (MoE) model developed by DiscoResearch on top of Mistral AI's Mixtral 8x7b architecture. In the Mixtral design, each token is routed through only a small subset of expert feed-forward networks per layer, so just a fraction of the model's 46.7B total parameters is active for any given token, combining large overall capacity with comparatively modest per-token compute.
Implementation Details
The model requires `trust_remote_code=True` until the Mixtral architecture is merged into the transformers library. It follows the ChatML format for conversations and can be loaded with the Hugging Face Transformers library (see the sketch after the list below).
- Built on Mistral AI's Mixtral 8x7b architecture
- Fine-tuned on Synthia, MetaMathQA, and Capybara datasets
- Implements FP16 precision for efficient computation
- Supports chat template formatting for streamlined interaction
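A minimal loading and inference sketch, assuming the model is published on the Hugging Face Hub as DiscoResearch/DiscoLM-mixtral-8x7b-v2 and that its tokenizer ships a ChatML chat template; the generation parameters are illustrative, not official recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/DiscoLM-mixtral-8x7b-v2"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # FP16 weights, matching the tensor type above
    device_map="auto",
    trust_remote_code=True,      # required until the architecture lands in transformers
)

# ChatML-style conversation; apply_chat_template renders the <|im_start|>/<|im_end|> tags
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a Mixture of Experts model is."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```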
Core Capabilities
- Strong performance on ARC (67.32%) and HellaSwag (86.25%)
- Impressive MMLU score of 70.72%
- Strong MT-Bench scores in the humanities (9.75) and STEM (9.45) categories
- Specialized in writing, roleplay, and extraction tasks
Frequently Asked Questions
Q: What makes this model unique?
Its strength lies in pairing the Mixtral Mixture of Experts architecture with fine-tuning on the Synthia, MetaMathQA, and Capybara datasets, which makes it effective across diverse tasks while maintaining strong benchmark performance.
Q: What are the recommended use cases?
The model excels in conversational AI, academic content generation, and complex reasoning tasks. It is particularly well-suited to applications requiring strong performance in the humanities and STEM, as reflected in its MT-Bench category scores.