# OLMoE-1B-7B-0125
| Property | Value |
|---|---|
| Active Parameters | 1.3B |
| Total Parameters | 7B |
| Paper | arxiv.org/abs/2409.02060 |
| License | Open Source |
| Author | Allen AI |
## What is OLMoE-1B-7B-0125?
OLMoE-1B-7B-0125 is a state-of-the-art Mixture-of-Experts (MoE) language model that activates only 1.3B of its 7B total parameters for each input token. This sparse architecture lets it compete with much larger dense models such as Llama2-13B while keeping a far smaller computational footprint.
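For illustration, the sketch below shows how a sparse MoE layer of this kind works: a small router scores every expert for each token, and only the top-k experts actually run. This is a generic sketch rather than OLMoE's exact implementation; `d_model` and `d_ff` are placeholder sizes, and the 64-expert / top-8 defaults mirror the routing configuration described in the OLMoE paper.

```python
# Generic sketch of a sparse top-k MoE layer (illustrative; not OLMoE's exact code).
# Defaults mirror the 64-expert, top-8 routing described in the OLMoE paper;
# d_model and d_ff are placeholder sizes.
import torch
import torch.nn as nn


class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=2048, num_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x):
        # x: (num_tokens, d_model); score every expert, keep only the top_k per token.
        probs = self.router(x).softmax(dim=-1)
        weights, expert_idx = torch.topk(probs, self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    # Only the selected experts run, so per-token compute stays small.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoELayer()
    tokens = torch.randn(4, 1024)   # 4 tokens, d_model = 1024
    print(layer(tokens).shape)      # torch.Size([4, 1024])
```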
## Implementation Details
The model is implemented using the Hugging Face Transformers library and can be deployed with PyTorch (a loading sketch follows the list below). Its MoE architecture dynamically routes each token through a small subset of expert feed-forward networks, keeping per-token compute low while retaining a large total parameter count.
- Supports both FP32 and BF16 weight formats
- Multiple checkpoints available for different use cases
- Pretrained on over 5,033B (≈5T) tokens
- Specialized SFT and instruction-tuned versions included
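A minimal loading sketch, assuming the base checkpoint is published on the Hugging Face Hub as `allenai/OLMoE-1B-7B-0125` and that a recent Transformers release with OLMoE support is installed (loading in BF16 here; pass `torch.float32` for the FP32 weights):

```python
# Minimal sketch: loading the base checkpoint from the Hugging Face Hub.
# Assumes the model id "allenai/OLMoE-1B-7B-0125" and a Transformers version
# that includes the OLMoE architecture; device_map="auto" also requires accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights; use torch.float32 for FP32
    device_map="auto",
)

inputs = tokenizer("Mixture-of-Experts models are efficient because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```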
## Core Capabilities
- Strong performance on MMLU (56.3%)
- Excellent results on HellaSwag (81.7%)
- High accuracy on ARC-Challenge (67.5%)
- State-of-the-art results among models with a comparable (~1B) active-parameter budget
## Frequently Asked Questions
**Q: What makes this model unique?**
OLMoE-1B-7B-0125 stands out for its efficient use of the Mixture-of-Experts architecture, achieving performance comparable to much larger models while using only 1.3B active parameters. It's fully open-source and achieves state-of-the-art results in its parameter class.
**Q: What are the recommended use cases?**
The model is well-suited for general language understanding tasks, particularly excelling in multiple-choice reasoning, common sense understanding, and scientific knowledge. It's ideal for applications requiring high performance with limited computational resources.
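For chat-style or instruction-following applications, the instruction-tuned variant mentioned above can be used with the tokenizer's chat template. The model id below is an assumption based on Ai2's naming convention and should be verified on the Hub:

```python
# Hypothetical usage of the instruction-tuned variant; the model id is an
# assumption based on Ai2's naming convention and should be verified on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0125-Instruct"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain why sparse MoE models are compute-efficient."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```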