# OLMoE-1B-7B-0125
| Property | Value |
|---|---|
| Active Parameters | 1.3B |
| Total Parameters | 7B |
| Paper | arxiv.org/abs/2409.02060 |
| License | Open Source |
| Author | Allen AI |
## What is OLMoE-1B-7B-0125?
OLMoE-1B-7B-0125 is a state-of-the-art Mixture-of-Experts (MoE) language model that activates only 1.3B of its 7B total parameters for each input token. This sparse architecture lets it compete with much larger dense models such as Llama2-13B while keeping a far smaller computational footprint.
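For illustration, the sketch below shows how a sparse MoE layer of this kind works: a small router scores every expert for each token, and only the top-k experts actually run. This is a generic sketch rather than OLMoE's exact implementation; `d_model` and `d_ff` are placeholder sizes, and the 64-expert / top-8 defaults mirror the routing configuration described in the OLMoE paper.

```python
# Generic sketch of a sparse top-k MoE layer (illustrative; not OLMoE's exact code).
# Defaults mirror the 64-expert, top-8 routing described in the OLMoE paper;
# d_model and d_ff are placeholder sizes.
import torch
import torch.nn as nn


class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=2048, num_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x):
        # x: (num_tokens, d_model); score every expert, keep only the top_k per token.
        probs = self.router(x).softmax(dim=-1)
        weights, expert_idx = torch.topk(probs, self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    # Only the selected experts run, so per-token compute stays small.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoELayer()
    tokens = torch.randn(4, 1024)   # 4 tokens, d_model = 1024
    print(layer(tokens).shape)      # torch.Size([4, 1024])
```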
## Implementation Details
The model is implemented using the Hugging Face Transformers library and can be deployed with PyTorch (a loading sketch follows the list below). Its MoE architecture dynamically routes each token through a small subset of expert feed-forward networks, keeping per-token compute low while retaining a large total parameter count.
- Supports both FP32 and BF16 weight formats
- Multiple checkpoints available for different use cases
- Pretrained on over 5,033B (≈5T) tokens
- Specialized SFT and instruction-tuned versions included
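A minimal loading sketch, assuming the base checkpoint is published on the Hugging Face Hub as `allenai/OLMoE-1B-7B-0125` and that a recent Transformers release with OLMoE support is installed (loading in BF16 here; pass `torch.float32` for the FP32 weights):

```python
# Minimal sketch: loading the base checkpoint from the Hugging Face Hub.
# Assumes the model id "allenai/OLMoE-1B-7B-0125" and a Transformers version
# that includes the OLMoE architecture; device_map="auto" also requires accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights; use torch.float32 for FP32
    device_map="auto",
)

inputs = tokenizer("Mixture-of-Experts models are efficient because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```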
## Core Capabilities
- Strong performance on MMLU (56.3%)
- Excellent results on HellaSwag (81.7%)
- High accuracy on ARC-Challenge (67.5%)
- State-of-the-art results among models with a comparable (~1B) active-parameter budget
## Frequently Asked Questions
**Q: What makes this model unique?**
OLMoE-1B-7B-0125 stands out for its efficient use of the Mixture-of-Experts architecture, achieving performance comparable to much larger models while using only 1.3B active parameters. It's fully open-source and achieves state-of-the-art results in its parameter class.
**Q: What are the recommended use cases?**
The model is well-suited for general language understanding tasks, particularly excelling in multiple-choice reasoning, common sense understanding, and scientific knowledge. It's ideal for applications requiring high performance with limited computational resources.
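For chat-style or instruction-following applications, the instruction-tuned variant mentioned above can be used with the tokenizer's chat template. The model id below is an assumption based on Ai2's naming convention and should be verified on the Hub:

```python
# Hypothetical usage of the instruction-tuned variant; the model id is an
# assumption based on Ai2's naming convention and should be verified on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0125-Instruct"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain why sparse MoE models are compute-efficient."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```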