OLMoE-1B-7B-0125

Maintained by: allenai

Property            Value
Active Parameters   1.3B
Total Parameters    7B
Paper               arxiv.org/abs/2409.02060
License             Open Source
Author              Allen AI

What is OLMoE-1B-7B-0125?

OLMoE-1B-7B-0125 is a state-of-the-art Mixture-of-Experts (MoE) language model that activates only 1.3B of its 7B total parameters for each input token. This sparse design lets it compete with much larger dense models such as Llama2-13B while keeping a far smaller computational footprint.
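
To make the distinction between active and total parameters concrete, here is a toy sketch of the generic top-k routing pattern behind MoE layers: a small router scores every expert for each token, but only the top-k experts actually run, so most parameters sit idle for any given token. This illustrates the technique, not OLMoE's implementation; the layer sizes and top-2 routing below are arbitrary, whereas the OLMoE paper describes 64 experts per MoE layer with 8 active per token.

```python
# Toy top-k routed MoE feed-forward layer (illustrative only, not OLMoE's code).
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                                # (num_tokens, n_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)  # keep only the k best experts per token
        weights = top_scores.softmax(dim=-1)                   # normalize the selected scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                   # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(10, 64)   # 10 tokens
print(layer(tokens).shape)     # torch.Size([10, 64]); only 2 of the 8 experts ran per token
```

Because only the selected experts execute, compute per token scales with the active parameter count rather than the total, which is exactly the trade-off OLMoE exploits.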

Implementation Details

The model is implemented in the Hugging Face Transformers library and can be deployed with PyTorch. Each MoE layer routes every token through a small subset of expert feed-forward networks, activating only a fraction of the total parameters and keeping inference cost close to that of a ~1B dense model. Release highlights are listed below, followed by a short loading sketch.

  • Weights available in both FP32 and BF16 formats
  • Multiple checkpoints available for different use cases
  • Pretrained on over 5,033B (roughly 5 trillion) tokens
  • Specialized SFT and instruction-tuned (Instruct) variants available
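
The snippet below is a minimal loading-and-generation sketch using the Transformers API. It assumes the Hugging Face repo id allenai/OLMoE-1B-7B-0125 and a transformers release recent enough to include OLMoE support; BF16 loading matches the lower-precision weights mentioned above.

```python
# Minimal loading sketch; repo id and transformers OLMoE support are assumptions to verify.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # or torch.float32 to use the FP32 weights
    device_map="auto",            # requires `accelerate`; remove for plain CPU loading
)

prompt = "Mixture-of-Experts language models are efficient because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```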

Core Capabilities

  • Strong performance on MMLU (56.3%)
  • Excellent results on HellaSwag (81.7%)
  • High accuracy on ARC-Challenge (67.5%)
  • Leading results among models with roughly 1B active parameters (see the evaluation sketch below)
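
The scores above are taken from the model card. If you want to run comparable benchmarks yourself, one common route is EleutherAI's lm-evaluation-harness; the sketch below uses its Python API. Treat it as an assumption-laden starting point: the task names, default few-shot settings, and prompt formats here are choices of this sketch rather than the official evaluation setup, so the numbers it produces may not match the card exactly.

```python
# Sketch of benchmarking with lm-evaluation-harness (pip install lm-eval).
# Task selection and settings are assumptions; official scores may use a different setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=allenai/OLMoE-1B-7B-0125,dtype=bfloat16",
    tasks=["mmlu", "hellaswag", "arc_challenge"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```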

Frequently Asked Questions

Q: What makes this model unique?

OLMoE-1B-7B-0125 stands out for its efficient use of the Mixture-of-Experts architecture, achieving performance comparable to much larger models while using only 1.3B active parameters. It's fully open-source and achieves state-of-the-art results in its parameter class.

Q: What are the recommended use cases?

The model is well-suited for general language understanding tasks, particularly excelling in multiple-choice reasoning, common sense understanding, and scientific knowledge. It's ideal for applications requiring high performance with limited computational resources.
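
For chat-style applications, the instruction-tuned variant is the more natural starting point. A minimal sketch follows, assuming the repo id allenai/OLMoE-1B-7B-0125-Instruct and that its tokenizer ships a chat template; verify both on the Hugging Face Hub before relying on them.

```python
# Sketch of chatting with the instruction-tuned variant via its chat template.
# The repo id and the presence of a chat template are assumptions based on the release naming.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0125-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Explain Mixture-of-Experts in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```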
