OLMoE-1B-7B-0125-Instruct
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Paper | arXiv:2409.02060 |
| Base Model | OLMoE-1B-7B-0125-DPO |
| Primary Language | English |
What is OLMoE-1B-7B-0125-Instruct?
OLMoE-1B-7B-0125-Instruct is a mixture-of-experts language model developed by the Allen Institute for AI (Ai2). It is the final post-training stage of the OLMoE 0125 release, produced by applying supervised finetuning, DPO training, and RLVR on top of the base model. Post-training uses the Tülu 3 dataset and recipe, which targets diverse tasks including mathematical reasoning, problem solving, and general chat.
Implementation Details
The model builds upon the OLMoE architecture, a sparse mixture-of-experts design in which only a subset of experts is activated for each token. It has undergone a three-stage post-training pipeline: supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning with verifiable rewards (RLVR).
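For intuition, the DPO stage increases the likelihood of preferred responses relative to rejected ones, while a frozen reference model keeps the policy close to its SFT starting point. The following is a minimal sketch of a generic DPO loss in PyTorch, not Ai2's training code; the tensor names and the `beta` value are illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Generic DPO objective over per-sequence log-probabilities.

    beta controls how strongly the policy is pushed away from the
    frozen reference model's behavior.
    """
    # Implicit rewards: how much more likely the policy makes each
    # response compared to the frozen reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```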
- Base model pretrained on an open data mix that includes Dolma, with post-training on the Tülu 3 recipe
- Uses a dedicated chat template for consistent interaction formatting (see the example after this list)
- Supports Hugging Face Transformers integration out of the box
- Achieves strong performance on MATH, GSM8K, and IFEval benchmarks
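A minimal way to exercise both the Transformers integration and the chat template, assuming the model is published under the Hugging Face id `allenai/OLMoE-1B-7B-0125-Instruct` (check the official model page for the exact name):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0125-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The tokenizer carries the model's chat template, so apply_chat_template
# emits the exact formatting the model was post-trained on
messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because `apply_chat_template` reads the template stored with the tokenizer, prompts stay consistent with the format used during post-training.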
Core Capabilities
- Mathematical reasoning (72.40% on GSM8K)
- Code generation (62.30% on HumanEval)
- Truthful QA responses (50.56% accuracy)
- Strong safety performance (90.40% average)
- Effective performance on MMLU (55.08% with Chain-of-Thought); a quick prompting example follows
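As a quick smoke test of the math and chain-of-thought behavior, a `pipeline` call like the one below can be used. The repo id is assumed as above and the word problem is illustrative; the chat-style return format shown here requires a recent Transformers version.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="allenai/OLMoE-1B-7B-0125-Instruct",  # assumed repo id
    device_map="auto",
)

# GSM8K-style word problem; asking for steps elicits chain-of-thought
messages = [{"role": "user", "content": (
    "A train travels 60 km in 45 minutes. "
    "What is its average speed in km/h? Show your steps."
)}]
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # last turn is the model's reply
```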
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its sparse mixture-of-experts architecture (sketched below) combined with comprehensive post-training on the Tülu 3 dataset, resulting in strong performance for its roughly 1B active-parameter budget, particularly in mathematical reasoning and coding.
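For readers unfamiliar with sparse MoE layers, here is an illustrative top-k routing layer in PyTorch. It is a simplified sketch, not OLMoE's implementation: the expert count (64) and active experts per token (8) match the figures reported for OLMoE, while everything else (hidden sizes, softmax-before-top-k routing, the dense gather loop) is assumed for clarity.

```python
import torch
import torch.nn.functional as F
from torch import nn

class TopKMoE(nn.Module):
    """Illustrative sparse MoE layer: a router picks k of n experts per token."""

    def __init__(self, d_model: int, n_experts: int = 64, k: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)        # routing probabilities
        weights, idx = scores.topk(self.k, dim=-1)        # k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):                        # dense loops for clarity
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```

Only k of the n experts run for any given token, which is how OLMoE keeps roughly 1B parameters active out of 7B total.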
Q: What are the recommended use cases?
The model excels in mathematical problem-solving, code generation, truthful QA, and general assistance tasks. It's particularly suited for research and educational applications, with strong safety measures in place.