OLMoE-1B-7B-0125-Instruct
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Paper | arXiv:2409.02060 |
| Base Model | OLMoE-1B-7B-0125-DPO |
| Primary Language | English |
What is OLMoE-1B-7B-0125-Instruct?
OLMoE-1B-7B-0125-Instruct is a mixture-of-experts language model developed by the Allen Institute for AI (Ai2). It is the final post-training stage of the OLMoE 0125 release, produced by applying supervised finetuning, DPO training, and RLVR on top of the base model. Post-training uses the Tülu 3 dataset and recipe, which targets diverse tasks including mathematical reasoning, problem solving, and general chat.
Implementation Details
The model builds upon the OLMoE architecture, a sparse mixture-of-experts design in which only a subset of experts is activated for each token. It has undergone a three-stage post-training pipeline: supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning with verifiable rewards (RLVR).
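For intuition, the DPO stage increases the likelihood of preferred responses relative to rejected ones, while a frozen reference model keeps the policy close to its SFT starting point. The following is a minimal sketch of a generic DPO loss in PyTorch, not Ai2's training code; the tensor names and the `beta` value are illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Generic DPO objective over per-sequence log-probabilities.

    beta controls how strongly the policy is pushed away from the
    frozen reference model's behavior.
    """
    # Implicit rewards: how much more likely the policy makes each
    # response compared to the frozen reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```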
- Base model pretrained on an open data mix that includes Dolma, with post-training on the Tülu 3 recipe
- Uses a dedicated chat template for consistent interaction formatting (see the example after this list)
- Supports Hugging Face Transformers integration out of the box
- Achieves strong performance on MATH, GSM8K, and IFEval benchmarks
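A minimal way to exercise both the Transformers integration and the chat template, assuming the model is published under the Hugging Face id `allenai/OLMoE-1B-7B-0125-Instruct` (check the official model page for the exact name):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0125-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The tokenizer carries the model's chat template, so apply_chat_template
# emits the exact formatting the model was post-trained on
messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because `apply_chat_template` reads the template stored with the tokenizer, prompts stay consistent with the format used during post-training.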
Core Capabilities
- Mathematical reasoning (72.40% on GSM8K)
- Code generation (62.30% on HumanEval)
- Truthful QA responses (50.56% accuracy)
- Strong safety performance (90.40% average)
- Effective performance on MMLU (55.08% with Chain-of-Thought); a quick prompting example follows
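As a quick smoke test of the math and chain-of-thought behavior, a `pipeline` call like the one below can be used. The repo id is assumed as above and the word problem is illustrative; the chat-style return format shown here requires a recent Transformers version.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="allenai/OLMoE-1B-7B-0125-Instruct",  # assumed repo id
    device_map="auto",
)

# GSM8K-style word problem; asking for steps elicits chain-of-thought
messages = [{"role": "user", "content": (
    "A train travels 60 km in 45 minutes. "
    "What is its average speed in km/h? Show your steps."
)}]
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # last turn is the model's reply
```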
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its sparse mixture-of-experts architecture (sketched below) combined with comprehensive post-training on the Tülu 3 dataset, resulting in strong performance for its roughly 1B active-parameter budget, particularly in mathematical reasoning and coding.
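For readers unfamiliar with sparse MoE layers, here is an illustrative top-k routing layer in PyTorch. It is a simplified sketch, not OLMoE's implementation: the expert count (64) and active experts per token (8) match the figures reported for OLMoE, while everything else (hidden sizes, softmax-before-top-k routing, the dense gather loop) is assumed for clarity.

```python
import torch
import torch.nn.functional as F
from torch import nn

class TopKMoE(nn.Module):
    """Illustrative sparse MoE layer: a router picks k of n experts per token."""

    def __init__(self, d_model: int, n_experts: int = 64, k: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)        # routing probabilities
        weights, idx = scores.topk(self.k, dim=-1)        # k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):                        # dense loops for clarity
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```

Only k of the n experts run for any given token, which is how OLMoE keeps roughly 1B parameters active out of 7B total.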
Q: What are the recommended use cases?
The model excels in mathematical problem-solving, code generation, truthful QA, and general assistance tasks. It's particularly suited for research and educational applications, with strong safety measures in place.