# OpenThinker2-7B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-7B-Instruct |
| Training Data | OpenThoughts2-1M |
| License | Apache 2.0 |
| Training Infrastructure | 32 nodes × 8 A100 GPUs (256 GPUs total) |
| Training Duration | 36 hours |
## What is OpenThinker2-7B?
OpenThinker2-7B is an open-source reasoning model fine-tuned from Qwen2.5-7B-Instruct. It performs strongly across mathematical and reasoning benchmarks, achieving scores comparable to DeepSeek-R1-Distill-Qwen-7B.
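A minimal inference sketch with the `transformers` library is shown below. The checkpoint ID `open-thoughts/OpenThinker2-7B` and the generation settings are assumptions to adapt to your environment:

```python
# Minimal inference sketch (assumes the Hugging Face checkpoint
# "open-thoughts/OpenThinker2-7B"; adjust model ID, dtype, and device
# placement for your setup).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning traces can be long, so allow a generous token budget.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```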
## Implementation Details
The model was trained for 36 hours on 32 nodes of 8 A100 GPUs each, using a learning rate of 8e-05, a cosine scheduler with a 0.1 warmup ratio, the ADAMW_TORCH optimizer, and a total batch size of 512. A configuration sketch follows the list below.
- Trained on the OpenThoughts2-1M dataset, an expanded successor to OpenThoughts-114k
- Draws on 26 different question generation methodologies used to build the dataset
- Distributed training across all 256 GPUs (32 nodes × 8 A100s)
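A total batch size of 512 across 256 GPUs implies two samples per GPU per optimizer step. The sketch below restates the reported hyperparameters as a Hugging Face `TrainingArguments` configuration; it is illustrative only, and the per-device/accumulation split is an assumption, not a confirmed setting:

```python
# Illustrative reconstruction of the reported hyperparameters as a
# Hugging Face TrainingArguments config. Per-device batch size,
# accumulation split, and bf16 are assumptions, not confirmed settings.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="openthinker2-7b-sft",   # hypothetical output path
    learning_rate=8e-5,                 # reported learning rate
    lr_scheduler_type="cosine",         # reported cosine schedule
    warmup_ratio=0.1,                   # reported warmup ratio
    optim="adamw_torch",                # reported ADAMW_TORCH optimizer
    per_device_train_batch_size=1,      # assumption: 1 × 256 GPUs × 2 accumulation = 512
    gradient_accumulation_steps=2,      # assumption (see above)
    bf16=True,                          # assumption: typical mixed precision on A100s
)
```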
## Core Capabilities
| Benchmark | Score |
|---|---|
| AIME24 | 50.0% |
| AIME25 | 33.3% |
| AMC23 | 89.5% |
| MATH500 | 88.4% |
| GPQA-D | 49.3% |
| LCBv2 | 55.6% |
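Scores like these are typically accuracies on each benchmark; for small problem sets such as AIME, they are commonly reported as pass@1 averaged over several sampled completions per problem. A minimal aggregation sketch follows (illustrative only, not the project's evaluation harness; the sample counts are made up):

```python
# Toy aggregation sketch: average pass@1 across problems, where
# outcomes[i][j] records whether sample j for problem i was correct.
# Not the actual evaluation code; sample counts are illustrative.
def mean_pass_at_1(outcomes: list[list[bool]]) -> float:
    per_problem = [sum(samples) / len(samples) for samples in outcomes]
    return 100 * sum(per_problem) / len(per_problem)

# Example: 3 problems, 4 sampled completions each.
print(mean_pass_at_1([
    [True, True, False, True],
    [False, False, True, False],
    [True, True, True, True],
]))  # ~66.67
```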
## Frequently Asked Questions
Q: What makes this model unique?
A: Its standout strength is mathematical reasoning, which comes from fine-tuning on a large, carefully curated dataset (OpenThoughts2-1M) built with diverse question generation methodologies.
Q: What are the recommended use cases?
A: The model excels at mathematical problem solving, especially competition mathematics (AIME, AMC), and at general multi-step reasoning. It is well suited to educational applications and complex reasoning tasks.