II-Thought-1.5B-Preview
Property | Value |
---|---|
Base Model | DeepSeek-R1-Distill-Qwen-1.5B |
Training Algorithm | GRPO (Generalized Reinforcement Policy Optimization) |
Model URL | https://huggingface.co/Intelligent-Internet/II-Thought-1.5B-Preview |
Context Length | 32,768 tokens |
What is II-Thought-1.5B-Preview?
II-Thought-1.5B-Preview is a specialized language model enhanced through reinforcement learning, trained on a carefully curated 50K mathematics problem subset of the II-Thought-RL-v0 dataset. This model represents a significant advancement in mathematical reasoning capabilities, demonstrating superior performance across multiple mathematical benchmarks compared to its base model.
Implementation Details
The model leverages the GRPO algorithm and incorporates dual reward mechanisms for answer correctness and format adherence. It's built upon the DeepSeek-R1-Distill-Qwen-1.5B architecture and optimized for mathematical problem-solving with impressive benchmark results, including 79.77% accuracy on AMC23 and 87.2% on Math500.
- Utilizes advanced sampling configurations with temperature 0.6 and top_p 0.95
- Supports extensive context length of 32,768 tokens
- Implements specialized reward modeling for mathematical accuracy
- Compatible with vLLM and SGLang deployment frameworks
Core Capabilities
- Excellence in mathematical reasoning and problem-solving
- High accuracy in competitive mathematics benchmarks
- Robust performance across diverse mathematical domains
- Structured output formatting with LaTeX support
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its specialized training on mathematical content using reinforcement learning, resulting in state-of-the-art performance across multiple mathematical benchmarks, significantly outperforming its base model and competitors.
Q: What are the recommended use cases?
The model excels in mathematical problem-solving scenarios, particularly when step-by-step reasoning is required. It's recommended to use with temperature=0.6 and top_p=0.95, and specifically request step-by-step reasoning with final answers formatted in \boxed{}.