II-Thought-1.5B-Preview

Property	Value
Base Model	DeepSeek-R1-Distill-Qwen-1.5B
Training Algorithm	GRPO (Generalized Reinforcement Policy Optimization)
Model URL	https://huggingface.co/Intelligent-Internet/II-Thought-1.5B-Preview
Context Length	32,768 tokens

What is II-Thought-1.5B-Preview?

II-Thought-1.5B-Preview is a specialized language model enhanced through reinforcement learning, trained on a carefully curated 50K mathematics problem subset of the II-Thought-RL-v0 dataset. This model represents a significant advancement in mathematical reasoning capabilities, demonstrating superior performance across multiple mathematical benchmarks compared to its base model.

Implementation Details

The model leverages the GRPO algorithm and incorporates dual reward mechanisms for answer correctness and format adherence. It's built upon the DeepSeek-R1-Distill-Qwen-1.5B architecture and optimized for mathematical problem-solving with impressive benchmark results, including 79.77% accuracy on AMC23 and 87.2% on Math500.

Utilizes advanced sampling configurations with temperature 0.6 and top_p 0.95
Supports extensive context length of 32,768 tokens
Implements specialized reward modeling for mathematical accuracy
Compatible with vLLM and SGLang deployment frameworks

Core Capabilities

Excellence in mathematical reasoning and problem-solving
High accuracy in competitive mathematics benchmarks
Robust performance across diverse mathematical domains
Structured output formatting with LaTeX support

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its specialized training on mathematical content using reinforcement learning, resulting in state-of-the-art performance across multiple mathematical benchmarks, significantly outperforming its base model and competitors.

Q: What are the recommended use cases?

The model excels in mathematical problem-solving scenarios, particularly when step-by-step reasoning is required. It's recommended to use with temperature=0.6 and top_p=0.95, and specifically request step-by-step reasoning with final answers formatted in \boxed{}.