II-Thought-1.5B-Preview

Maintained By
Intelligent-Internet


Base Model: DeepSeek-R1-Distill-Qwen-1.5B
Training Algorithm: GRPO (Group Relative Policy Optimization)
Model URL: https://huggingface.co/Intelligent-Internet/II-Thought-1.5B-Preview
Context Length: 32,768 tokens

What is II-Thought-1.5B-Preview?

II-Thought-1.5B-Preview is a specialized language model enhanced through reinforcement learning, trained on a carefully curated subset of 50K mathematics problems from the II-Thought-RL-v0 dataset. The model demonstrates a significant advance in mathematical reasoning, outperforming its base model across multiple mathematical benchmarks.

Implementation Details

The model leverages the GRPO algorithm and incorporates dual reward mechanisms for answer correctness and format adherence. It's built upon the DeepSeek-R1-Distill-Qwen-1.5B architecture and optimized for mathematical problem-solving with impressive benchmark results, including 79.77% accuracy on AMC23 and 87.2% on Math500.
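The dual reward described above (one signal for answer correctness, one for format adherence) can be sketched as follows. This is an illustrative reconstruction, not the training code: the helper names (`format_reward`, `correctness_reward`, `total_reward`), the exact-string answer matching, and the equal weighting of the two signals are all assumptions, since the model card only states that both reward types exist.

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion contains a \\boxed{...} final answer, else 0.0.

    Hypothetical format-adherence check; the card does not publish the
    actual format criteria used during GRPO training.
    """
    return 1.0 if re.search(r"\\boxed\{[^{}]+\}", completion) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the last boxed answer matches the gold answer exactly.

    Real math-RL pipelines typically use symbolic equivalence checking;
    exact string comparison is a simplification for this sketch.
    """
    boxed = re.findall(r"\\boxed\{([^{}]+)\}", completion)
    return 1.0 if boxed and boxed[-1].strip() == gold.strip() else 0.0

def total_reward(completion: str, gold: str) -> float:
    # Equal weighting is an assumption; the true mixing weights are not stated.
    return correctness_reward(completion, gold) + format_reward(completion)
```

A completion that reasons to the right answer and ends with `\boxed{4}` would score 2.0 under this sketch, while one that states the answer without the boxed format would score only the correctness portion it earns.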

  • Utilizes advanced sampling configurations with temperature 0.6 and top_p 0.95
  • Supports extensive context length of 32,768 tokens
  • Implements specialized reward modeling for mathematical accuracy
  • Compatible with vLLM and SGLang deployment frameworks
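The sampling configuration listed above can be collected into a small sketch. The parameter values (temperature 0.6, top_p 0.95, 32,768-token context) come from this card; the helper names `build_sampling_config` and `build_prompt` are illustrative, not part of any deployment framework's API. With vLLM, for example, the same values would typically be passed to its `SamplingParams` constructor.

```python
# Recommended decoding setup for II-Thought-1.5B-Preview, per the model card.
MODEL_ID = "Intelligent-Internet/II-Thought-1.5B-Preview"

def build_sampling_config() -> dict:
    """Decoding parameters recommended on the model card."""
    return {
        "temperature": 0.6,   # recommended sampling temperature
        "top_p": 0.95,        # recommended nucleus-sampling cutoff
        "max_tokens": 32768,  # upper bound; matches the model's context length
    }

def build_prompt(problem: str) -> str:
    """Ask for step-by-step reasoning with the final answer in \\boxed{}."""
    return (
        f"{problem}\n"
        "Please reason step by step, and put your final answer "
        "within \\boxed{}."
    )
```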

Core Capabilities

  • Excellence in mathematical reasoning and problem-solving
  • High accuracy in competitive mathematics benchmarks
  • Robust performance across diverse mathematical domains
  • Structured output formatting with LaTeX support

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its specialized training on mathematical content using reinforcement learning, resulting in state-of-the-art performance across multiple mathematical benchmarks, significantly outperforming its base model and competitors.

Q: What are the recommended use cases?

The model excels in mathematical problem-solving scenarios, particularly where step-by-step reasoning is required. It is recommended to sample with temperature=0.6 and top_p=0.95, and to explicitly request step-by-step reasoning with the final answer formatted in \boxed{}.
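Since the recommended usage asks the model to put its final answer in \boxed{}, downstream code usually needs to pull that answer back out of the completion. A minimal sketch is below; the helper name `extract_boxed_answer` is hypothetical, and the regex handles only one level of nested braces (enough for answers like \boxed{\frac{1}{2}}).

```python
import re
from typing import Optional

def extract_boxed_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion.

    Handles one level of nested braces (e.g. \\boxed{\\frac{1}{2}});
    deeper nesting would need a brace-matching loop instead of a regex.
    """
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", completion)
    return matches[-1] if matches else None
```

For example, a completion ending in `...so the answer is \boxed{42}.` yields `"42"`, and a completion with no boxed answer yields `None`.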
