SmallThinker-3B-Preview
| Property | Value |
|---|---|
| Base Model | Qwen2.5-3B-Instruct |
| Parameters | 3 Billion |
| Training Hardware | 8 H100 GPUs |
| Model URL | Hugging Face |
What is SmallThinker-3B-Preview?
SmallThinker-3B-Preview is a language model fine-tuned from Qwen2.5-3B-Instruct and designed for edge deployment and efficient processing. It shows marked improvements over its base model across a range of benchmarks, particularly on STEM and mathematical reasoning tasks, and even competes with GPT-4 in certain areas.
Implementation Details
The model underwent a two-phase training process on 8 H100 GPUs with a global batch size of 16. The first phase used the PowerInfer/QWQ-LONGCOT-500K dataset for 1.5 epochs; the second phase combined it with the PowerInfer/LONGCOT-Refine dataset for an additional 2 epochs. Training used full-parameter fine-tuning with DeepSpeed. Key settings, with a configuration sketch after the list, were:
- Cutoff length of 16384 tokens
- Neat packing enabled for efficient sequence batching
- Data preprocessing parallelized across 16 workers
- Checkpoints saved every 1000 steps
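The card does not reproduce the training script itself, so the following is a minimal sketch that restates the reported phase-one hyperparameters as Hugging Face transformers `TrainingArguments`. The per-device batch split, bf16 precision, output path, and DeepSpeed config filename are assumptions; neat packing has no direct `TrainingArguments` equivalent, as it is a dataset-packing option applied during preprocessing.

```python
# Sketch of the reported phase-one setup expressed as Hugging Face
# transformers TrainingArguments. Dataset loading, tokenization, and
# packing are omitted; values marked "assumption" are not from the card.
from transformers import TrainingArguments

NUM_GPUS = 8       # 8x H100, per the card
GLOBAL_BATCH = 16  # global batch size, per the card

args = TrainingArguments(
    output_dir="smallthinker-3b-phase1",
    num_train_epochs=1.5,                                  # phase 1: PowerInfer/QWQ-LONGCOT-500K
    per_device_train_batch_size=GLOBAL_BATCH // NUM_GPUS,  # 2 per GPU x 8 GPUs = 16 global
    gradient_accumulation_steps=1,                         # assumption: no accumulation needed
    save_steps=1000,                                       # checkpoint every 1000 steps
    bf16=True,                                             # assumption: bf16 on H100
    # deepspeed="ds_zero3.json",  # DeepSpeed is reported; the config path is a placeholder
)

# The 16384-token cutoff would be applied at tokenization time, e.g.
#   tokenizer(text, truncation=True, max_length=16384)
print(args.to_json_string())
```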
Core Capabilities
- Enhanced performance on mathematical reasoning (AIME24: 16.667%, AMC23: 57.5%)
- Strong STEM capabilities (MMLU_STEM: 68.2%)
- Suitable for efficient deployment on edge devices
- Serves as a draft model for QwQ-32B-Preview, yielding roughly a 70% decoding speedup
Frequently Asked Questions
Q: What makes this model unique?
SmallThinker-3B-Preview stands out for its optimized performance in edge deployment scenarios while maintaining strong capabilities in mathematical and STEM reasoning. It achieves this with a relatively small parameter count, making it highly efficient for resource-constrained environments.
Q: What are the recommended use cases?
The model is particularly suited to edge deployment on resource-constrained devices and to serving as a draft model for the larger QwQ-32B-Preview, where it raises decoding throughput from roughly 40 to 70 tokens/s in llama.cpp.
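The reported speedup comes from speculative decoding in llama.cpp; as a language-consistent illustration, the sketch below shows the same draft-model pattern with Hugging Face transformers assisted generation, where a small assistant model proposes tokens that the large target model verifies. The repository IDs, dtype, and generation settings are assumptions, and assisted generation requires the two models to share a tokenizer.

```python
# Sketch: using SmallThinker-3B-Preview as a draft (assistant) model for
# QwQ-32B-Preview via transformers' assisted generation. Repo IDs below
# are assumptions based on the model names mentioned in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "Qwen/QwQ-32B-Preview"               # large target model
draft_id = "PowerInfer/SmallThinker-3B-Preview"  # small draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Solve: what is the sum of the first 50 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# The draft model proposes candidate tokens; the target model verifies
# them in parallel, accepting or rejecting the proposals.
outputs = target.generate(
    **inputs,
    assistant_model=draft,   # enables assisted (speculative) decoding
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the target model verifies every proposed token, output quality is governed by QwQ-32B-Preview itself; the draft model mainly affects throughput.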