SmallThinker-3B-Preview
| Property | Value |
|---|---|
| Base Model | Qwen2.5-3B-Instruct |
| Parameters | 3 Billion |
| Training Hardware | 8 H100 GPUs |
| Model URL | Hugging Face |
What is SmallThinker-3B-Preview?
SmallThinker-3B-Preview is a language model fine-tuned from Qwen2.5-3B-Instruct and designed for edge deployment and efficient processing. It shows marked improvements over its base model across a range of benchmarks, particularly on STEM and mathematical reasoning tasks, and even competes with GPT-4 in certain areas.
Implementation Details
The model underwent a two-phase training process on 8 H100 GPUs with a global batch size of 16. The first phase used the PowerInfer/QWQ-LONGCOT-500K dataset for 1.5 epochs; the second phase combined it with the PowerInfer/LONGCOT-Refine dataset for an additional 2 epochs. Training used full-parameter fine-tuning with DeepSpeed. Key settings, with a configuration sketch after the list, were:
- Cutoff length of 16384 tokens
- Neat packing enabled for efficient sequence batching
- Data preprocessing parallelized across 16 workers
- Checkpoints saved every 1000 steps
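The card does not reproduce the training script itself, so the following is a minimal sketch that restates the reported phase-one hyperparameters as Hugging Face transformers `TrainingArguments`. The per-device batch split, bf16 precision, output path, and DeepSpeed config filename are assumptions; neat packing has no direct `TrainingArguments` equivalent, as it is a dataset-packing option applied during preprocessing.

```python
# Sketch of the reported phase-one setup expressed as Hugging Face
# transformers TrainingArguments. Dataset loading, tokenization, and
# packing are omitted; values marked "assumption" are not from the card.
from transformers import TrainingArguments

NUM_GPUS = 8       # 8x H100, per the card
GLOBAL_BATCH = 16  # global batch size, per the card

args = TrainingArguments(
    output_dir="smallthinker-3b-phase1",
    num_train_epochs=1.5,                                  # phase 1: PowerInfer/QWQ-LONGCOT-500K
    per_device_train_batch_size=GLOBAL_BATCH // NUM_GPUS,  # 2 per GPU x 8 GPUs = 16 global
    gradient_accumulation_steps=1,                         # assumption: no accumulation needed
    save_steps=1000,                                       # checkpoint every 1000 steps
    bf16=True,                                             # assumption: bf16 on H100
    # deepspeed="ds_zero3.json",  # DeepSpeed is reported; the config path is a placeholder
)

# The 16384-token cutoff would be applied at tokenization time, e.g.
#   tokenizer(text, truncation=True, max_length=16384)
print(args.to_json_string())
```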
Core Capabilities
- Enhanced performance on mathematical reasoning (AIME24: 16.667%, AMC23: 57.5%)
- Strong STEM capabilities (MMLU_STEM: 68.2%)
- Suitable for efficient deployment on edge devices
- Serves as a draft model for QwQ-32B-Preview, yielding roughly a 70% decoding speedup
Frequently Asked Questions
Q: What makes this model unique?
SmallThinker-3B-Preview stands out for its optimized performance in edge deployment scenarios while maintaining strong capabilities in mathematical and STEM reasoning. It achieves this with a relatively small parameter count, making it highly efficient for resource-constrained environments.
Q: What are the recommended use cases?
The model is particularly suited to edge deployment on resource-constrained devices and to serving as a draft model for the larger QwQ-32B-Preview, where it raises decoding throughput from roughly 40 to 70 tokens/s in llama.cpp.
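The reported speedup comes from speculative decoding in llama.cpp; as a language-consistent illustration, the sketch below shows the same draft-model pattern with Hugging Face transformers assisted generation, where a small assistant model proposes tokens that the large target model verifies. The repository IDs, dtype, and generation settings are assumptions, and assisted generation requires the two models to share a tokenizer.

```python
# Sketch: using SmallThinker-3B-Preview as a draft (assistant) model for
# QwQ-32B-Preview via transformers' assisted generation. Repo IDs below
# are assumptions based on the model names mentioned in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "Qwen/QwQ-32B-Preview"               # large target model
draft_id = "PowerInfer/SmallThinker-3B-Preview"  # small draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Solve: what is the sum of the first 50 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# The draft model proposes candidate tokens; the target model verifies
# them in parallel, accepting or rejecting the proposals.
outputs = target.generate(
    **inputs,
    assistant_model=draft,   # enables assisted (speculative) decoding
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the target model verifies every proposed token, output quality is governed by QwQ-32B-Preview itself; the draft model mainly affects throughput.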