# SmolLM2-135M-Instruct
| Property | Value |
|---|---|
| Parameter Count | 135M |
| Training Tokens | 2 trillion |
| License | Apache 2.0 |
| Architecture | Transformer decoder |
| Precision | BFloat16 |
## What is SmolLM2-135M-Instruct?
SmolLM2-135M-Instruct is a compact instruction-tuned language model designed for efficient instruction following and general text generation. As part of the SmolLM2 family, it improves markedly on the first-generation SmolLM, particularly in instruction following, knowledge application, and reasoning. The model was trained on a diverse corpus of 2 trillion tokens drawn from sources including FineWeb-Edu, DCLM, and The Stack.
## Implementation Details
The instruct model was produced by supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO) on the UltraFeedback dataset. Training ran on 64 H100 GPUs using the nanotron framework, and the resulting model performs well across a range of benchmarks.
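For intuition, the DPO objective for a single preference pair can be sketched in plain Python. This is an illustrative sketch of the generic DPO loss, not the actual training code; the `beta` value and the log-probabilities in the example are assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen/rejected
    response under the trained policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable -log(sigmoid(margin)) == softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss shrinks as the policy prefers the chosen response more strongly:
loose = dpo_loss(-12.0, -10.0, -11.0, -11.0)  # policy favors the rejected response
tight = dpo_loss(-10.0, -12.0, -11.0, -11.0)  # policy favors the chosen response
assert tight < loose
```

The optimizer pushes the policy's implicit reward margin between chosen and rejected responses upward, with the reference model anchoring it against drifting too far from the SFT distribution.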
- Zero-shot performance improvements over its predecessor on multiple benchmarks
- Supports text rewriting and summarization tasks
- Optimized for efficient on-device deployment
- Implements chat template for conversational applications
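To illustrate the chat-template point above, here is a minimal sketch of a ChatML-style prompt builder. The special-token strings are assumptions about the template format; in practice, the tokenizer's own `apply_chat_template` should be used rather than hand-built strings.

```python
# Hedged sketch of a ChatML-style chat template. Token strings here are
# assumptions; the model's tokenizer is the source of truth for the format.
def build_prompt(messages, add_generation_prompt=True):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this paragraph."},
])
```

The generation prompt leaves the assistant turn open-ended, which is what lets an instruct model produce the reply instead of continuing the user's text.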
## Core Capabilities
- Instruction following with 29.9% average performance on IFEval
- Strong performance on reasoning tasks (28.2% on BBH 3-shot)
- Efficient text generation and summarization
- Lightweight deployment options with ONNX and Transformers.js support
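As a hedged sketch, prompting the model through the Hugging Face `transformers` library might look like the following. The Hub checkpoint name and generation settings are assumptions, not an official recipe from this card.

```python
# Illustrative usage sketch; requires the transformers library and a one-time
# model download. The checkpoint ID below is assumed, not stated in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_reply(prompt: str,
                   model_name: str = "HuggingFaceTB/SmolLM2-135M-Instruct") -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    messages = [{"role": "user", "content": prompt}]
    # The tokenizer's chat template wraps the message in the model's format.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    # Drop the prompt tokens; keep only the newly generated reply.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                            skip_special_tokens=True)
```

For on-device use, the same flow applies through the ONNX export or Transformers.js rather than the Python API.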
## Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its exceptional performance-to-size ratio, delivering strong capabilities in instruction following and reasoning tasks despite its compact 135M parameter size. It's specifically optimized for on-device deployment while maintaining competitive performance metrics.
Q: What are the recommended use cases?
The model is well-suited for text generation, summarization, and instruction-following tasks. It's particularly valuable for applications requiring efficient on-device deployment or where computational resources are limited, while still needing reliable language model capabilities.