SmolLM2-1.7B
| Property | Value |
|---|---|
| Parameter Count | 1.7B |
| Training Tokens | 11 trillion |
| License | Apache 2.0 |
| Precision | BF16 |
| Architecture | Transformer decoder |
What is SmolLM2-1.7B?
SmolLM2-1.7B is part of the SmolLM2 family of compact language models, designed to deliver strong capabilities while remaining lightweight enough for on-device deployment. It marks a significant advance over its predecessor, SmolLM1, particularly in instruction following, knowledge, reasoning, and mathematics. Trained on a diverse data mix that includes FineWeb-Edu, DCLM, and The Stack, it performs well across a wide range of benchmarks.
Implementation Details
The model employs a transformer decoder architecture and was trained on 256 H100 GPUs using the nanotron framework. It supports both CPU and GPU deployment, and running inference in bfloat16 keeps the weight footprint to approximately 3.4GB (1.7B parameters × 2 bytes per weight, versus roughly 6.8GB in FP32); a loading sketch follows the list below.
- Supports multiple deployment options, including CPU, single-GPU, and multi-GPU setups
- Integrates with the Hugging Face Transformers library for straightforward use
- Runs in both full precision (FP32) and bfloat16
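As a minimal loading sketch (assuming the HuggingFaceTB/SmolLM2-1.7B checkpoint ID on the Hugging Face Hub; adjust the device selection for your hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub checkpoint ID; swap in a local path if the weights are downloaded.
checkpoint = "HuggingFaceTB/SmolLM2-1.7B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# bfloat16 halves the weight footprint relative to FP32 (~3.4GB vs ~6.8GB).
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For multi-GPU setups, passing `device_map="auto"` to `from_pretrained` (with the accelerate package installed) shards the weights across devices instead of the manual `.to(device)` call.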
Core Capabilities
- Outperforms comparably sized models on the HellaSwag (68.7%), ARC, and PIQA benchmarks
- Demonstrates strong mathematical reasoning, with 31.0% accuracy on GSM8K (5-shot)
- Excels at instruction following, with a 56.7% average on IFEval in its instruction-tuned variant
- Supports text rewriting, summarization, and function-calling tasks, as the generation sketch below illustrates
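As a sketch of the instruction-following workflow, the example below assumes the instruction-tuned variant (SmolLM2-1.7B-Instruct) and its bundled chat template; the summarization prompt and sampling settings are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID for the instruction-tuned variant.
checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).to(device)

# A summarization-style instruction; rewriting or math prompts work the same way.
messages = [{
    "role": "user",
    "content": "Summarize in one sentence: SmolLM2 is a family of compact "
               "language models trained on up to 11 trillion tokens for "
               "on-device use.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(input_ids, max_new_tokens=64, temperature=0.2, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```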
Frequently Asked Questions
Q: What makes this model unique?
SmolLM2-1.7B stands out for its balance of size and performance: it delivers results competitive with larger models while remaining compact enough for on-device deployment. Its training on 11 trillion tokens of curated data underpins its strength in instruction following and reasoning tasks.
Q: What are the recommended use cases?
The model is particularly well suited to applications requiring on-device AI, including text generation, summarization, mathematical problem solving, and general instruction following. It is a strong fit wherever the balance between capability and resource efficiency is crucial; a quick footprint estimate follows below.
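To make the efficiency claim concrete, a back-of-the-envelope estimate of the weight footprint (weights only; activations and the KV cache add overhead at inference time):

```python
# Rough weight-only memory estimate for a 1.7B-parameter model.
n_params = 1.7e9
for dtype, bytes_per_param in [("fp32", 4), ("bf16", 2)]:
    gb = n_params * bytes_per_param / 1e9
    print(f"{dtype}: ~{gb:.1f} GB of weights")
# bf16: ~3.4 GB, matching the footprint quoted in Implementation Details.
```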