SmolLM2-1.7B
| Property | Value |
|---|---|
| Parameter Count | 1.7B |
| Training Tokens | 11 trillion |
| License | Apache 2.0 |
| Precision | BF16 |
| Architecture | Transformer decoder |
What is SmolLM2-1.7B?
SmolLM2-1.7B is part of the SmolLM2 family of compact language models, designed to deliver strong capabilities while remaining lightweight enough for on-device deployment. It marks a significant advance over its predecessor, SmolLM1, particularly in instruction following, knowledge, reasoning, and mathematics. Trained on a diverse data mix that includes FineWeb-Edu, DCLM, and The Stack, it performs well across a wide range of benchmarks.
Implementation Details
The model employs a transformer decoder architecture and was trained on 256 H100 GPUs using the nanotron framework. It supports both CPU and GPU deployment, and running inference in bfloat16 keeps the weight footprint to approximately 3.4GB (1.7B parameters × 2 bytes per weight, versus roughly 6.8GB in FP32); a loading sketch follows the list below.
- Supports multiple deployment options, including CPU, single-GPU, and multi-GPU setups
- Integrates with the Hugging Face Transformers library for straightforward use
- Runs in both full precision (FP32) and bfloat16
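As a minimal loading sketch (assuming the HuggingFaceTB/SmolLM2-1.7B checkpoint ID on the Hugging Face Hub; adjust the device selection for your hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub checkpoint ID; swap in a local path if the weights are downloaded.
checkpoint = "HuggingFaceTB/SmolLM2-1.7B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# bfloat16 halves the weight footprint relative to FP32 (~3.4GB vs ~6.8GB).
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For multi-GPU setups, passing `device_map="auto"` to `from_pretrained` (with the accelerate package installed) shards the weights across devices instead of the manual `.to(device)` call.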
Core Capabilities
- Outperforms comparably sized models on the HellaSwag (68.7%), ARC, and PIQA benchmarks
- Demonstrates strong mathematical reasoning, with 31.0% accuracy on GSM8K (5-shot)
- Excels at instruction following, with a 56.7% average on IFEval in its instruction-tuned variant
- Supports text rewriting, summarization, and function-calling tasks, as the generation sketch below illustrates
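As a sketch of the instruction-following workflow, the example below assumes the instruction-tuned variant (SmolLM2-1.7B-Instruct) and its bundled chat template; the summarization prompt and sampling settings are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID for the instruction-tuned variant.
checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).to(device)

# A summarization-style instruction; rewriting or math prompts work the same way.
messages = [{
    "role": "user",
    "content": "Summarize in one sentence: SmolLM2 is a family of compact "
               "language models trained on up to 11 trillion tokens for "
               "on-device use.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(input_ids, max_new_tokens=64, temperature=0.2, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```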
Frequently Asked Questions
Q: What makes this model unique?
SmolLM2-1.7B stands out for its balance of size and performance: it delivers results competitive with larger models while remaining compact enough for on-device deployment. Its training on 11 trillion tokens of curated data underpins its strength in instruction following and reasoning tasks.
Q: What are the recommended use cases?
The model is particularly well suited to applications requiring on-device AI, including text generation, summarization, mathematical problem solving, and general instruction following. It is a strong fit wherever the balance between capability and resource efficiency is crucial; a quick footprint estimate follows below.
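To make the efficiency claim concrete, a back-of-the-envelope estimate of the weight footprint (weights only; activations and the KV cache add overhead at inference time):

```python
# Rough weight-only memory estimate for a 1.7B-parameter model.
n_params = 1.7e9
for dtype, bytes_per_param in [("fp32", 4), ("bf16", 2)]:
    gb = n_params * bytes_per_param / 1e9
    print(f"{dtype}: ~{gb:.1f} GB of weights")
# bf16: ~3.4 GB, matching the footprint quoted in Implementation Details.
```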