SmolLM2-1.7B

Maintained by HuggingFaceTB

Model Overview

  • Parameter Count: 1.7B
  • Training Tokens: 11 trillion
  • License: Apache 2.0
  • Precision: BF16
  • Architecture: Transformer decoder

What is SmolLM2-1.7B?

SmolLM2-1.7B is part of the SmolLM2 family of compact language models, designed to provide powerful capabilities while remaining lightweight enough for on-device deployment. This model represents a significant advancement over its predecessor, particularly excelling in instruction following, knowledge processing, reasoning, and mathematics tasks. Trained on a diverse dataset including FineWeb-Edu, DCLM, and The Stack, it delivers impressive performance across various benchmarks.

Implementation Details

The model employs a transformer decoder architecture and was trained using 256 H100 GPUs with the nanotron framework. It supports both CPU and GPU deployment, with optimized inference using bfloat16 precision to maintain a manageable memory footprint of approximately 3.4GB.
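That ~3.4GB figure can be sanity-checked from the parameter count alone. A minimal sketch (weights only; activation and KV-cache memory at inference time are extra):

```python
# Rough weight-memory estimate for a 1.7B-parameter model in bfloat16.
params = 1.7e9          # parameter count
bytes_per_param = 2     # bfloat16 = 16 bits = 2 bytes per weight
footprint_gb = params * bytes_per_param / 1e9
print(f"{footprint_gb:.1f} GB")  # ≈ 3.4 GB, matching the reported footprint
```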

  • Supports multiple deployment options including CPU, single GPU, and multi-GPU setups
  • Integrates with the Transformers library for easy implementation
  • Optimized for both full precision and bfloat16 operation
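The deployment options above can be sketched with a minimal Transformers loading-and-generation example. The checkpoint name follows the HuggingFaceTB namespace; the prompt, token budget, and device selection here are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# torch_dtype=torch.bfloat16 keeps the weights at roughly 3.4GB;
# drop it to load in full (float32) precision instead.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(inputs.input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```

For multi-GPU setups, passing `device_map="auto"` to `from_pretrained` (and skipping the explicit `.to(device)`) lets Accelerate shard the weights across available devices.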

Core Capabilities

  • Outperforms comparable models in HellaSwag (68.7%), ARC, and PIQA benchmarks
  • Demonstrates strong performance in mathematical reasoning with 31.0% accuracy on GSM8K
  • Excels in instruction following with 56.7% average performance on IFEval
  • Supports text rewriting, summarization, and function calling tasks

Frequently Asked Questions

Q: What makes this model unique?

SmolLM2-1.7B stands out for its exceptional balance between model size and performance, offering competitive results against larger models while maintaining a compact form factor suitable for on-device deployment. Its training on 11 trillion tokens and specialized datasets enables superior performance in instruction following and reasoning tasks.

Q: What are the recommended use cases?

The model is particularly well-suited for applications requiring on-device AI capabilities, including text generation, summarization, mathematics problem-solving, and general instruction following tasks. It's ideal for scenarios where a balance between performance and resource efficiency is crucial.
