# SmolLM2-135M
| Property | Value |
|---|---|
| Parameter Count | 135M |
| Training Tokens | 2 trillion |
| License | Apache 2.0 |
| Precision | BFloat16 |
| Memory Footprint | 723.56 MB |
## What is SmolLM2-135M?
SmolLM2-135M is a compact language model from the SmolLM2 family, designed to provide solid language understanding and generation capabilities while keeping a footprint small enough for on-device deployment. It was trained on a diverse dataset of 2 trillion tokens that incorporates FineWeb-Edu, DCLM, and The Stack.
## Implementation Details
Built on a Transformer decoder architecture, SmolLM2-135M is optimized for both performance and efficiency. The model uses BFloat16 precision and can be deployed in both CPU and GPU environments with minimal setup. Training was conducted on 64 H100 GPUs using the nanotron framework.
- Zero-shot performance across multiple benchmarks
- Supports both base pre-trained and instruction-tuned versions
- Optimized for memory efficiency (723.56 MB footprint)
- Compatible with the Hugging Face Transformers library (see the loading sketch below)
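
A minimal loading and generation sketch with Transformers is shown below. The checkpoint name `HuggingFaceTB/SmolLM2-135M` and the example prompt are assumptions for illustration; adjust them to your setup.

```python
# Minimal sketch: load SmolLM2-135M in BFloat16 and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M"  # assumed checkpoint name
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the model's native precision
).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```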
## Core Capabilities
- Strong performance on benchmarks like HellaSwag (42.1%) and PIQA (68.4%)
- Improved instruction following compared to its predecessor
- Enhanced knowledge and reasoning capabilities
- Text generation and completion tasks
- Supports multiple deployment options (CPU, single GPU, or multi-GPU; see the instruct-model sketch below)
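
For the instruction-tuned variant, a chat-template sketch is shown below. The checkpoint name `HuggingFaceTB/SmolLM2-135M-Instruct`, the sampling settings, and the example question are assumptions for illustration; `device_map="auto"` (which requires the `accelerate` package) places the model across available GPUs or falls back to CPU.

```python
# Sketch: chat-style generation with the instruction-tuned variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spreads layers across available devices (needs accelerate)
)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=64, do_sample=True, temperature=0.2, top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```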
## Frequently Asked Questions
### Q: What makes this model unique?
SmolLM2-135M stands out for its balance between model size and performance. Despite having only 135M parameters, it improves on its predecessor across multiple benchmarks while remaining small enough for on-device deployment.
### Q: What are the recommended use cases?
The model is ideal for applications requiring efficient language processing with limited computational resources. It's particularly suitable for text generation, completion tasks, and basic reasoning applications where deployment size is a constraint.