# SmolLM2-135M
| Property | Value |
|---|---|
| Parameter Count | 135M |
| Training Tokens | 2 trillion |
| License | Apache 2.0 |
| Precision | BFloat16 |
| Memory Footprint | 723.56 MB |
## What is SmolLM2-135M?
SmolLM2-135M is a compact language model from the SmolLM2 family, designed to provide solid language understanding and generation capabilities while keeping a footprint small enough for on-device deployment. It was trained on a diverse dataset of 2 trillion tokens that incorporates FineWeb-Edu, DCLM, and The Stack.
## Implementation Details
Built on a Transformer decoder architecture, SmolLM2-135M is optimized for both performance and efficiency. The model uses BFloat16 precision and can be deployed in both CPU and GPU environments with minimal setup. Training was conducted on 64 H100 GPUs using the nanotron framework.
- Zero-shot performance across multiple benchmarks
- Supports both base pre-trained and instruction-tuned versions
- Optimized for memory efficiency (723.56 MB footprint)
- Compatible with the Hugging Face Transformers library (see the loading sketch below)
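
A minimal loading and generation sketch with Transformers is shown below. The checkpoint name `HuggingFaceTB/SmolLM2-135M` and the example prompt are assumptions for illustration; adjust them to your setup.

```python
# Minimal sketch: load SmolLM2-135M in BFloat16 and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M"  # assumed checkpoint name
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the model's native precision
).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```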
## Core Capabilities
- Strong performance on benchmarks like HellaSwag (42.1%) and PIQA (68.4%)
- Improved instruction following compared to its predecessor
- Enhanced knowledge and reasoning capabilities
- Text generation and completion tasks
- Supports multiple deployment options (CPU, single GPU, or multi-GPU; see the instruct-model sketch below)
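
For the instruction-tuned variant, a chat-template sketch is shown below. The checkpoint name `HuggingFaceTB/SmolLM2-135M-Instruct`, the sampling settings, and the example question are assumptions for illustration; `device_map="auto"` (which requires the `accelerate` package) places the model across available GPUs or falls back to CPU.

```python
# Sketch: chat-style generation with the instruction-tuned variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spreads layers across available devices (needs accelerate)
)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=64, do_sample=True, temperature=0.2, top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```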
## Frequently Asked Questions
### Q: What makes this model unique?
SmolLM2-135M stands out for its balance between model size and performance. Despite having only 135M parameters, it improves on its predecessor across multiple benchmarks while remaining small enough for on-device deployment.
### Q: What are the recommended use cases?
The model is ideal for applications requiring efficient language processing with limited computational resources. It's particularly suitable for text generation, completion tasks, and basic reasoning applications where deployment size is a constraint.