SmolLM-135M
| Property | Value |
|---|---|
| Parameter Count | 135M |
| License | Apache 2.0 |
| Training Tokens | 600B |
| Training Hardware | 64 H100 GPUs |
| Framework | Nanotron |
What is SmolLM-135M?
SmolLM-135M is the smallest member of the SmolLM series, a family of compact language models designed for efficient text generation. Trained on the carefully curated Cosmo-Corpus, which includes Cosmopedia v2, Python-Edu, and FineWeb-Edu, the model delivers strong performance despite having only 135 million parameters.
Implementation Details
The model supports multiple deployment options, including full precision (F32), bfloat16, and 8-bit or 4-bit quantization through bitsandbytes. Memory footprint varies significantly with precision: roughly 540MB in full precision (135M parameters × 4 bytes), 269MB in bfloat16, and as little as 109.78MB with 4-bit quantization.
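As a rough illustration of the quantized path, the sketch below loads the model in 4-bit via bitsandbytes and prints its memory footprint. The checkpoint name HuggingFaceTB/SmolLM-135M and the BitsAndBytesConfig settings are assumptions based on the standard Transformers quantization workflow, not figures taken from this card.

```python
# Sketch: loading SmolLM-135M in 4-bit with bitsandbytes and checking memory use.
# Assumes the Hub checkpoint "HuggingFaceTB/SmolLM-135M" and that transformers,
# accelerate, and bitsandbytes are installed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

checkpoint = "HuggingFaceTB/SmolLM-135M"

# 4-bit quantization config; use load_in_8bit=True instead to compare 8-bit.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",
)

# get_memory_footprint() reports the in-memory size of the loaded weights in bytes.
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
```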
- Trained for 600k steps on 600B tokens
- Uses the Cosmo2 tokenizer
- Supports CPU, GPU, and multi-GPU deployment
- Compatible with the Hugging Face Transformers library (see the usage sketch below)
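Since the model works with the standard Transformers API, a minimal generation example looks like the following. The checkpoint identifier is assumed to be HuggingFaceTB/SmolLM-135M, and the prompt is illustrative only.

```python
# Minimal sketch: text generation with SmolLM-135M via Hugging Face Transformers.
# Assumes the Hub checkpoint "HuggingFaceTB/SmolLM-135M"; runs on CPU or GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Tokenize an example prompt and generate a short continuation.
inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```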
Core Capabilities
- Text generation in English
- Common sense reasoning
- World knowledge application
- Educational content generation
- Code completion (especially Python)
Frequently Asked Questions
Q: What makes this model unique?
SmolLM-135M stands out for its efficient architecture and high-quality training data, delivering impressive performance despite its compact size. It's particularly notable for its ability to handle both general text and coding tasks while maintaining a small computational footprint.
Q: What are the recommended use cases?
The model is well-suited for educational content generation, code completion, and general text generation tasks where computational resources are limited. It's particularly effective for applications requiring a balance between performance and resource efficiency.