SmolLM-135M
| Property | Value |
|---|---|
| Parameter Count | 135M |
| License | Apache 2.0 |
| Training Tokens | 600B |
| Training Hardware | 64 H100 GPUs |
| Framework | Nanotron |
What is SmolLM-135M?
SmolLM-135M is the smallest member of the SmolLM series, a family of compact language models designed for efficient text generation. Trained on the carefully curated Cosmo-Corpus, which includes Cosmopedia v2, Python-Edu, and FineWeb-Edu, the model delivers strong performance despite having only 135 million parameters.
Implementation Details
The model supports multiple deployment options, including full precision (F32), bfloat16, and 8-bit or 4-bit quantization through bitsandbytes. Memory footprint varies significantly with precision: roughly 540MB in full precision (135M parameters × 4 bytes), 269MB in bfloat16, and as little as 109.78MB with 4-bit quantization.
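As a rough illustration of the quantized path, the sketch below loads the model in 4-bit via bitsandbytes and prints its memory footprint. The checkpoint name HuggingFaceTB/SmolLM-135M and the BitsAndBytesConfig settings are assumptions based on the standard Transformers quantization workflow, not figures taken from this card.

```python
# Sketch: loading SmolLM-135M in 4-bit with bitsandbytes and checking memory use.
# Assumes the Hub checkpoint "HuggingFaceTB/SmolLM-135M" and that transformers,
# accelerate, and bitsandbytes are installed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

checkpoint = "HuggingFaceTB/SmolLM-135M"

# 4-bit quantization config; use load_in_8bit=True instead to compare 8-bit.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",
)

# get_memory_footprint() reports the in-memory size of the loaded weights in bytes.
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
```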
- Trained for 600k steps on 600B tokens
- Uses the Cosmo2 tokenizer
- Supports CPU, GPU, and multi-GPU deployment
- Compatible with the Hugging Face Transformers library (see the usage sketch below)
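Since the model works with the standard Transformers API, a minimal generation example looks like the following. The checkpoint identifier is assumed to be HuggingFaceTB/SmolLM-135M, and the prompt is illustrative only.

```python
# Minimal sketch: text generation with SmolLM-135M via Hugging Face Transformers.
# Assumes the Hub checkpoint "HuggingFaceTB/SmolLM-135M"; runs on CPU or GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Tokenize an example prompt and generate a short continuation.
inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```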
Core Capabilities
- Text generation in English
- Common sense reasoning
- World knowledge application
- Educational content generation
- Code completion (especially Python)
Frequently Asked Questions
Q: What makes this model unique?
SmolLM-135M stands out for its efficient architecture and high-quality training data, delivering impressive performance despite its compact size. It's particularly notable for its ability to handle both general text and coding tasks while maintaining a small computational footprint.
Q: What are the recommended use cases?
The model is well-suited for educational content generation, code completion, and general text generation tasks where computational resources are limited. It's particularly effective for applications requiring a balance between performance and resource efficiency.