SmolLM-135M

Maintained by: HuggingFaceTB

Property | Value
Parameter Count | 135M
License | Apache 2.0
Training Tokens | 600B
Training Hardware | 64 H100 GPUs
Framework | Nanotron

What is SmolLM-135M?

SmolLM-135M is the smallest model in the SmolLM series of compact language models designed for efficient text generation. Trained on the curated Cosmo-Corpus, which combines Cosmopedia v2, Python-Edu, and FineWeb-Edu, it delivers strong performance despite its modest size of 135 million parameters.

Implementation Details

The model supports multiple deployment options, including full precision (F32), bfloat16, and quantized 8-bit and 4-bit versions through bitsandbytes. Memory footprint varies accordingly: roughly 540MB for full precision, 269MB for bfloat16, and as low as 109.78MB for 4-bit quantization (see the loading sketch after the list below).

  • Trained for 600k steps on 600B tokens
  • Uses the Cosmo2 tokenizer
  • Supports CPU, GPU, and multi-GPU deployment
  • Compatible with Hugging Face Transformers library
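
A minimal loading sketch, assuming the Hub checkpoint ID HuggingFaceTB/SmolLM-135M and that transformers, torch, and bitsandbytes are installed (4-bit quantization additionally requires a CUDA GPU):

```python
# Sketch: load SmolLM-135M in bfloat16 or 4-bit and check its memory footprint.
# Assumes the Hub checkpoint ID "HuggingFaceTB/SmolLM-135M".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "HuggingFaceTB/SmolLM-135M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# bfloat16 weights (~270MB); device_map="auto" places the model on GPU if available.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)
print(f"bf16 footprint: {model.get_memory_footprint() / 1e6:.1f} MB")

# Alternatively, 4-bit quantization via bitsandbytes (~110MB; CUDA GPU required).
model_4bit = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
print(f"4-bit footprint: {model_4bit.get_memory_footprint() / 1e6:.1f} MB")
```

Loading without a torch_dtype argument gives the full-precision (F32) variant.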

Core Capabilities

  • Text generation in English
  • Common sense reasoning
  • World knowledge application
  • Educational content generation
  • Code completion (especially Python)
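
As a quick illustration of the code-completion capability, the standard generate API can be prompted with a partial Python function. This is a sketch only, again assuming the HuggingFaceTB/SmolLM-135M checkpoint; the generated continuation will vary and may need review at this model scale.

```python
# Sketch: Python code completion with SmolLM-135M.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # F32 on CPU by default

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```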

Frequently Asked Questions

Q: What makes this model unique?

SmolLM-135M stands out for its efficient architecture and high-quality training data, delivering impressive performance despite its compact size. It's particularly notable for its ability to handle both general text and coding tasks while maintaining a small computational footprint.

Q: What are the recommended use cases?

The model is well-suited for educational content generation, code completion, and general text generation tasks where computational resources are limited. It's particularly effective for applications requiring a balance between performance and resource efficiency.
