# SmolLM-360M

| Property | Value |
|---|---|
| Parameter Count | 362M |
| License | Apache 2.0 |
| Training Tokens | 600B |
| Training Hardware | 64 H100 GPUs |
| Framework | Nanotron |
## What is SmolLM-360M?
SmolLM-360M is the mid-sized member of the SmolLM series, sitting between SmolLM-135M and SmolLM-1.7B with 362M parameters. It was trained on the Cosmo-Corpus, which combines Cosmopedia v2, Python-Edu, and FineWeb-Edu for over 252B tokens of high-quality educational and synthetic content; the 600B training tokens listed above correspond to multiple passes over this corpus.
## Implementation Details
The model supports multiple deployment options, including full precision (F32), bfloat16, and quantized versions (8-bit and 4-bit) through bitsandbytes. It integrates directly with the Transformers library, with the weights occupying roughly 1.4GB in full precision, 723.56MB in bfloat16, and 251.79MB with 4-bit quantization.
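A minimal loading sketch covering the different precisions, assuming the Hugging Face Hub id `HuggingFaceTB/SmolLM-360M` (adjust if your copy lives elsewhere); the 4-bit path additionally requires the `bitsandbytes` and `accelerate` packages:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "HuggingFaceTB/SmolLM-360M"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bfloat16 weights (~724MB), placed automatically on the available device
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# 4-bit quantization via bitsandbytes (~252MB footprint)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

print(model_4bit.get_memory_footprint())  # reported in bytes
```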
- Trained for 600k steps on diverse educational content
- Supports both CPU and GPU inference (see the generation sketch after this list)
- Compatible with various precision formats for optimal deployment
- Uses the Cosmo2 tokenizer
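As a sketch of plain CPU or GPU inference through the Transformers generate API (same assumed Hub id as above; the prompt and sampling settings are illustrative, not recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "HuggingFaceTB/SmolLM-360M"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)  # loads the tokenizer shipped with the repo (Cosmo2)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```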
## Core Capabilities
- Text generation and completion
- Common sense reasoning
- World knowledge applications
- Educational content generation
- Python code understanding and generation
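To illustrate the code-completion side, a short sketch using the `text-generation` pipeline (same assumed Hub id; the prompt and decoding settings are examples, not tuned recommendations):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM-360M")

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
result = generator(prompt, max_new_tokens=60, do_sample=False)
print(result[0]["generated_text"])
```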
## Frequently Asked Questions
Q: What makes this model unique?
SmolLM-360M stands out for its efficient architecture and high-quality training data, achieving strong performance despite its relatively small size. It's particularly well-suited for educational and programming-related tasks, having been trained on a carefully curated dataset including Python educational content.
Q: What are the recommended use cases?
The model is best suited for educational content generation, programming assistance, and general text generation tasks where both efficiency and output quality matter. It is particularly effective in scenarios that require a balance between model size and performance, such as CPU-only or memory-constrained deployments.