SmolLM-360M

Maintained By: HuggingFaceTB

Property            Value
Parameter Count     362M
License             Apache 2.0
Training Tokens     600B
Training Hardware   64 H100 GPUs
Framework           Nanotron

What is SmolLM-360M?

SmolLM-360M is the mid-sized member of the SmolLM series, with 362M parameters. It was trained on the Cosmo-Corpus, which combines Cosmopedia v2, Python-Edu, and FineWeb-Edu into over 252B tokens of high-quality educational and synthetic content, and the model was trained for a total of 600B tokens over this corpus.

Implementation Details

The model supports multiple deployment options, including full precision (F32), bfloat16, and quantized 8-bit and 4-bit versions via bitsandbytes. It integrates directly with the Transformers library, with the memory footprint dropping from 723.56MB in bfloat16 to 251.79MB with 4-bit quantization (see the loading sketch after the list below).

  • Trained for 600k steps on diverse educational content
  • Supports both CPU and GPU inference
  • Compatible with various precision formats for optimal deployment
  • Uses the Cosmo2 tokenizer
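
As a rough illustration of the options above, the sketch below loads the model through Transformers in bfloat16 when a GPU is available and falls back to CPU otherwise. The checkpoint name HuggingFaceTB/SmolLM-360M and the example prompt are assumptions based on the maintaining organization, not settings prescribed by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-360M"  # assumed Hub id for this model
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    # bfloat16 on GPU matches the ~724MB footprint; CPU falls back to float32
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```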

Core Capabilities

  • Text generation and completion
  • Common sense reasoning
  • World knowledge applications
  • Educational content generation
  • Python code understanding and generation
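
To illustrate the Python code generation capability listed above, here is a hedged sketch of a code-completion prompt. It assumes the `model` and `tokenizer` loaded in the earlier sketch, and the prompt and sampling parameters are illustrative choices rather than recommendations from the model card.

```python
# Assumes `model` and `tokenizer` from the loading sketch above.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,     # illustrative sampling settings, not card defaults
    temperature=0.2,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```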

Frequently Asked Questions

Q: What makes this model unique?

SmolLM-360M stands out for its efficient architecture and high-quality training data, achieving strong performance despite its relatively small size. It's particularly well-suited for educational and programming-related tasks, having been trained on a carefully curated dataset including Python educational content.

Q: What are the recommended use cases?

The model is best suited for educational content generation, programming assistance, and general text generation tasks where efficiency and accuracy are important. It's particularly effective for scenarios requiring a balance between model size and performance.
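
For deployments where that size/performance balance matters most, one common way to reach the smaller 4-bit footprint mentioned earlier is loading through bitsandbytes. The sketch below is a minimal example; the BitsAndBytesConfig settings shown are generic defaults rather than values specified for SmolLM-360M.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "HuggingFaceTB/SmolLM-360M"  # assumed Hub id for this model

# Generic 4-bit settings; requires a CUDA GPU and the bitsandbytes package.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",
)
```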
