# SmolLM-360M

| Property | Value |
|---|---|
| Parameter Count | 362M |
| License | Apache 2.0 |
| Training Tokens | 600B |
| Training Hardware | 64 H100 GPUs |
| Framework | Nanotron |
## What is SmolLM-360M?
SmolLM-360M is the mid-sized member of the SmolLM series, sitting between SmolLM-135M and SmolLM-1.7B with 362M parameters. It was trained on the Cosmo-Corpus, which combines Cosmopedia v2, Python-Edu, and FineWeb-Edu for over 252B tokens of high-quality educational and synthetic content; the 600B training tokens listed above correspond to multiple passes over this corpus.
## Implementation Details
The model supports multiple deployment options, including full precision (F32), bfloat16, and quantized versions (8-bit and 4-bit) through bitsandbytes. It integrates directly with the Transformers library, with the weights occupying roughly 1.4GB in full precision, 723.56MB in bfloat16, and 251.79MB with 4-bit quantization.
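A minimal loading sketch covering the different precisions, assuming the Hugging Face Hub id `HuggingFaceTB/SmolLM-360M` (adjust if your copy lives elsewhere); the 4-bit path additionally requires the `bitsandbytes` and `accelerate` packages:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "HuggingFaceTB/SmolLM-360M"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bfloat16 weights (~724MB), placed automatically on the available device
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# 4-bit quantization via bitsandbytes (~252MB footprint)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

print(model_4bit.get_memory_footprint())  # reported in bytes
```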
- Trained for 600k steps on diverse educational content
- Supports both CPU and GPU inference (see the generation sketch after this list)
- Compatible with various precision formats for optimal deployment
- Uses the Cosmo2 tokenizer
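As a sketch of plain CPU or GPU inference through the Transformers generate API (same assumed Hub id as above; the prompt and sampling settings are illustrative, not recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "HuggingFaceTB/SmolLM-360M"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)  # loads the tokenizer shipped with the repo (Cosmo2)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```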
## Core Capabilities
- Text generation and completion
- Common sense reasoning
- World knowledge applications
- Educational content generation
- Python code understanding and generation
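To illustrate the code-completion side, a short sketch using the `text-generation` pipeline (same assumed Hub id; the prompt and decoding settings are examples, not tuned recommendations):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM-360M")

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
result = generator(prompt, max_new_tokens=60, do_sample=False)
print(result[0]["generated_text"])
```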
## Frequently Asked Questions
Q: What makes this model unique?
SmolLM-360M stands out for its efficient architecture and high-quality training data, achieving strong performance despite its relatively small size. It's particularly well-suited for educational and programming-related tasks, having been trained on a carefully curated dataset including Python educational content.
Q: What are the recommended use cases?
The model is best suited for educational content generation, programming assistance, and general text generation tasks where both efficiency and output quality matter. It is particularly effective in scenarios that require a balance between model size and performance, such as CPU-only or memory-constrained deployments.