SmolLM-360M-Instruct

Maintained by HuggingFaceTB

Property            Value
Parameter Count     362M
License             Apache 2.0
Tensor Type         BF16
Training Datasets   4 specialized datasets

What is SmolLM-360M-Instruct?

SmolLM-360M-Instruct is the mid-sized variant of the SmolLM series, with 362M parameters. It is an instruction-tuned language model designed for efficient text generation and conversational tasks. The model was fine-tuned on a curated mix of datasets, including an everyday-conversations set, Magpie-Pro, StarCoder2 self-instruct data, and OpenHermes-2.5.

Implementation Details

The model uses a decoder-only transformer architecture and was trained with the alignment-handbook framework. Training used a learning rate of 1e-3 with a cosine schedule and a warmup ratio of 0.1, over one epoch with a global batch size of 262k tokens.
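
The published recipe is an alignment-handbook YAML config; purely as an illustration, the reported hyperparameters could be expressed with transformers.TrainingArguments as below. The output directory is a placeholder, and the per-device batch size is omitted because the card reports batch size in tokens rather than sequences:

```python
# Illustrative sketch only: maps the reported hyperparameters onto
# transformers.TrainingArguments. The actual recipe is an
# alignment-handbook YAML config.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="smollm-360m-instruct",  # placeholder path
    learning_rate=1e-3,                 # reported learning rate
    lr_scheduler_type="cosine",         # cosine schedule
    warmup_ratio=0.1,                   # reported warmup ratio
    num_train_epochs=1,                 # trained for one epoch
    bf16=True,                          # BF16 precision
    # The card reports a global batch size of 262k *tokens*; the
    # per-device sequence batch size depends on sequence length and
    # hardware, so it is omitted here.
)
```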

  • BF16 precision weights
  • Supports both CPU and GPU deployment
  • Ships with a chat template for conversational use (see the loading sketch after this list)
  • Available in optimized formats (MLC, GGUF, Transformers.js)
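
A minimal loading sketch with the Transformers library is shown below. The checkpoint name follows the HuggingFaceTB naming on the Hugging Face Hub, and the prompt is a hypothetical example; verify the repository name before use:

```python
# Minimal sketch: load SmolLM-360M-Instruct and format a prompt with
# its chat template, assuming the standard Transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-360M-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU and GPU both work

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16  # BF16, matching the card
).to(device)

# The instruct variant expects chat-formatted input; apply_chat_template
# inserts the special tokens the model was fine-tuned with.
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```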

Core Capabilities

  • General knowledge question answering
  • Creative writing tasks
  • Basic Python programming
  • Conversational interactions
  • English language processing

Frequently Asked Questions

Q: What makes this model unique?

SmolLM-360M-Instruct stands out for its efficient balance between model size and performance, achieving a 63.3% win rate over its previous version on AlpacaEval. It's specifically optimized for everyday conversations and practical tasks while maintaining a compact size.

Q: What are the recommended use cases?

The model is best suited for general knowledge queries, creative writing, basic programming tasks, and conversational applications. It performs best with sampling at temperature 0.2 and top-p 0.9 (see the sketch below), though users should note its limitations with arithmetic, editing tasks, and complex reasoning.
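
The recommended sampling settings can be passed directly to generate. This sketch assumes the `model`, `tokenizer`, and `device` objects from the loading example above, and the prompt is again a hypothetical example:

```python
# Applying the card's recommended sampling settings
# (temperature 0.2, top-p 0.9).
messages = [{"role": "user", "content": "Write a haiku about small language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,   # sampling must be enabled for temperature/top_p to apply
    temperature=0.2,  # recommended temperature
    top_p=0.9,        # recommended nucleus sampling threshold
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```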
