SmolLM-360M-Instruct
Property | Value |
---|---|
Parameter Count | 362M |
License | Apache 2.0 |
Tensor Type | BF16 |
Training Datasets | 4 specialized datasets |
What is SmolLM-360M-Instruct?
SmolLM-360M-Instruct is part of the SmolLM series, representing a medium-sized variant with 362M parameters. It's an instruction-tuned language model designed for efficient text generation and conversational tasks. The model has been fine-tuned on a carefully curated mix of datasets including everyday conversations, Magpie-Pro, StarCoder2, and OpenHermes-2.5.
Implementation Details
The model utilizes the Transformers architecture and is trained using the alignment-handbook framework. Notable technical specifications include a learning rate of 1e-3, cosine scheduling, and a warmup ratio of 0.1, with training conducted over one epoch with a global batch size of 262k tokens.
- BF16 precision for optimal performance
- Supports both CPU and GPU deployment
- Implements chat template for conversation
- Available in optimized formats (MLC, GGUF, Transformers.js)
Core Capabilities
- General knowledge question answering
- Creative writing tasks
- Basic Python programming
- Conversational interactions
- English language processing
Frequently Asked Questions
Q: What makes this model unique?
SmolLM-360M-Instruct stands out for its efficient balance between model size and performance, achieving a 63.3% win rate over its previous version on AlpacaEval. It's specifically optimized for everyday conversations and practical tasks while maintaining a compact size.
Q: What are the recommended use cases?
The model is best suited for general knowledge queries, creative writing, basic programming tasks, and conversational applications. It's particularly effective when used with temperature 0.2 and top-p 0.9 settings, though users should note its limitations with arithmetic, editing tasks, and complex reasoning.