OpenELM-1_1B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 1.08B |
| Model Type | Instruction-tuned Language Model |
| License | Apple Sample Code License |
| Paper | ArXiv Paper |
| Format | BF16 |
What is OpenELM-1_1B-Instruct?
OpenELM-1_1B-Instruct is part of Apple's OpenELM family of efficient language models. This 1.08B parameter model balances model size and performance, using a layer-wise scaling strategy that allocates parameters non-uniformly across transformer layers. The model was pretrained on approximately 1.8 trillion tokens from diverse sources including RefinedWeb, PILE, RedPajama, and Dolma v1.6.
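For orientation, here is a minimal generation sketch using Hugging Face `transformers`. It assumes the `apple/OpenELM-1_1B-Instruct` checkpoint and the Llama 2 tokenizer that the OpenELM release reuses; the exact arguments are illustrative rather than Apple's official recipe.

```python
# Minimal generation sketch (assumes the Hugging Face checkpoint
# "apple/OpenELM-1_1B-Instruct"; exact settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-1_1B-Instruct"
# OpenELM reuses the Llama 2 tokenizer; the repo is gated, so access may
# require accepting Meta's license on the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # weights are published in BF16
    trust_remote_code=True,       # OpenELM ships custom modeling code
)

prompt = "Once upon a time there was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```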
Implementation Details
The model combines several architectural and implementation choices aimed at efficiency:
- Layer-wise parameter scaling that allocates parameters non-uniformly across transformer layers
- Built using Apple's CoreNet library
- Support for speculative generation for faster inference
- Compatibility with both lookup-token and model-wise (draft-model) assisted generation, as shown in the sketch after this list
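To make the last two points concrete, the sketch below drives both assisted-generation modes through the standard `transformers` `generate()` API. The choice of `apple/OpenELM-270M-Instruct` as the draft model and the `prompt_lookup_num_tokens` value are assumptions for illustration, not settings prescribed by Apple.

```python
# Assisted-generation sketch (draft-model choice and parameter values assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B-Instruct", torch_dtype=torch.bfloat16, trust_remote_code=True
)
inputs = tokenizer("Explain speculative decoding in one sentence.", return_tensors="pt")

# Option 1: model-wise assisted generation with a smaller draft model.
# OpenELM-270M-Instruct is an assumed choice of assistant; it shares the same tokenizer.
draft = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct", torch_dtype=torch.bfloat16, trust_remote_code=True
)
out = model.generate(**inputs, assistant_model=draft, max_new_tokens=64)

# Option 2: prompt-lookup (lookup token) assisted generation, no draft model needed.
out = model.generate(**inputs, prompt_lookup_num_tokens=10, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```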
Core Capabilities
- Strong performance on multiple benchmarks (71.20% on HellaSwag, 70.00% on BoolQ; see the evaluation sketch after this list)
- Versatile text generation capabilities
- Efficient parameter utilization through innovative scaling
- Instruction-tuned for better task alignment
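To reproduce benchmark numbers like those above, one common route is EleutherAI's lm-evaluation-harness. The snippet below is a sketch that assumes the harness's `simple_evaluate` API and the same checkpoint/tokenizer pairing as the earlier examples; scores may differ from the figures quoted here depending on harness version and settings.

```python
# Benchmark sketch with lm-evaluation-harness (pip install lm-eval);
# checkpoint and tokenizer pairing assumed as in the generation example above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=apple/OpenELM-1_1B-Instruct,"
        "tokenizer=meta-llama/Llama-2-7b-hf,"
        "trust_remote_code=True,dtype=bfloat16"
    ),
    tasks=["hellaswag", "boolq"],
)
print(results["results"])
```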
Frequently Asked Questions
Q: What makes this model unique?
OpenELM-1_1B-Instruct stands out for its efficient parameter allocation strategy and strong performance metrics despite its relatively modest size. The model achieves impressive results across various benchmarks, often outperforming its base variant in instruction-following tasks.
Q: What are the recommended use cases?
The model is well-suited for a range of natural language processing tasks, particularly those requiring instruction following. It performs especially well in multiple-choice tasks, question answering, and general text generation while maintaining computational efficiency.