OpenELM-270M-Instruct
| Property | Value |
|---|---|
| Parameter Count | 272M |
| Model Type | Instruction-tuned Language Model |
| License | Apple Sample Code License |
| Paper | arXiv:2404.14619 |
| Tensor Type | BF16 |
What is OpenELM-270M-Instruct?
OpenELM-270M-Instruct is part of Apple's OpenELM family of efficient language models. This instruction-tuned variant contains 272 million parameters and uses a layer-wise scaling strategy that allocates parameters non-uniformly across transformer layers to get better accuracy from a given parameter budget. The model was pretrained on approximately 1.8 trillion tokens drawn from public datasets, including RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6.
Implementation Details
The model uses a decoder-only transformer architecture in which layer-wise scaling varies each layer's attention-head count and feed-forward width. Its weights are released in BF16 precision, and a Hugging Face access token is required because the model reuses the gated Llama-2 tokenizer. Supported generation strategies include lookup-token speculative generation (prompt lookup decoding) and model-wise speculative generation with an assistive model; a minimal loading-and-generation sketch follows the feature list below.
- Efficient layer-wise parameter scaling
- Instruction-tuned architecture
- Support for speculative generation
- Trained with Apple's CoreNet library
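A minimal loading-and-generation sketch, assuming the standard transformers AutoModelForCausalLM API with trust_remote_code enabled; the HF_TOKEN environment variable name is a placeholder for your access token, and the generation settings are illustrative rather than prescribed:

```python
# Minimal sketch: loading OpenELM-270M-Instruct with Hugging Face transformers.
# Assumes a recent transformers release and that HF_TOKEN (placeholder name)
# holds a valid access token, needed for the gated Llama-2 tokenizer.
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

token = os.environ.get("HF_TOKEN")

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    trust_remote_code=True,       # OpenELM ships custom modeling code
    torch_dtype=torch.bfloat16,   # weights are released in BF16
    token=token,
)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # OpenELM reuses the Llama-2 tokenizer
    token=token,
)

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    repetition_penalty=1.2,       # mild repetition penalty; tune as needed
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```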
Core Capabilities
- Strong zero-shot performance across multiple benchmarks (55.11% average)
- Improved ARC-c (30.55%) and HellaSwag (52.07%) scores relative to the pretrained OpenELM-270M
- Enhanced instruction following capabilities
- Efficient text generation with customizable parameters
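The speculative strategies mentioned under Implementation Details map onto standard transformers generation arguments. A hedged sketch, assuming a transformers release that supports prompt_lookup_num_tokens and assistant_model in generate(); the pairing with OpenELM-3B-Instruct in the second part is purely illustrative:

```python
# Hedged sketch of the two speculative decoding options mentioned above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

token = "hf_..."  # placeholder Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=token)

# 1) Lookup-token speculative generation (prompt lookup decoding):
#    draft tokens are copied from the prompt itself, so no second model is needed.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    token=token,
)
inputs = tokenizer("Summarize: OpenELM is a family of efficient language models.",
                   return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, prompt_lookup_num_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# 2) Model-wise speculative generation: a small model drafts tokens that a larger
#    model verifies. For the 270M variant, the natural pairing is to use it as the
#    assistant for a bigger OpenELM model; both share the Llama-2 tokenizer.
target = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-3B-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    token=token,
)
out = target.generate(**inputs, max_new_tokens=64, assistant_model=model)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```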
Frequently Asked Questions
Q: What makes this model unique?
OpenELM-270M-Instruct stands out for its efficient layer-wise parameter allocation, which lets it achieve competitive benchmark results despite its compact 272M-parameter size.
Q: What are the recommended use cases?
The model is well-suited for general text generation tasks, especially those requiring instruction following. It performs particularly well in zero-shot scenarios and can be effectively used for tasks like question answering, text completion, and general language understanding.
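As an illustration of the question-answering use case, a short hedged sketch; a plain-text instruction prompt is assumed (no chat template is applied), and the greedy decoding settings are only one reasonable choice:

```python
# Minimal question-answering sketch for the instruction-tuned variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

token = "hf_..."  # placeholder Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=token)
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    token=token,
)

prompt = "Answer the question briefly.\nQuestion: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))
```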