OpenELM-270M-Instruct
| Property | Value |
|---|---|
| Parameter Count | 272M |
| Model Type | Instruction-tuned Language Model |
| License | Apple Sample Code License |
| Paper | arXiv:2404.14619 |
| Tensor Type | BF16 |
What is OpenELM-270M-Instruct?
OpenELM-270M-Instruct is part of Apple's OpenELM family of efficient language models. This instruction-tuned variant contains 272 million parameters and uses a layer-wise scaling strategy that allocates parameters non-uniformly across transformer layers to get better accuracy from a given parameter budget. The model was pretrained on approximately 1.8 trillion tokens drawn from public datasets, including RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6.
Implementation Details
The model uses a decoder-only transformer architecture in which layer-wise scaling varies each layer's attention-head count and feed-forward width. Its weights are released in BF16 precision, and a Hugging Face access token is required because the model reuses the gated Llama-2 tokenizer. Supported generation strategies include lookup-token speculative generation (prompt lookup decoding) and model-wise speculative generation with an assistive model; a minimal loading-and-generation sketch follows the feature list below.
- Efficient layer-wise parameter scaling
- Instruction-tuned architecture
- Support for speculative generation
- Trained with Apple's CoreNet library
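A minimal loading-and-generation sketch, assuming the standard transformers AutoModelForCausalLM API with trust_remote_code enabled; the HF_TOKEN environment variable name is a placeholder for your access token, and the generation settings are illustrative rather than prescribed:

```python
# Minimal sketch: loading OpenELM-270M-Instruct with Hugging Face transformers.
# Assumes a recent transformers release and that HF_TOKEN (placeholder name)
# holds a valid access token, needed for the gated Llama-2 tokenizer.
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

token = os.environ.get("HF_TOKEN")

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    trust_remote_code=True,       # OpenELM ships custom modeling code
    torch_dtype=torch.bfloat16,   # weights are released in BF16
    token=token,
)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # OpenELM reuses the Llama-2 tokenizer
    token=token,
)

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    repetition_penalty=1.2,       # mild repetition penalty; tune as needed
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```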
Core Capabilities
- Strong zero-shot performance across multiple benchmarks (55.11% average)
- Improved ARC-c (30.55%) and HellaSwag (52.07%) scores relative to the pretrained OpenELM-270M
- Enhanced instruction following capabilities
- Efficient text generation with customizable parameters
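The speculative strategies mentioned under Implementation Details map onto standard transformers generation arguments. A hedged sketch, assuming a transformers release that supports prompt_lookup_num_tokens and assistant_model in generate(); the pairing with OpenELM-3B-Instruct in the second part is purely illustrative:

```python
# Hedged sketch of the two speculative decoding options mentioned above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

token = "hf_..."  # placeholder Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=token)

# 1) Lookup-token speculative generation (prompt lookup decoding):
#    draft tokens are copied from the prompt itself, so no second model is needed.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    token=token,
)
inputs = tokenizer("Summarize: OpenELM is a family of efficient language models.",
                   return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, prompt_lookup_num_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# 2) Model-wise speculative generation: a small model drafts tokens that a larger
#    model verifies. For the 270M variant, the natural pairing is to use it as the
#    assistant for a bigger OpenELM model; both share the Llama-2 tokenizer.
target = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-3B-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    token=token,
)
out = target.generate(**inputs, max_new_tokens=64, assistant_model=model)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```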
Frequently Asked Questions
Q: What makes this model unique?
OpenELM-270M-Instruct stands out for its efficient layer-wise parameter allocation, which lets it achieve competitive benchmark results despite its compact 272M-parameter size.
Q: What are the recommended use cases?
The model is well-suited for general text generation tasks, especially those requiring instruction following. It performs particularly well in zero-shot scenarios and can be effectively used for tasks like question answering, text completion, and general language understanding.
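As an illustration of the question-answering use case, a short hedged sketch; a plain-text instruction prompt is assumed (no chat template is applied), and the greedy decoding settings are only one reasonable choice:

```python
# Minimal question-answering sketch for the instruction-tuned variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

token = "hf_..."  # placeholder Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=token)
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    token=token,
)

prompt = "Answer the question briefly.\nQuestion: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))
```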