OpenELM-3B-Instruct
Property | Value |
---|---|
Parameter Count | 3.04B parameters |
Tensor Type | BF16 |
License | Apple Sample Code License |
Research Paper | arXiv:2404.14619 |
What is OpenELM-3B-Instruct?
OpenELM-3B-Instruct is an instruction-tuned language model developed by Apple and part of the OpenELM family of Efficient Language Models. It uses a layer-wise scaling strategy that allocates parameters non-uniformly across the transformer layers instead of giving every layer the same width, with the aim of improving accuracy for a given parameter budget.
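The idea of layer-wise scaling can be illustrated with a minimal sketch. It assumes the number of attention heads and the feed-forward width are interpolated linearly from the first layer to the last. The function name `layerwise_scaling`, its parameters, and the example values are hypothetical and are not OpenELM's actual hyperparameters; a real configuration would likely also round dimensions to hardware-friendly multiples.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha_min, alpha_max, beta_min, beta_max):
    """Return a (num_heads, ffn_dim) pair for each layer.

    alpha scales the attention width and beta scales the FFN width;
    both grow linearly with depth (an assumption of this sketch).
    """
    configs = []
    for i in range(num_layers):
        # Position in the stack: 0.0 at the first layer, 1.0 at the last.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        alpha = alpha_min + (alpha_max - alpha_min) * t
        beta = beta_min + (beta_max - beta_min) * t
        num_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = round(beta * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


# Toy values chosen for illustration only.
configs = layerwise_scaling(num_layers=4, d_model=512, head_dim=64,
                            alpha_min=0.5, alpha_max=1.0,
                            beta_min=1.0, beta_max=4.0)
for layer, (heads, ffn) in enumerate(configs):
    print(f"layer {layer}: heads={heads}, ffn_dim={ffn}")
```

With these toy values, the head count grows from 4 in the first layer to 8 in the last, and the FFN width grows from 512 to 2048, so later layers receive a larger share of the parameters.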
Implementation Details
The model was pretrained on approximately 1.8 trillion tokens drawn from RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6. It uses the LLaMA tokenizer and supports several generation strategies, including speculative generation for faster inference.
- Achieves an average accuracy of 69.15% across zero-shot tasks
- Is evaluated on multiple benchmarks, including ARC, HellaSwag, and MMLU
- Supports both standard inference and accelerated inference through lookup token speculative generation
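The lookup token speculative generation mentioned above can be sketched in a few lines. This is a toy illustration of the drafting step only, assuming the common prompt-lookup approach: find an earlier copy of the most recent n-gram in the token sequence and propose the tokens that followed it as a draft. The function `prompt_lookup_candidates` and its parameters are hypothetical names for this sketch, not the model's or any library's implementation. In a real decoder, the model then verifies the draft in a single forward pass and keeps only the tokens it agrees with.

```python
def prompt_lookup_candidates(tokens, ngram_size=2, num_draft=3):
    """Propose draft tokens by matching the trailing n-gram earlier in `tokens`."""
    if len(tokens) <= ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan backwards so the most recent earlier match wins.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = tokens[start + ngram_size:start + ngram_size + num_draft]
            if follow:
                return follow
    # No earlier occurrence: fall back to ordinary one-token-at-a-time decoding.
    return []


# Words stand in for token ids to keep the example readable.
sequence = "the quick brown fox jumps over the quick".split()
draft = prompt_lookup_candidates(sequence)
print(draft)  # ['brown', 'fox', 'jumps']
```

Drafting this way costs almost nothing, which is why it tends to help most when the output repeats spans of the input, such as in summarization or code editing. In Hugging Face Transformers, this style of decoding is typically enabled through the `prompt_lookup_num_tokens` generation argument.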
Core Capabilities
- Zero-shot task performance across multiple domains
- Strong performance in reasoning and knowledge-based tasks
- Efficient parameter utilization through layer-wise scaling
- Support for both CPU and GPU inference
- Compatible with various generation optimization techniques
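When choosing between CPU and GPU inference, a rough estimate of the memory needed for the weights follows from the parameter count and tensor type listed in the table. The sketch below counts weights only; activations, the key-value cache, and framework overhead are not included, so actual usage will be higher.

```python
PARAMS = 3.04e9       # parameter count listed in the table
BYTES_PER_PARAM = 2   # BF16 stores each value in 16 bits

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"{weight_gb:.2f} GB ({weight_gib:.2f} GiB)")  # 6.08 GB (5.66 GiB)
```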
Frequently Asked Questions
Q: What makes this model unique?
OpenELM-3B-Instruct stands out for its layer-wise parameter allocation strategy and for its benchmark results at a relatively modest size of 3.04B parameters. It reaches a 69.15% average across zero-shot tasks while remaining computationally efficient.
Q: What are the recommended use cases?
The model is well-suited for general language understanding tasks, including question answering, reasoning, and knowledge-based applications. It's particularly effective in zero-shot scenarios and can be used in both research and applied settings where efficient, accurate language processing is required.