OpenELM-3B
| Property | Value |
|---|---|
| Parameter Count | 3.04B |
| License | Apple Sample Code License |
| Paper | arXiv:2404.14619 |
| Training Data | ~1.8T tokens |
| Model Type | Transformer-based Language Model |
What is OpenELM-3B?
OpenELM-3B is the largest model in Apple's Open Efficient Language Model (OpenELM) family, with 3.04 billion parameters. It uses a layer-wise scaling strategy that allocates parameters non-uniformly across transformer layers, improving accuracy for a given parameter budget across a range of NLP tasks.
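To make the layer-wise scaling idea concrete, the sketch below shows one way non-uniform parameter allocation can be expressed: per-layer attention-head counts and feed-forward widths interpolated across depth. The function name and the multiplier ranges are hypothetical illustrations, not the values used for OpenELM-3B (those are defined in the paper).

```python
# Illustrative sketch of layer-wise scaling (hypothetical numbers, not
# Apple's exact configuration): each transformer layer gets its own
# attention-head count and FFN width by linearly interpolating a scaling
# factor from the first layer to the last.
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a (num_heads, ffn_dim) pair per layer.

    alpha scales the attention width, beta scales the FFN multiplier;
    both grow linearly with depth, so early layers stay narrow and
    later layers widen within the same total parameter budget.
    """
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)            # depth fraction in [0, 1]
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = int(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs

# Shallow layers receive fewer heads and narrower FFNs than deep layers.
for layer, (heads, ffn) in enumerate(layerwise_scaling(4, d_model=3072, head_dim=128)):
    print(f"layer {layer}: heads={heads}, ffn_dim={ffn}")
```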
Implementation Details
The model was trained on a diverse mixture comprising RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Pre-training was carried out with Apple's CoreNet library, and the released checkpoints support several generation strategies, including lookup-token speculative generation for faster inference (a usage sketch follows the feature list below).
- Advanced layer-wise parameter scaling architecture
- Compatible with Hugging Face's transformers library
- Supports both vanilla and instruction-tuned variants
- Implements efficient inference optimization techniques
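A minimal loading-and-generation sketch with the Hugging Face transformers library is shown below. It assumes the checkpoint is published on the Hub as apple/OpenELM-3B (loaded with trust_remote_code, since OpenELM ships custom modeling code) and, following Apple's released examples, pairs the model with a LLaMA-style tokenizer; both repositories may require accepting their respective licenses.

```python
# Minimal sketch: load OpenELM-3B and generate text with Hugging Face transformers.
# Assumes Hub access to the "apple/OpenELM-3B" checkpoint and a LLaMA-style
# tokenizer (both may be gated behind license acceptance).
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-3B",
    trust_remote_code=True,  # OpenELM uses custom modeling code on the Hub
)
# OpenELM does not ship its own tokenizer; Apple's examples reuse a LLaMA tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```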
Core Capabilities
- Strong performance on zero-shot tasks (67.39% average across standard benchmarks)
- Competitive results on harder multiple-choice reasoning (ARC-c: 35.58%)
- High accuracy on common sense tasks (HellaSwag: 72.44%)
- Strong performance on scientific question answering (SciQ: 92.70%)
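The evaluation setup behind these numbers is not spelled out here. If you want to produce comparable zero-shot scores yourself, one common route is EleutherAI's lm-evaluation-harness; the sketch below assumes its Python API (version 0.4 or later) and the same model and tokenizer identifiers as above, so exact harness versions and task configurations may shift the figures.

```python
# Hedged sketch: zero-shot evaluation with EleutherAI's lm-evaluation-harness
# (assumes lm-eval >= 0.4; exact task configs and versions affect the numbers).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=apple/OpenELM-3B,"
        "trust_remote_code=True,"
        "tokenizer=meta-llama/Llama-2-7b-hf"
    ),
    tasks=["arc_challenge", "hellaswag", "sciq"],
    num_fewshot=0,  # zero-shot, matching the figures quoted above
)
print(results["results"])
```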
Frequently Asked Questions
Q: What makes this model unique?
OpenELM-3B stands out for its layer-wise parameter allocation strategy and for being released with a complete open framework covering data preparation, training, fine-tuning, and evaluation. It achieves strong benchmark performance while remaining computationally efficient for its size.
Q: What are the recommended use cases?
The model excels in text generation, reasoning tasks, and scientific question-answering. It's particularly well-suited for applications requiring strong zero-shot performance and can be used with speculative generation for faster inference.
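For the speculative-generation path mentioned above, one option is prompt lookup decoding, which recent versions of transformers expose through the prompt_lookup_num_tokens argument to generate(); the sketch below assumes such a version and the same hypothetical checkpoint and tokenizer identifiers as in the earlier loading example.

```python
# Sketch of lookup-token (prompt lookup) speculative generation; assumes a
# transformers version that supports the prompt_lookup_num_tokens argument.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Explain why the sky is blue.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    prompt_lookup_num_tokens=10,  # draft tokens are looked up in the prompt itself
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```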