OpenELM-270M
| Property | Value |
|---|---|
| Parameter Count | 272M |
| Model Type | Text Generation, Transformers |
| License | Apple Sample Code License |
| Paper | arXiv:2404.14619 |
| Tensor Type | F32 |
What is OpenELM-270M?
OpenELM-270M is part of Apple's OpenELM (Open Efficient Language Model) family. It uses a layer-wise scaling strategy that allocates parameters non-uniformly across transformer layers, rather than repeating a single layer configuration, to improve accuracy for a fixed parameter budget. The model was pretrained on approximately 1.8 trillion tokens drawn from public datasets including RefinedWeb, PILE, RedPajama, and Dolma v1.6.
Implementation Details
The model was pretrained with the CoreNet library and applies layer-wise scaling to improve accuracy at a given parameter budget. It is compatible with the Hugging Face ecosystem and supports generation strategies such as lookup token (prompt lookup) speculative decoding and assisted generation with a draft model for faster inference.
- Efficient parameter allocation through layer-wise scaling
- Comprehensive pretraining on 1.8T tokens
- Support for advanced generation strategies
- Integration with the Hugging Face ecosystem (see the loading sketch after this list)
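
The snippet below is a minimal loading and generation sketch using the standard transformers API. It assumes the Hugging Face repo id `apple/OpenELM-270M`, that the checkpoint requires `trust_remote_code=True`, and that a Llama-2 tokenizer is used in place of a dedicated OpenELM tokenizer; check the model card for the exact, officially supported invocation.

```python
# Minimal sketch, assuming the "apple/OpenELM-270M" repo id and a Llama-2
# tokenizer; adjust to your environment and the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M",
    trust_remote_code=True,  # model code ships with the checkpoint
)
# OpenELM reuses an existing Llama-style tokenizer rather than shipping its own.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```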
Core Capabilities
- Zero-shot performance: 54.37% average across multiple benchmarks
- Strong performance on tasks like SciQ (84.70%) and PIQA (69.75%)
- Efficient text generation with customizable parameters
- Flexible deployment options with various inference optimizations (sketched below)
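
As a rough illustration of the generation strategies listed above, the sketch below uses two stock `generate()` options from transformers: `prompt_lookup_num_tokens` for lookup token (prompt lookup) speculative decoding, and `assistant_model` for assisted generation, in which OpenELM-270M serves as the draft model for a larger checkpoint. The repo ids, tokenizer choice, and parameter values are illustrative assumptions, not an official recipe.

```python
# Sketch of two inference optimizations; repo ids, tokenizer choice, and
# parameter values are assumptions, not an official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
inputs = tokenizer("Efficient language models are useful because", return_tensors="pt")

# 1) Prompt-lookup ("lookup token") speculative decoding: draft tokens are
#    copied from n-grams already present in the prompt, then verified in parallel.
out = model.generate(**inputs, max_new_tokens=64, prompt_lookup_num_tokens=10)

# 2) Assisted (speculative) generation: a larger OpenELM checkpoint (assumed
#    repo id) uses the 270M model as its fast draft model.
target = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B", trust_remote_code=True)
out = target.generate(**inputs, max_new_tokens=64, assistant_model=model)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```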
Frequently Asked Questions
Q: What makes this model unique?
OpenELM-270M stands out for its layer-wise scaling strategy and for the openness of its release, which includes the complete training and evaluation framework. It reaches competitive benchmark results despite its relatively small size of 272M parameters.
Q: What are the recommended use cases?
The model is well-suited for general text generation tasks, particularly in scenarios requiring a balance between performance and computational efficiency. It shows strong capabilities in scientific question answering, physical commonsense reasoning, and general language understanding tasks.