OpenELM-270M
| Property | Value |
|---|---|
| Parameter Count | 272M |
| Model Type | Text Generation, Transformers |
| License | Apple Sample Code License |
| Paper | arXiv:2404.14619 |
| Tensor Type | F32 |
What is OpenELM-270M?
OpenELM-270M is part of Apple's OpenELM (Open Efficient Language Model) family. It uses a layer-wise scaling strategy that allocates parameters non-uniformly across transformer layers, rather than repeating a single layer configuration, to improve accuracy for a fixed parameter budget. The model was pretrained on approximately 1.8 trillion tokens drawn from public datasets including RefinedWeb, PILE, RedPajama, and Dolma v1.6.
Implementation Details
The model was pretrained with the CoreNet library and applies layer-wise scaling to improve accuracy at a given parameter budget. It is compatible with the Hugging Face ecosystem and supports generation strategies such as lookup token (prompt lookup) speculative decoding and assisted generation with a draft model for faster inference.
- Efficient parameter allocation through layer-wise scaling
- Comprehensive pretraining on 1.8T tokens
- Support for advanced generation strategies
- Integration with the Hugging Face ecosystem (see the loading sketch after this list)
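
The snippet below is a minimal loading and generation sketch using the standard transformers API. It assumes the Hugging Face repo id `apple/OpenELM-270M`, that the checkpoint requires `trust_remote_code=True`, and that a Llama-2 tokenizer is used in place of a dedicated OpenELM tokenizer; check the model card for the exact, officially supported invocation.

```python
# Minimal sketch, assuming the "apple/OpenELM-270M" repo id and a Llama-2
# tokenizer; adjust to your environment and the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M",
    trust_remote_code=True,  # model code ships with the checkpoint
)
# OpenELM reuses an existing Llama-style tokenizer rather than shipping its own.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```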
Core Capabilities
- Zero-shot performance: 54.37% average across multiple benchmarks
- Strong performance on tasks like SciQ (84.70%) and PIQA (69.75%)
- Efficient text generation with customizable parameters
- Flexible deployment options with various inference optimizations (sketched below)
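
As a rough illustration of the generation strategies listed above, the sketch below uses two stock `generate()` options from transformers: `prompt_lookup_num_tokens` for lookup token (prompt lookup) speculative decoding, and `assistant_model` for assisted generation, in which OpenELM-270M serves as the draft model for a larger checkpoint. The repo ids, tokenizer choice, and parameter values are illustrative assumptions, not an official recipe.

```python
# Sketch of two inference optimizations; repo ids, tokenizer choice, and
# parameter values are assumptions, not an official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
inputs = tokenizer("Efficient language models are useful because", return_tensors="pt")

# 1) Prompt-lookup ("lookup token") speculative decoding: draft tokens are
#    copied from n-grams already present in the prompt, then verified in parallel.
out = model.generate(**inputs, max_new_tokens=64, prompt_lookup_num_tokens=10)

# 2) Assisted (speculative) generation: a larger OpenELM checkpoint (assumed
#    repo id) uses the 270M model as its fast draft model.
target = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B", trust_remote_code=True)
out = target.generate(**inputs, max_new_tokens=64, assistant_model=model)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```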
Frequently Asked Questions
Q: What makes this model unique?
OpenELM-270M stands out for its layer-wise scaling strategy and for the openness of its release, which includes the complete training and evaluation framework. It reaches competitive benchmark results despite its relatively small size of 272M parameters.
Q: What are the recommended use cases?
The model is well-suited for general text generation tasks, particularly in scenarios requiring a balance between performance and computational efficiency. It shows strong capabilities in scientific question answering, physical commonsense reasoning, and general language understanding tasks.