OpenELM-270M

Maintained by: apple

Parameter Count: 272M parameters
Model Type: Text Generation, Transformers
License: Apple Sample Code License
Paper: arXiv:2404.14619
Tensor Type: F32

What is OpenELM-270M?

OpenELM-270M is the smallest member of Apple's Open Efficient Language Model (OpenELM) family. It uses a layer-wise scaling strategy that allocates parameters non-uniformly across the transformer's layers rather than sizing every layer identically, improving accuracy for a given parameter budget. The model was pretrained on approximately 1.8 trillion tokens drawn from public sources including RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6.

Implementation Details

The model was pretrained with Apple's CoreNet library and relies on the layer-wise scaling approach described above to improve accuracy within its parameter budget. It is compatible with the Hugging Face ecosystem and supports generation strategies such as prompt-lookup speculative decoding and assisted generation with a smaller draft model for faster inference.

  • Efficient parameter allocation through layer-wise scaling
  • Comprehensive pretraining on 1.8T tokens
  • Support for advanced generation strategies
  • Integration with the Hugging Face ecosystem (see the loading sketch below)
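
As a pointer for the Hugging Face integration above, here is a minimal loading-and-generation sketch. It is an assumption-laden example rather than Apple's official snippet: it presumes transformers and torch are installed, and it follows the model card's pairing with the gated meta-llama/Llama-2-7b-hf tokenizer, which you may need to swap for another compatible tokenizer you have access to.

```python
# Minimal sketch (not the official snippet) for loading OpenELM-270M with the
# Hugging Face transformers API. Assumes transformers and torch are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because the OpenELM repo ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M",
    trust_remote_code=True,
    torch_dtype=torch.float32,  # checkpoint is published in F32 (see the table above)
)

# The official card pairs OpenELM with the LLaMA-2 tokenizer (a gated repository);
# substitute any compatible tokenizer you have access to.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "Once upon a time there was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```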

Core Capabilities

  • Zero-shot performance: 54.37% average across seven standard benchmarks (ARC-c, ARC-e, BoolQ, HellaSwag, PIQA, SciQ, WinoGrande)
  • Strong performance on tasks like SciQ (84.70%) and PIQA (69.75%)
  • Efficient text generation with customizable parameters
  • Flexible deployment options with various inference optimizations (see the speculative decoding sketch after this list)
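
The inference optimizations mentioned above map onto standard transformers generation options. The sketch below is a hedged illustration, assuming a recent transformers release that supports prompt-lookup decoding (prompt_lookup_num_tokens) and assisted generation (assistant_model), and reusing the model and tokenizer from the earlier sketch; the draft checkpoint shown is a placeholder, since the 270M variant would itself typically serve as the draft for a larger OpenELM model.

```python
# Sketch of the two speculative-generation paths mentioned above, using standard
# transformers generate() options. `model` and `tokenizer` are assumed to be
# loaded as in the previous sketch.
from transformers import AutoModelForCausalLM

inputs = tokenizer("Layer-wise scaling allocates parameters", return_tensors="pt")

# 1) Prompt-lookup decoding: candidate tokens are drafted by matching n-grams
#    already present in the prompt, so no extra model is needed.
out_lookup = model.generate(
    **inputs,
    max_new_tokens=64,
    prompt_lookup_num_tokens=10,
)
print(tokenizer.decode(out_lookup[0], skip_special_tokens=True))

# 2) Assisted (speculative) decoding with a draft model. The draft checkpoint
#    below is a placeholder; in practice it would be a smaller model that shares
#    the same tokenizer.
draft = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M",
    trust_remote_code=True,
)
out_assisted = model.generate(
    **inputs,
    max_new_tokens=64,
    assistant_model=draft,
)
print(tokenizer.decode(out_assisted[0], skip_special_tokens=True))
```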

Frequently Asked Questions

Q: What makes this model unique?

OpenELM-270M stands out for its layer-wise scaling strategy and its fully open release, which includes the complete training and evaluation framework. It achieves competitive zero-shot results despite its relatively small size of 272M parameters.

Q: What are the recommended use cases?

The model is well-suited for general text generation tasks, particularly in scenarios requiring a balance between performance and computational efficiency. It shows strong capabilities in scientific question answering, physical commonsense reasoning, and general language understanding tasks.
