OpenELM-3B

Maintained by: Apple


Property          Value
Parameter Count   3.04B
License           Apple Sample Code License
Paper             arXiv:2404.14619
Training Data     1.8T tokens
Model Type        Transformer-based Language Model

What is OpenELM-3B?

OpenELM-3B is the largest publicly released model in Apple's OpenELM (Open Efficient Language Model) family, with 3.04 billion parameters. It uses a layer-wise scaling strategy: instead of giving every transformer layer the same configuration, it allocates parameters non-uniformly across layers, which the paper reports improves accuracy for a given parameter budget.
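To make the idea of layer-wise scaling concrete, here is a minimal sketch that linearly interpolates the number of attention heads and the feed-forward width from the first layer to the last. The function name `layerwise_scaling`, the interpolation ranges `alpha` and `beta`, and the dimensions used in the example are illustrative assumptions, not the released OpenELM-3B configuration.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a per-layer config with linearly scaled heads and FFN width.

    alpha scales the attention width (number of heads); beta is the FFN
    multiplier. Both ranges are hypothetical values chosen for illustration.
    """
    configs = []
    for i in range(num_layers):
        # Position of this layer in [0, 1]; a single-layer model uses 0.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        configs.append({
            "layer": i,
            "num_heads": max(1, round(a * d_model / head_dim)),
            "ffn_dim": round(b * d_model),
        })
    return configs


# Example: a small hypothetical 4-layer model.
for cfg in layerwise_scaling(num_layers=4, d_model=1024, head_dim=64):
    print(cfg)
```

Under these assumptions, early layers get fewer heads and a narrower feed-forward block, and later layers get more, so the total parameter count is distributed unevenly across depth rather than uniformly.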

Implementation Details

The model was pre-trained on a mix of RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Pre-training used Apple's CoreNet library. At inference time the model supports several generation strategies, including lookup token speculative generation, which can speed up decoding.

  • Layer-wise parameter scaling across transformer layers
  • Compatible with Hugging Face's transformers library
  • Available in both pretrained and instruction-tuned variants
  • Supports lookup token speculative generation for faster inference
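As a rough illustration of the transformers compatibility and speculative generation mentioned above, the sketch below loads the model and optionally enables lookup token speculative generation. Several details are assumptions to verify against the official model card: the Hub ID `apple/OpenELM-3B`, the need for `trust_remote_code=True`, the use of the (gated) `meta-llama/Llama-2-7b-hf` tokenizer, and the default values. The helper names `build_generate_kwargs` and `generate` are hypothetical.

```python
def build_generate_kwargs(max_new_tokens=64, prompt_lookup_num_tokens=None,
                          repetition_penalty=1.2):
    """Assemble keyword arguments for model.generate().

    Passing prompt_lookup_num_tokens switches transformers to prompt-lookup
    speculative decoding; leaving it as None uses ordinary decoding.
    """
    kwargs = {
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": repetition_penalty,
    }
    if prompt_lookup_num_tokens is not None:
        kwargs["prompt_lookup_num_tokens"] = prompt_lookup_num_tokens
    return kwargs


def generate(prompt, model_id="apple/OpenELM-3B",
             tokenizer_id="meta-llama/Llama-2-7b-hf", **overrides):
    """Generate text from a prompt (downloads weights on first call)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the tokenizer is loaded from a separate (gated) repository.
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    # Assumption: the model ships custom modeling code on the Hub.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **build_generate_kwargs(**overrides))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate("Once upon a time", prompt_lookup_num_tokens=10)` would request speculative decoding; omitting that argument falls back to standard generation. Whether the speedup is noticeable depends on how much the output repeats spans of the prompt.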

Core Capabilities

  • Zero-shot average: 67.39% across standard benchmarks
  • Challenge-level reasoning (ARC-c): 35.58%
  • Common sense reasoning (HellaSwag): 72.44%
  • Scientific knowledge (SciQ): 92.70%

Frequently Asked Questions

Q: What makes this model unique?

OpenELM-3B stands out for its layer-wise parameter allocation strategy and for being released with a complete open-source framework covering data preparation, training, fine-tuning, and evaluation. At 3.04 billion parameters it reaches a 67.39% zero-shot average across standard benchmarks.

Q: What are the recommended use cases?

The model is suited to text generation and scientific question-answering, and more generally to applications that rely on zero-shot performance. Its reasoning results are more modest (ARC-c: 35.58%), so reasoning-heavy use cases should be evaluated before deployment. It can be used with speculative generation for faster inference.
