MobileLLM-125M

Maintained by: facebook

  • Parameter Count: 124.6M
  • License: CC-BY-NC-4.0
  • Training Data: 1T tokens of public data
  • Context Length: 2k tokens
  • Paper: arXiv:2402.14905

What is MobileLLM-125M?

MobileLLM-125M is a language model developed by Meta and engineered specifically for on-device applications. On zero-shot commonsense reasoning tasks, it achieves a 2.7% accuracy improvement over previous state-of-the-art models of comparable size.

Implementation Details

The model uses a deep-and-thin architecture: 30 transformer layers with a token dimension of 576, 9 attention heads, and 3 KV heads. It combines several efficiency techniques, including SwiGLU activation, input/output embedding sharing, and grouped-query attention (a configuration sketch follows the list below).

  • Training completed in approximately 3 days using 32 NVIDIA A100 80G GPUs
  • Implements FP16 precision for efficient computation
  • Features grouped-query attention (GQA) for improved performance
  • Utilizes shared embeddings to reduce parameter count
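The reported architecture can be approximated with a standard Hugging Face LlamaConfig. This is a minimal sketch rather than the official implementation; the 32k vocabulary and 1536 feed-forward width are assumptions, chosen so the parameter count lands near the reported 124.6M.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of the reported architecture; vocab_size and intermediate_size are assumptions.
config = LlamaConfig(
    vocab_size=32000,              # assumed LLaMA-style 32k tokenizer
    hidden_size=576,               # token dimension reported above
    num_hidden_layers=30,          # deep-and-thin: many layers, small width
    num_attention_heads=9,
    num_key_value_heads=3,         # grouped-query attention with 3 KV heads
    intermediate_size=1536,        # assumed feed-forward width
    hidden_act="silu",             # SiLU gating, i.e. a SwiGLU-style MLP
    tie_word_embeddings=True,      # input/output embedding sharing
    max_position_embeddings=2048,  # 2k context
)

model = LlamaForCausalLM(config)                      # randomly initialized, for counting only
print(f"{model.num_parameters() / 1e6:.1f}M params")  # ~124.6M under these assumptions
```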

Core Capabilities

  • Zero-shot commonsense reasoning that outperforms prior models of similar size across multiple benchmarks
  • Efficient text generation optimized for mobile and edge devices (a minimal generation sketch follows this list)
  • Handles context lengths up to 2k tokens
  • Achieves 46.3% average accuracy across zero-shot commonsense reasoning benchmarks (BoolQ, PIQA, SIQA, etc.)
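A minimal text-generation sketch with the transformers library, assuming the checkpoint is published on the Hugging Face Hub under the repo id facebook/MobileLLM-125M:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-125M"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16, as noted above
    trust_remote_code=True,     # in case the repo ships custom modeling code
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```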

Frequently Asked Questions

Q: What makes this model unique?

MobileLLM-125M stands out for its optimized architecture specifically designed for on-device use cases, combining efficiency with strong performance through innovative techniques like grouped-query attention and shared embeddings.

Q: What are the recommended use cases?

The model is ideal for mobile and edge device applications requiring language understanding and generation capabilities while maintaining resource efficiency. It's particularly well-suited for tasks requiring commonsense reasoning within constrained computational environments.
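For constrained deployments, one generic way to shrink the memory footprint further is PyTorch dynamic INT8 quantization. The sketch below is illustrative only, not the deployment path prescribed by the MobileLLM authors, and the repo id is the same assumption as above.

```python
import io

import torch
from transformers import AutoModelForCausalLM

# Assumed repo id; quantization here is generic PyTorch dynamic INT8 over all Linear layers.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M", trust_remote_code=True
)
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def serialized_size_mb(m: torch.nn.Module) -> float:
    """Size of the module's state_dict when serialized with torch.save."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.tell() / 1e6

print(f"fp32: {serialized_size_mb(model):.0f} MB -> int8: {serialized_size_mb(quantized):.0f} MB")
```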
