# MobileLLM-125M
| Property | Value |
|---|---|
| Parameter Count | 124.6M |
| License | CC-BY-NC-4.0 |
| Training Data | 1T tokens of public data |
| Context Length | 2k tokens |
| Paper | arXiv:2402.14905 |
## What is MobileLLM-125M?
MobileLLM-125M is a compact language model from Meta, engineered specifically for on-device applications. It achieves a 2.7% accuracy improvement over previous state-of-the-art models of the same size on zero-shot commonsense reasoning tasks.
## Implementation Details
The model uses a deep-and-thin architecture with 30 layers, 9 attention heads, 3 KV heads, and a token dimension of 576. It combines several optimization techniques: SwiGLU activation, embedding sharing, and grouped-query attention.
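The SwiGLU feed-forward block mentioned above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the model's actual code; the hidden width `d_ff` and the bias-free projections are assumptions (the paper's exact FFN dimension may differ):

```python
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: (SiLU(x @ W_gate) * (x @ W_up)) @ W_down
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 576, 1536  # d_model from the model card; d_ff is illustrative
x = rng.standard_normal((4, d_model))
w_gate = rng.standard_normal((d_model, d_ff)) * 0.02
w_up = rng.standard_normal((d_model, d_ff)) * 0.02
w_down = rng.standard_normal((d_ff, d_model)) * 0.02

y = swiglu_ffn(x, w_gate, w_up, w_down)
print(y.shape)  # (4, 576)
```

The gating structure is why SwiGLU uses three weight matrices where a plain FFN uses two; it tends to improve quality at a given parameter budget.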
- Training completed in approximately 3 days using 32 NVIDIA A100 80G GPUs
- Implements FP16 precision for efficient computation
- Features grouped-query attention (GQA) for improved performance
- Utilizes shared embeddings to reduce parameter count
## Core Capabilities
- Zero-shot commonsense reasoning with superior performance on multiple benchmarks
- Efficient text generation optimized for mobile devices
- Handles context lengths up to 2k tokens
- Achieves 46.3% average accuracy across major benchmarks (BoolQ, PIQA, SIQA, etc.)
## Frequently Asked Questions
Q: What makes this model unique?
MobileLLM-125M stands out for its optimized architecture specifically designed for on-device use cases, combining efficiency with strong performance through innovative techniques like grouped-query attention and shared embeddings.
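To see why embedding sharing matters at this scale, a rough estimate: tying the input and output embedding matrices saves one vocab-by-dimension matrix. The 32k vocabulary size is an assumption (a LLaMA-style tokenizer); the token dimension 576 and the 124.6M total are from this card:

```python
# Estimated savings from tying input and output embeddings.
vocab_size = 32_000   # assumed; not stated in this card
d_model = 576         # token dimension from the model card
total_params = 124.6e6

untied = 2 * vocab_size * d_model  # separate input and output embedding matrices
tied = vocab_size * d_model        # one shared matrix
saved = untied - tied
print(saved)                        # 18432000 parameters saved
print(saved / total_params)         # ~0.15 -> roughly 15% of the whole model
```

At 125M parameters the embeddings are a large fraction of the budget, so sharing them frees meaningful capacity for the deep-and-thin transformer stack.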
Q: What are the recommended use cases?
The model is ideal for mobile and edge device applications requiring language understanding and generation capabilities while maintaining resource efficiency. It's particularly well-suited for tasks requiring commonsense reasoning within constrained computational environments.