# MobileLLM-125M
| Property | Value |
|---|---|
| Parameter Count | 124.6M |
| License | CC-BY-NC-4.0 |
| Training Data | 1T tokens of public data |
| Context Length | 2k tokens |
| Paper | arXiv:2402.14905 |
## What is MobileLLM-125M?
MobileLLM-125M is a compact language model from Meta, engineered specifically for on-device applications. It achieves a 2.7% accuracy improvement over previous state-of-the-art models of the same size on zero-shot commonsense reasoning tasks.
## Implementation Details
The model uses a deep-and-thin architecture with 30 layers, 9 attention heads, 3 KV heads, and a token dimension of 576. It combines several optimization techniques: SwiGLU activation, embedding sharing, and grouped-query attention.
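The SwiGLU feed-forward block mentioned above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the model's actual code; the hidden width `d_ff` and the bias-free projections are assumptions (the paper's exact FFN dimension may differ):

```python
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: (SiLU(x @ W_gate) * (x @ W_up)) @ W_down
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 576, 1536  # d_model from the model card; d_ff is illustrative
x = rng.standard_normal((4, d_model))
w_gate = rng.standard_normal((d_model, d_ff)) * 0.02
w_up = rng.standard_normal((d_model, d_ff)) * 0.02
w_down = rng.standard_normal((d_ff, d_model)) * 0.02

y = swiglu_ffn(x, w_gate, w_up, w_down)
print(y.shape)  # (4, 576)
```

The gating structure is why SwiGLU uses three weight matrices where a plain FFN uses two; it tends to improve quality at a given parameter budget.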
- Training completed in approximately 3 days using 32 NVIDIA A100 80G GPUs
- Implements FP16 precision for efficient computation
- Features grouped-query attention (GQA) for improved performance
- Utilizes shared embeddings to reduce parameter count
## Core Capabilities
- Zero-shot commonsense reasoning with superior performance on multiple benchmarks
- Efficient text generation optimized for mobile devices
- Handles context lengths up to 2k tokens
- Achieves 46.3% average accuracy across major benchmarks (BoolQ, PIQA, SIQA, etc.)
## Frequently Asked Questions
Q: What makes this model unique?
MobileLLM-125M stands out for its optimized architecture specifically designed for on-device use cases, combining efficiency with strong performance through innovative techniques like grouped-query attention and shared embeddings.
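To see why embedding sharing matters at this scale, a rough estimate: tying the input and output embedding matrices saves one vocab-by-dimension matrix. The 32k vocabulary size is an assumption (a LLaMA-style tokenizer); the token dimension 576 and the 124.6M total are from this card:

```python
# Estimated savings from tying input and output embeddings.
vocab_size = 32_000   # assumed; not stated in this card
d_model = 576         # token dimension from the model card
total_params = 124.6e6

untied = 2 * vocab_size * d_model  # separate input and output embedding matrices
tied = vocab_size * d_model        # one shared matrix
saved = untied - tied
print(saved)                        # 18432000 parameters saved
print(saved / total_params)         # ~0.15 -> roughly 15% of the whole model
```

At 125M parameters the embeddings are a large fraction of the budget, so sharing them frees meaningful capacity for the deep-and-thin transformer stack.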
Q: What are the recommended use cases?
The model is ideal for mobile and edge device applications requiring language understanding and generation capabilities while maintaining resource efficiency. It's particularly well-suited for tasks requiring commonsense reasoning within constrained computational environments.