MobileLLM-1B

Maintained by: facebook

Property          Value
Parameter Count   1.01B parameters
Model Type        Auto-regressive Language Model
Architecture      54 layers, 20 attention heads, 5 KV heads
License           CC-BY-NC-4.0
Paper             arXiv:2402.14905

What is MobileLLM-1B?

MobileLLM-1B is an auto-regressive language model engineered specifically for on-device applications. Developed by Meta, it pairs competitive accuracy with a compact footprint of 1.01B parameters. The model was trained on 1T tokens of publicly available online data and supports a context length of 2k tokens.
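One contributor to the compact parameter count is the shared-embedding design described under Implementation Details. A back-of-envelope sketch of the savings, assuming a hypothetical 32k vocabulary (the card does not state the vocabulary size) and the 1280 token dimension listed below:

```python
# Rough estimate of parameters saved by tying input and output embeddings.
# vocab_size is an assumption for illustration; d_model is from the card.
vocab_size = 32_000
d_model = 1_280

tied_embedding_params = vocab_size * d_model   # one shared matrix
untied_extra = vocab_size * d_model            # a separate LM head would add this
savings_pct = untied_extra / 1.01e9 * 100

print(f"weight tying saves ~{untied_extra / 1e6:.1f}M params "
      f"(~{savings_pct:.1f}% of 1.01B)")
```

Under these assumptions, tying the embeddings avoids roughly 41M extra parameters, about 4% of the total, which is significant at this scale.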

Implementation Details

The model incorporates several innovative architectural elements to optimize performance:

  • Deep and thin architecture with 54 layers and 20 attention heads
  • Grouped-query attention (GQA) with 5 KV heads for efficient processing
  • SwiGLU activation function for improved accuracy over a standard FFN
  • Shared input/output embeddings (weight tying) to reduce parameter count
  • Token dimension of 1280
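As a rough illustration of how grouped-query attention works, the sketch below shares each of the 5 KV heads across a group of 4 query heads (20 / 5 = 4). The short sequence length is for brevity only; the head counts and head dimension (1280 / 20 = 64) follow the card:

```python
import numpy as np

n_q_heads, n_kv_heads, head_dim, seq = 20, 5, 64, 8
group = n_q_heads // n_kv_heads  # 4 query heads share one KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# Broadcast each KV head to its group of query heads
k_full = np.repeat(k, group, axis=0)   # (20, seq, head_dim)
v_full = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention over the expanded heads
scores = q @ k_full.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v_full                 # (20, seq, head_dim)
```

Only 5 K/V projections are computed and cached, while all 20 query heads still attend independently, which is where the memory savings come from.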

Core Capabilities

  • Achieves 57.3% average accuracy on zero-shot common sense reasoning tasks
  • Outperforms comparable models such as TinyLlama-1.1B and Falcon-1B on these benchmarks
  • Specifically optimized for mobile and on-device applications
  • Supports text generation with a 2k token context window
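The GQA design above also shrinks the KV cache that must be held in memory during generation, which matters on mobile hardware. A back-of-envelope estimate at the full 2k context, assuming fp16 cache entries (the precision is an assumption, not stated in the card):

```python
# KV-cache size at full context. Layer/head/dim figures are from the card;
# fp16 (2 bytes per value) is an assumption for illustration.
layers, n_heads, n_kv_heads, d_model, ctx = 54, 20, 5, 1280, 2048
head_dim = d_model // n_heads  # 64
bytes_per_value = 2            # fp16

def kv_cache_bytes(kv_heads: int) -> int:
    # One K and one V tensor per layer: kv_heads * head_dim values per token
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_value

gqa = kv_cache_bytes(n_kv_heads)  # 5 KV heads, as in MobileLLM-1B
mha = kv_cache_bytes(n_heads)     # hypothetical full multi-head attention
print(f"GQA cache: {gqa / 2**20:.0f} MiB vs MHA: {mha / 2**20:.0f} MiB "
      f"({mha // gqa}x smaller)")
```

Under these assumptions, using 5 KV heads instead of 20 cuts the cache from roughly 540 MiB to 135 MiB at the 2k context, a 4x reduction.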

Frequently Asked Questions

Q: What makes this model unique?

MobileLLM-1B stands out for its efficient architecture that combines deep and thin design with grouped-query attention, making it particularly suitable for resource-constrained environments while maintaining competitive performance.

Q: What are the recommended use cases?

The model is specifically designed for on-device applications where computational resources are limited. It excels in tasks requiring common sense reasoning and general text generation, making it suitable for mobile applications and edge devices.
