LiteLlama-460M-1T
| Property | Value |
|---|---|
| Parameter Count | 460M |
| Training Tokens | 1T (0.98T) |
| License | MIT |
| Author | Xiaotian Han (Texas A&M University) |
| Model Link | huggingface.co/ahxt/LiteLlama-460M-1T |
What is LiteLlama-460M-1T?
LiteLlama-460M-1T is a compact, open-source reproduction of Meta AI's LLaMA 2 architecture with just 460M parameters. Trained on roughly 1T tokens from the RedPajama dataset, it is a notable example of efficient small language model design.
Implementation Details
The model uses the GPT2Tokenizer for text processing and was trained on a curated portion of the RedPajama dataset. Training covered approximately 0.98T tokens over 499,679 steps with a batch size of 1024×192.
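Because the checkpoint is published on the Hugging Face Hub, it can be loaded with the standard Transformers auto classes. The sketch below is a minimal loading example, assuming the transformers package (with a PyTorch backend) is installed; the sanity-check prompt is illustrative only.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ahxt/LiteLlama-460M-1T"

# The checkpoint ships with a GPT-2 style tokenizer, so AutoTokenizer
# resolves to the same vocabulary that was used during training.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick sanity check: tokenize a short prompt and inspect the token count.
encoded = tokenizer("LiteLlama is a 460M-parameter reproduction of LLaMA 2.")
print(len(encoded["input_ids"]))
```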
Key evaluation results include:
- Achieves 21.13% accuracy on zero-shot MMLU tasks
- Demonstrates 26.39% accuracy on 5-shot MMLU evaluation
- Performs competitively against larger models like TinyLlama-1.1B
Core Capabilities
- Text generation and completion tasks
- Reported benchmark scores of 41.59% on TruthfulQA and 49.88% on Winogrande
- Efficient inference with reduced computational requirements
- Easy integration with the HuggingFace Transformers library (see the generation sketch below)
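As a usage illustration, here is a minimal text-generation sketch using the Transformers API; the prompt and sampling parameters (top_p, temperature, new-token budget) are illustrative choices, not settings recommended by the model author.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ahxt/LiteLlama-460M-1T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Q: What is the capital of France?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling with a small new-token budget; at 460M parameters the
# model is light enough to run this comfortably on CPU.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=True,
        top_p=0.9,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Since the card describes only RedPajama pre-training, treating this as a base (non-instruction-tuned) model and using short completion-style prompts is the safer assumption.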
Frequently Asked Questions
Q: What makes this model unique?
LiteLlama-460M-1T stands out for delivering solid benchmark results with far fewer parameters than comparable small language models. At just 460M parameters, it performs comparably to larger models such as TinyLlama-1.1B on several benchmarks.
Q: What are the recommended use cases?
The model is well suited to research applications, lightweight text-generation tasks, and scenarios where computational efficiency is crucial. It posts competitive scores for its size on benchmarks such as TruthfulQA and Winogrande, making it a reasonable choice for general language-understanding experiments.