LiteLlama-460M-1T
| Property | Value |
|---|---|
| Parameter Count | 460M |
| Training Tokens | 1T (0.98T) |
| License | MIT |
| Author | Xiaotian Han (Texas A&M University) |
| Model Link | huggingface.co/ahxt/LiteLlama-460M-1T |
What is LiteLlama-460M-1T?
LiteLlama-460M-1T is a compact, open-source reproduction of Meta AI's LLaMA 2 architecture with just 460M parameters. Trained on roughly 1T tokens from the RedPajama dataset, it is a notable example of efficient small language model design.
Implementation Details
The model uses the GPT2Tokenizer for text processing and was trained on a curated portion of the RedPajama dataset. Training covered approximately 0.98T tokens over 499,679 steps with a batch size of 1024×192.
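Because the checkpoint is published on the Hugging Face Hub, it can be loaded with the standard Transformers auto classes. The sketch below is a minimal loading example, assuming the transformers package (with a PyTorch backend) is installed; the sanity-check prompt is illustrative only.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ahxt/LiteLlama-460M-1T"

# The checkpoint ships with a GPT-2 style tokenizer, so AutoTokenizer
# resolves to the same vocabulary that was used during training.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick sanity check: tokenize a short prompt and inspect the token count.
encoded = tokenizer("LiteLlama is a 460M-parameter reproduction of LLaMA 2.")
print(len(encoded["input_ids"]))
```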
Key evaluation results include:
- Achieves 21.13% accuracy on zero-shot MMLU tasks
- Demonstrates 26.39% accuracy on 5-shot MMLU evaluation
- Performs competitively against larger models like TinyLlama-1.1B
Core Capabilities
- Text generation and completion tasks
- Reported benchmark scores of 41.59% on TruthfulQA and 49.88% on Winogrande
- Efficient inference with reduced computational requirements
- Easy integration with the HuggingFace Transformers library (see the generation sketch below)
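As a usage illustration, here is a minimal text-generation sketch using the Transformers API; the prompt and sampling parameters (top_p, temperature, new-token budget) are illustrative choices, not settings recommended by the model author.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ahxt/LiteLlama-460M-1T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Q: What is the capital of France?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling with a small new-token budget; at 460M parameters the
# model is light enough to run this comfortably on CPU.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=True,
        top_p=0.9,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Since the card describes only RedPajama pre-training, treating this as a base (non-instruction-tuned) model and using short completion-style prompts is the safer assumption.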
Frequently Asked Questions
Q: What makes this model unique?
LiteLlama-460M-1T stands out for delivering solid benchmark results with far fewer parameters than comparable small language models. At just 460M parameters, it performs comparably to larger models such as TinyLlama-1.1B on several benchmarks.
Q: What are the recommended use cases?
The model is well suited to research applications, lightweight text-generation tasks, and scenarios where computational efficiency is crucial. It posts competitive scores for its size on benchmarks such as TruthfulQA and Winogrande, making it a reasonable choice for general language-understanding experiments.