Sheared-LLaMA-1.3B
| Property | Value |
|---|---|
| Base Model | LLaMA-2-7B |
| Training Tokens | 50B |
| License | Apache 2.0 (subject to the LLaMA 2 license terms) |
| Paper | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning |
What is Sheared-LLaMA-1.3B?
Sheared-LLaMA-1.3B is a compact language model derived from LLaMA-2-7B through structured pruning and efficient continued pre-training. It delivers strong downstream performance while requiring far less compute than comparably sized models trained from scratch. The model is trained on the RedPajama dataset, using 0.4B tokens for the pruning stage and 50B tokens for continued pre-training.
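To make the two-stage recipe concrete, the toy sketch below applies the same idea to a small feed-forward network in PyTorch: whole hidden units are pruned according to an importance score, and the smaller network is then trained further. This is only an illustrative sketch of structured pruning followed by continued training, not the actual LLM-Shearing procedure; the network, the L2-norm importance score, and all sizes are invented for the example.

```python
# Toy sketch of the two-stage recipe: structured pruning, then continued training.
# NOT the LLM-Shearing algorithm itself; sizes and scoring are illustrative only.
import torch
import torch.nn as nn

def prune_hidden_units(linear_in: nn.Linear, linear_out: nn.Linear, keep: int):
    """Keep the `keep` hidden units whose input weights have the largest L2 norm."""
    importance = linear_in.weight.norm(dim=1)            # one score per hidden unit
    kept = importance.topk(keep).indices.sort().values   # indices of units to keep

    new_in = nn.Linear(linear_in.in_features, keep)
    new_out = nn.Linear(keep, linear_out.out_features)
    with torch.no_grad():
        new_in.weight.copy_(linear_in.weight[kept])
        new_in.bias.copy_(linear_in.bias[kept])
        new_out.weight.copy_(linear_out.weight[:, kept])
        new_out.bias.copy_(linear_out.bias)
    return new_in, new_out

# "Large" model with 64 hidden units.
fc1, fc2 = nn.Linear(16, 64), nn.Linear(64, 4)

# Stage 1: structured pruning down to 16 hidden units (whole units, not single weights).
fc1, fc2 = prune_hidden_units(fc1, fc2, keep=16)
pruned = nn.Sequential(fc1, nn.ReLU(), fc2)

# Stage 2: continued training of the pruned model on the target data (one dummy step).
optimizer = torch.optim.AdamW(pruned.parameters(), lr=1e-3)
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
loss = nn.functional.cross_entropy(pruned(x), y)
loss.backward()
optimizer.step()
```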
Implementation Details
The model keeps the same vocabulary as LLaMA-1 and LLaMA-2 while reducing the parameter count to 1.3B. It can be loaded directly with Hugging Face's AutoModelForCausalLM class (see the sketch after the list below), making it straightforward to use in a wide range of applications.
- Efficient architecture derived from LLaMA-2-7B
- Structured pruning methodology for parameter reduction
- Dynamic batch loading that adjusts training data proportions across domains
- Optimized for both performance and resource efficiency
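A minimal loading sketch with the transformers library is shown below. The Hub identifier princeton-nlp/Sheared-LLaMA-1.3B, the float16 dtype, and the generation settings are assumptions made for illustration, not prescribed settings.

```python
# Minimal sketch: load Sheared-LLaMA-1.3B with Hugging Face transformers and generate text.
# Assumes the checkpoint is published on the Hub as "princeton-nlp/Sheared-LLaMA-1.3B".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "princeton-nlp/Sheared-LLaMA-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

prompt = "Structured pruning reduces model size by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```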
Core Capabilities
- Achieves an average downstream score of 51.0, outperforming models of similar size
- Strong performance in reasoning and reading comprehension tasks
- Effective knowledge handling with reduced parameters
- Competitive results on benchmark tasks including ARC, HellaSwag, and MMLU
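Benchmark numbers like these can be checked with an off-the-shelf evaluation harness. The sketch below assumes EleutherAI's lm-evaluation-harness (lm-eval, v0.4+) is installed and that the checkpoint is available as princeton-nlp/Sheared-LLaMA-1.3B; the task list and settings are a rough outline, not the exact evaluation setup behind the reported scores.

```python
# Rough evaluation sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Task names, checkpoint id, and batch size are assumptions for illustration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=princeton-nlp/Sheared-LLaMA-1.3B,dtype=float16",
    tasks=["arc_easy", "arc_challenge", "hellaswag", "mmlu"],
    batch_size=8,
)

# Print per-task metrics (accuracy etc.) as reported by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```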
Frequently Asked Questions
Q: What makes this model unique?
The model's unique value proposition lies in its ability to achieve strong performance metrics with just 50B training tokens, compared to competitors requiring 300B+ tokens. It demonstrates that efficient pruning and training strategies can maintain performance while significantly reducing computational requirements.
Q: What are the recommended use cases?
Sheared-LLaMA-1.3B is well-suited for applications requiring balanced performance and efficiency, particularly in scenarios where computational resources are limited. It excels in tasks like reasoning, reading comprehension, and general language understanding.