# Sheared-LLaMA-2.7B
| Property | Value |
|---|---|
| Base Model | LLaMA2-7B |
| Training Tokens | 50B |
| License | Apache 2.0 (must comply with the LLaMA2 license) |
| Paper | Research Paper |
## What is Sheared-LLaMA-2.7B?
Sheared-LLaMA-2.7B is a language model derived from LLaMA2-7B through structured pruning and efficient continued pre-training. It delivers strong downstream performance with just 50B training tokens, a small fraction of the 300B-1T tokens used to train comparable open models.
## Implementation Details
The model employs a sophisticated pruning and pre-training approach, utilizing 0.4B tokens for initial pruning followed by 50B tokens for continued pre-training. It leverages the RedPajama dataset with dynamic loading from various domains, maintaining the same vocabulary as LLaMA1 and LLaMA2.
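The dynamic loading mentioned above adjusts how often each RedPajama domain is sampled during continued pre-training. The snippet below is a minimal, hypothetical sketch of one way such domain re-weighting could work: domains whose current loss remains well above a per-domain reference loss get sampled more often. The function name, update rule, and numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def update_domain_weights(weights, current_losses, reference_losses, step_size=1.0):
    """Hypothetical sketch of dynamic data loading: upweight domains whose
    loss is still far above a per-domain reference loss."""
    weights = np.asarray(weights, dtype=float)
    excess = np.maximum(np.asarray(current_losses) - np.asarray(reference_losses), 0.0)
    # Exponentiated-gradient-style update: larger excess loss -> larger sampling weight.
    new_weights = weights * np.exp(step_size * excess)
    return new_weights / new_weights.sum()

# Example with the seven RedPajama domains and made-up loss values.
domains = ["CommonCrawl", "C4", "GitHub", "Wikipedia", "Books", "ArXiv", "StackExchange"]
weights = np.ones(len(domains)) / len(domains)
current = [2.10, 2.05, 1.20, 1.80, 2.00, 1.50, 1.70]
reference = [1.90, 1.95, 1.15, 1.75, 1.85, 1.45, 1.65]
print(update_domain_weights(weights, current, reference))
```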
- Efficiently pruned architecture from LLaMA2-7B
- Trained on dynamically loaded RedPajama dataset
- Implements structured pruning techniques
- Compatible with Hugging Face's `AutoModelForCausalLM` (see the loading sketch below)
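As a sketch of that `AutoModelForCausalLM` compatibility, the snippet below loads the checkpoint with the `transformers` library. The Hub ID `princeton-nlp/Sheared-LLaMA-2.7B` is assumed to point at the publicly released checkpoint; the dtype and device settings are illustrative choices, not recommendations from the authors.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Sheared-LLaMA-2.7B"  # assumed Hub ID of the released checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 2.7B model at roughly 5-6 GB
    device_map="auto",          # requires `accelerate`; drop this argument to load on CPU
)
```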
## Core Capabilities
- Achieves a 56.7% average score across downstream evaluation tasks
- Outperforms open models of comparable size, such as OPT-2.7B and Pythia-2.8B
- Excels in reasoning, reading comprehension, and knowledge-intensive tasks
- Maintains strong performance despite reduced parameter count
## Frequently Asked Questions
Q: What makes this model unique?
The model achieves remarkable efficiency through structured pruning, requiring only 50B training tokens while outperforming models trained on 300B-1T tokens. This makes it more accessible for deployment while maintaining high performance.
Q: What are the recommended use cases?
The model is well-suited for general language tasks including reasoning, reading comprehension, and knowledge-intensive applications. It's particularly valuable in scenarios where computational efficiency is crucial but high performance is required.
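For a concrete sense of such usage, the sketch below runs a short greedy generation with the same assumed Hub ID as in the loading example. Sheared-LLaMA-2.7B is a base model rather than an instruction-tuned one, so it simply continues the prompt; the prompt and generation settings here are arbitrary examples.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Sheared-LLaMA-2.7B"  # assumed Hub ID, as in the loading example
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Structured pruning is a technique that"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding of up to 50 new tokens continuing the prompt.
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```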