Sheared-LLaMA-2.7B

Maintained by: princeton-nlp

  • Base Model: LLaMA2-7B
  • Training Tokens: 50B
  • License: Apache 2.0 (must comply with the LLaMA2 license)
  • Paper: Research Paper

What is Sheared-LLaMA-2.7B?

Sheared-LLaMA-2.7B is a language model derived from LLaMA2-7B through structured pruning followed by efficient continued pre-training. With only 50B training tokens, it matches or exceeds comparably sized models that were trained on hundreds of billions to a trillion tokens, making it a notable advance in training efficiency.

Implementation Details

The model is produced in two stages: roughly 0.4B tokens are used for the initial pruning stage, followed by 50B tokens of continued pre-training. Training data comes from the RedPajama dataset with dynamic loading, which adjusts how much data is drawn from each domain during training, and the model keeps the same vocabulary as LLaMA1 and LLaMA2.
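
The dynamic loading idea, roughly, is to re-weight how often each RedPajama domain is sampled based on how far the model's loss on that domain lags a reference. The sketch below is a simplified illustration of that kind of loss-gap re-weighting, not the paper's exact algorithm: the domain list matches RedPajama, but the loss values, function name, and exponential update rule are assumptions for demonstration.

```python
import numpy as np

def update_domain_weights(current_losses, reference_losses, prev_weights, temperature=1.0):
    # Up-weight domains whose current loss exceeds their reference loss;
    # domains already at or below their reference keep their previous weight.
    gaps = np.maximum(np.asarray(current_losses) - np.asarray(reference_losses), 0.0)
    new_weights = np.asarray(prev_weights) * np.exp(gaps / temperature)
    return new_weights / new_weights.sum()

# Hypothetical loss values for the seven RedPajama domains, for illustration only.
domains = ["CommonCrawl", "C4", "GitHub", "Wikipedia", "Books", "ArXiv", "StackExchange"]
weights = np.full(len(domains), 1.0 / len(domains))
current_losses = [2.10, 2.05, 1.20, 1.80, 2.00, 1.60, 1.70]
reference_losses = [2.00, 2.00, 1.15, 1.85, 1.95, 1.55, 1.75]

weights = update_domain_weights(current_losses, reference_losses, weights)
print({d: round(float(w), 3) for d, w in zip(domains, weights)})
```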

  • Architecture efficiently pruned from LLaMA2-7B
  • Trained on the RedPajama dataset with dynamic domain loading
  • Implements structured pruning techniques
  • Compatible with Hugging Face's AutoModelForCausalLM (see the loading sketch after this list)
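
As a minimal usage sketch of the AutoModelForCausalLM compatibility noted above: the snippet below loads the checkpoint from the Hugging Face Hub and generates a short completion. The repo ID princeton-nlp/Sheared-LLaMA-2.7B and the prompt are assumptions; adjust device and dtype to your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Sheared-LLaMA-2.7B"  # assumed Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy decoding of a short continuation.
prompt = "Structured pruning is a technique that"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```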

Core Capabilities

  • Achieves an average score of 56.7% across standard downstream benchmarks
  • Outperforms comparably sized models such as OPT-2.7B and Pythia-2.8B
  • Excels in reasoning, reading comprehension, and knowledge-intensive tasks
  • Maintains strong performance despite reduced parameter count

Frequently Asked Questions

Q: What makes this model unique?

The model achieves remarkable efficiency through structured pruning, requiring only 50B training tokens while outperforming models trained on 300B-1T tokens. This makes it more accessible for deployment while maintaining high performance.

Q: What are the recommended use cases?

The model is well-suited for general language tasks including reasoning, reading comprehension, and knowledge-intensive applications. It's particularly valuable in scenarios where computational efficiency is crucial but high performance is required.
