Sheared-LLaMA-2.7B

Maintained by: princeton-nlp

  • Base Model: LLaMA2-7B
  • Training Tokens: 50B
  • License: Apache 2.0 (must comply with the LLaMA2 license)
  • Paper: Research Paper

What is Sheared-LLaMA-2.7B?

Sheared-LLaMA-2.7B is a language model derived from LLaMA2-7B through structured pruning followed by efficient continued pre-training. With only 50B training tokens, it matches or exceeds comparably sized models that were trained on hundreds of billions to a trillion tokens, making it a notable advance in training efficiency.

Implementation Details

The model is produced in two stages: roughly 0.4B tokens are used for the initial pruning stage, followed by 50B tokens of continued pre-training. Training data comes from the RedPajama dataset with dynamic loading, which adjusts how much data is drawn from each domain during training, and the model keeps the same vocabulary as LLaMA1 and LLaMA2.
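
The dynamic loading idea, roughly, is to re-weight how often each RedPajama domain is sampled based on how far the model's loss on that domain lags a reference. The sketch below is a simplified illustration of that kind of loss-gap re-weighting, not the paper's exact algorithm: the domain list matches RedPajama, but the loss values, function name, and exponential update rule are assumptions for demonstration.

```python
import numpy as np

def update_domain_weights(current_losses, reference_losses, prev_weights, temperature=1.0):
    # Up-weight domains whose current loss exceeds their reference loss;
    # domains already at or below their reference keep their previous weight.
    gaps = np.maximum(np.asarray(current_losses) - np.asarray(reference_losses), 0.0)
    new_weights = np.asarray(prev_weights) * np.exp(gaps / temperature)
    return new_weights / new_weights.sum()

# Hypothetical loss values for the seven RedPajama domains, for illustration only.
domains = ["CommonCrawl", "C4", "GitHub", "Wikipedia", "Books", "ArXiv", "StackExchange"]
weights = np.full(len(domains), 1.0 / len(domains))
current_losses = [2.10, 2.05, 1.20, 1.80, 2.00, 1.60, 1.70]
reference_losses = [2.00, 2.00, 1.15, 1.85, 1.95, 1.55, 1.75]

weights = update_domain_weights(current_losses, reference_losses, weights)
print({d: round(float(w), 3) for d, w in zip(domains, weights)})
```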

  • Architecture efficiently pruned from LLaMA2-7B
  • Trained on the RedPajama dataset with dynamic domain loading
  • Implements structured pruning techniques
  • Compatible with Hugging Face's AutoModelForCausalLM (see the loading sketch after this list)
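
As a minimal usage sketch of the AutoModelForCausalLM compatibility noted above: the snippet below loads the checkpoint from the Hugging Face Hub and generates a short completion. The repo ID princeton-nlp/Sheared-LLaMA-2.7B and the prompt are assumptions; adjust device and dtype to your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Sheared-LLaMA-2.7B"  # assumed Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy decoding of a short continuation.
prompt = "Structured pruning is a technique that"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```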

Core Capabilities

  • Achieves an average score of 56.7% across standard downstream benchmarks
  • Outperforms comparably sized models such as OPT-2.7B and Pythia-2.8B
  • Excels in reasoning, reading comprehension, and knowledge-intensive tasks
  • Maintains strong performance despite reduced parameter count

Frequently Asked Questions

Q: What makes this model unique?

The model achieves remarkable efficiency through structured pruning, requiring only 50B training tokens while outperforming models trained on 300B-1T tokens. This makes it more accessible for deployment while maintaining high performance.

Q: What are the recommended use cases?

The model is well-suited for general language tasks including reasoning, reading comprehension, and knowledge-intensive applications. It's particularly valuable in scenarios where computational efficiency is crucial but high performance is required.
