Sheared-LLaMA-1.3B

Maintained by: princeton-nlp

Property         Value
---------------  ------------------------------------------------------------
Base Model       LLaMA-2-7B
Training Tokens  50B
License          Apache 2.0 (use must also comply with the LLaMA 2 license)
Paper            Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning (arXiv:2310.06694)

What is Sheared-LLaMA-1.3B?

Sheared-LLaMA-1.3B is a compact language model derived from LLaMA-2-7B through structured pruning followed by continued pre-training. It matches or outperforms open models of similar size while requiring far less training compute. Training uses the RedPajama dataset: 0.4B tokens for the pruning stage and 50B tokens for continued pre-training.

Implementation Details

The model keeps the same vocabulary as LLaMA 1 and LLaMA 2 while reducing the parameter count to 1.3B. It loads directly through Hugging Face's AutoModelForCausalLM class, as sketched below, making it easy to drop into existing pipelines.
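
A minimal loading and generation sketch, assuming the Hugging Face Hub ID princeton-nlp/Sheared-LLaMA-1.3B (matching the maintainer above) and the standard Transformers API; the prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID assumed from the maintainer shown above.
model_id = "princeton-nlp/Sheared-LLaMA-1.3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Plain causal generation; the prompt and settings here are illustrative.
inputs = tokenizer("Structured pruning is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```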

  • Efficient architecture derived from LLaMA-2-7B
  • Structured pruning methodology for parameter reduction
  • Dynamic batch loading that reweights training domains on the fly (sketched after this list)
  • Optimized for both performance and resource efficiency
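
Conceptually, dynamic batch loading shifts sampling weight toward domains whose loss is furthest above a reference target. The sketch below is an illustrative reconstruction under that assumption; the function name, update rule, and numbers are not taken from the authors' code:

```python
import numpy as np

def update_domain_weights(weights, losses, targets, step_size=1.0):
    """Upweight domains whose current loss still exceeds its reference target."""
    excess = np.maximum(losses - targets, 0.0)    # per-domain loss gap
    new_w = weights * np.exp(step_size * excess)  # multiplicative update
    return new_w / new_w.sum()                    # renormalize to a distribution

# Example: four RedPajama-style domains starting from uniform weights.
w = np.full(4, 0.25)
w = update_domain_weights(
    w,
    losses=np.array([2.9, 3.4, 2.7, 3.1]),   # current validation losses (made up)
    targets=np.array([2.8, 3.0, 2.8, 3.0]),  # reference losses (made up)
)
print(w)  # domains lagging their targets receive more sampling mass
```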

Core Capabilities

  • Achieves a 51.0 average score across standard downstream benchmarks, outperforming models of comparable size
  • Strong performance in reasoning and reading comprehension tasks
  • Effective knowledge handling with reduced parameters
  • Competitive results on benchmarks including ARC, HellaSwag, and MMLU (see the evaluation sketch after this list)
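
One way to reproduce such numbers is EleutherAI's lm-evaluation-harness; the sketch below assumes its v0.4+ Python API and standard task names, neither of which comes from this model card:

```python
import lm_eval

# Evaluate the model on a few of the benchmarks mentioned above.
# Task names and API are assumptions about lm-evaluation-harness v0.4+.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=princeton-nlp/Sheared-LLaMA-1.3B",
    tasks=["arc_challenge", "hellaswag"],
    batch_size=8,
)
print(results["results"])
```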

Frequently Asked Questions

Q: What makes this model unique?

Its key advantage is achieving strong performance with only 50B training tokens, whereas comparably sized models trained from scratch typically use 300B or more. This shows that structured pruning combined with efficient continued pre-training can preserve quality while sharply cutting compute requirements.

Q: What are the recommended use cases?

Sheared-LLaMA-1.3B is well-suited for applications requiring balanced performance and efficiency, particularly in scenarios where computational resources are limited. It excels in tasks like reasoning, reading comprehension, and general language understanding.
