Sheared-LLaMA-1.3B
| Property | Value |
|---|---|
| Base Model | LLaMA-2-7B |
| Training Tokens | 50B |
| License | Apache 2.0 (subject to the LLaMA 2 license terms) |
| Paper | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning |
What is Sheared-LLaMA-1.3B?
Sheared-LLaMA-1.3B is a compact language model derived from LLaMA-2-7B through structured pruning and efficient continued pre-training. It delivers strong downstream performance while requiring far less compute than comparably sized models trained from scratch. The model is trained on the RedPajama dataset, using 0.4B tokens for the pruning stage and 50B tokens for continued pre-training.
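To make the two-stage recipe concrete, the toy sketch below applies the same idea to a small feed-forward network in PyTorch: whole hidden units are pruned according to an importance score, and the smaller network is then trained further. This is only an illustrative sketch of structured pruning followed by continued training, not the actual LLM-Shearing procedure; the network, the L2-norm importance score, and all sizes are invented for the example.

```python
# Toy sketch of the two-stage recipe: structured pruning, then continued training.
# NOT the LLM-Shearing algorithm itself; sizes and scoring are illustrative only.
import torch
import torch.nn as nn

def prune_hidden_units(linear_in: nn.Linear, linear_out: nn.Linear, keep: int):
    """Keep the `keep` hidden units whose input weights have the largest L2 norm."""
    importance = linear_in.weight.norm(dim=1)            # one score per hidden unit
    kept = importance.topk(keep).indices.sort().values   # indices of units to keep

    new_in = nn.Linear(linear_in.in_features, keep)
    new_out = nn.Linear(keep, linear_out.out_features)
    with torch.no_grad():
        new_in.weight.copy_(linear_in.weight[kept])
        new_in.bias.copy_(linear_in.bias[kept])
        new_out.weight.copy_(linear_out.weight[:, kept])
        new_out.bias.copy_(linear_out.bias)
    return new_in, new_out

# "Large" model with 64 hidden units.
fc1, fc2 = nn.Linear(16, 64), nn.Linear(64, 4)

# Stage 1: structured pruning down to 16 hidden units (whole units, not single weights).
fc1, fc2 = prune_hidden_units(fc1, fc2, keep=16)
pruned = nn.Sequential(fc1, nn.ReLU(), fc2)

# Stage 2: continued training of the pruned model on the target data (one dummy step).
optimizer = torch.optim.AdamW(pruned.parameters(), lr=1e-3)
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
loss = nn.functional.cross_entropy(pruned(x), y)
loss.backward()
optimizer.step()
```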
Implementation Details
The model keeps the same vocabulary as LLaMA-1 and LLaMA-2 while reducing the parameter count to 1.3B. It can be loaded directly with Hugging Face's AutoModelForCausalLM class (see the sketch after the list below), making it straightforward to use in a wide range of applications.
- Efficient architecture derived from LLaMA-2-7B
- Structured pruning methodology for parameter reduction
- Dynamic batch loading that adjusts training data proportions across domains
- Optimized for both performance and resource efficiency
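A minimal loading sketch with the transformers library is shown below. The Hub identifier princeton-nlp/Sheared-LLaMA-1.3B, the float16 dtype, and the generation settings are assumptions made for illustration, not prescribed settings.

```python
# Minimal sketch: load Sheared-LLaMA-1.3B with Hugging Face transformers and generate text.
# Assumes the checkpoint is published on the Hub as "princeton-nlp/Sheared-LLaMA-1.3B".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "princeton-nlp/Sheared-LLaMA-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

prompt = "Structured pruning reduces model size by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```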
Core Capabilities
- Achieves an average downstream score of 51.0, outperforming models of similar size
- Strong performance in reasoning and reading comprehension tasks
- Effective knowledge handling with reduced parameters
- Competitive results on benchmark tasks including ARC, HellaSwag, and MMLU
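Benchmark numbers like these can be checked with an off-the-shelf evaluation harness. The sketch below assumes EleutherAI's lm-evaluation-harness (lm-eval, v0.4+) is installed and that the checkpoint is available as princeton-nlp/Sheared-LLaMA-1.3B; the task list and settings are a rough outline, not the exact evaluation setup behind the reported scores.

```python
# Rough evaluation sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Task names, checkpoint id, and batch size are assumptions for illustration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=princeton-nlp/Sheared-LLaMA-1.3B,dtype=float16",
    tasks=["arc_easy", "arc_challenge", "hellaswag", "mmlu"],
    batch_size=8,
)

# Print per-task metrics (accuracy etc.) as reported by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```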
Frequently Asked Questions
Q: What makes this model unique?
The model's unique value proposition lies in its ability to achieve strong performance metrics with just 50B training tokens, compared to competitors requiring 300B+ tokens. It demonstrates that efficient pruning and training strategies can maintain performance while significantly reducing computational requirements.
Q: What are the recommended use cases?
Sheared-LLaMA-1.3B is well-suited for applications requiring balanced performance and efficiency, particularly in scenarios where computational resources are limited. It excels in tasks like reasoning, reading comprehension, and general language understanding.