Llama-2-13b-hf
| Property | Value |
|---|---|
| Parameter Count | 13B |
| Training Tokens | 2 trillion |
| Context Length | 4k tokens |
| License | Meta Custom Commercial License |
| Training Period | January 2023 to July 2023 |
What is Llama-2-13b-hf?
Llama-2-13b-hf is the 13B-parameter pretrained model from Meta's Llama 2 family of large language models, converted to the Hugging Face Transformers format. It sits between the 7B and 70B models in size, trading some capability for lower compute and memory requirements. It was trained on 2 trillion tokens of publicly available data.
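Because the checkpoint is in Transformers format, it can be loaded with the standard `Auto*` classes. The following is a minimal sketch, not an official recipe. It assumes the Hub repository id `meta-llama/Llama-2-13b-hf`, that access to the gated repository has been granted, and that `torch`, `transformers`, and `accelerate` are installed. The helper name `generate_text` is hypothetical.

```python
# Assumed Hub repository id; the repository is gated and requires accepting the license.
MODEL_ID = "meta-llama/Llama-2-13b-hf"


def generate_text(prompt: str, max_new_tokens: int = 64, model_id: str = MODEL_ID) -> str:
    """Load the model in FP16 and return the prompt plus a generated continuation."""
    # Heavy dependencies are imported lazily so this module can be imported
    # on machines that do not have them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # FP16 halves weight memory relative to F32
        device_map="auto",          # needs the accelerate package
    )

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The decoded text includes the original prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The capital of France is")` downloads the weights on first use, so expect a large download and a GPU with enough memory for the FP16 weights.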
Implementation Details
The model is an auto-regressive language model built on an optimized transformer architecture. It was trained with a global batch size of 4M tokens and a learning rate of 3.0 x 10^-4. Unlike the larger 70B model, it does not use Grouped-Query Attention (GQA) and relies on standard multi-head attention instead.
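The batch size and token count above imply a rough scale for the training run. The sketch below is a back-of-the-envelope calculation, not a figure reported by Meta. It assumes "4M" means 4,000,000 tokens, "4k" means 4096 tokens, and that every batch is full; the function names are illustrative.

```python
TOTAL_TOKENS = 2_000_000_000_000  # 2 trillion training tokens
BATCH_TOKENS = 4_000_000          # assumption: "4M" read as 4,000,000 tokens
CONTEXT_LENGTH = 4096             # assumption: "4k" read as 4096 tokens


def optimizer_steps(total_tokens: int, batch_tokens: int) -> int:
    """Approximate number of optimizer steps to consume all training tokens."""
    return total_tokens // batch_tokens


def sequences_per_batch(batch_tokens: int, context_length: int) -> float:
    """Approximate number of full-length sequences in one global batch."""
    return batch_tokens / context_length


print(optimizer_steps(TOTAL_TOKENS, BATCH_TOKENS))        # 500000
print(sequences_per_batch(BATCH_TOKENS, CONTEXT_LENGTH))  # 976.5625
```

Under these assumptions the run takes on the order of half a million steps, each covering roughly a thousand full-context sequences.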
- Pretraining data cutoff: September 2022
- Training infrastructure: Meta's Research Super Cluster
- Estimated pretraining carbon emissions: 62.44 tCO2eq (100% offset by Meta's sustainability program)
- Supports both F32 and FP16 tensor types
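The choice between F32 and FP16 mainly affects how much memory the weights occupy. As a rough sketch, assuming a nominal 13 billion parameters, 4 bytes per F32 value, and 2 bytes per FP16 value, the weight footprint can be estimated as follows. The estimate covers weights only and ignores activations and the attention cache.

```python
NOMINAL_PARAMS = 13_000_000_000  # nominal count; the exact figure differs slightly

# Bytes needed to store one parameter in each tensor type.
BYTES_PER_PARAM = {"F32": 4, "FP16": 2}


def weight_memory_gb(num_params: int, tensor_type: str) -> float:
    """Return the approximate weight size in decimal gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 10**9


for tensor_type in ("F32", "FP16"):
    print(tensor_type, weight_memory_gb(NOMINAL_PARAMS, tensor_type))
# F32 52.0
# FP16 26.0
```

Loading in FP16 roughly halves the memory needed for the weights, which often decides whether the model fits on a given accelerator.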
Core Capabilities
- Commonsense reasoning: 66.9% accuracy
- Reading comprehension: 65.8% accuracy
- World knowledge: 55.4% accuracy
- Truthfulness: 41.86% on TruthfulQA
- Mathematical reasoning: 28.7% accuracy
Frequently Asked Questions
Q: What makes this model unique?
The model balances size against performance: it improves substantially on its predecessor while remaining practical to deploy. Compared to Llama 1, the gains are largest in code generation and mathematical reasoning.
Q: What are the recommended use cases?
The model is intended for commercial and research use in English and can be adapted to a range of natural language generation tasks. It suits applications that depend on reasoning and factual recall, but it should go through safety testing and tuning tailored to the specific application before deployment.