Llama-2-13b-hf

Maintained By
NousResearch

| Property | Value |
|---|---|
| Parameter Count | 13B |
| Training Tokens | 2 trillion |
| Context Length | 4k tokens |
| License | Meta Custom Commercial License |
| Training Period | January 2023 - July 2023 |

What is Llama-2-13b-hf?

Llama-2-13b-hf is part of Meta's Llama 2 family of large language models, converted to the Hugging Face Transformers format. This 13B-parameter model occupies a sweet spot between computational efficiency and language understanding capability. Trained on 2 trillion tokens of publicly available data, it performs well across a broad range of natural language tasks.

Implementation Details

The model utilizes an optimized transformer architecture, trained with a global batch size of 4M tokens and a learning rate of 3.0 x 10^-4. Unlike its larger 70B counterpart, this model doesn't implement Grouped-Query Attention (GQA), maintaining a simpler architecture while still delivering strong performance.

  • Pretraining data cutoff: September 2022
  • Training infrastructure: Meta's Research Super Cluster
  • Environmental impact: 62.44 tCO2eq (100% offset)
  • Supports both F32 and FP16 tensor types
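Since the card lists FP16 among the supported tensor types, loading in half precision roughly halves memory use versus F32. The sketch below is a minimal, hedged example using the Hugging Face Transformers library; the repo id `NousResearch/Llama-2-13b-hf` is an assumption inferred from the "Maintained By" field above, and the `device_map="auto"` placement requires the `accelerate` package.

```python
# Minimal sketch (not an official recipe): load Llama-2-13b-hf in FP16
# with Hugging Face Transformers. Assumes the "NousResearch/Llama-2-13b-hf"
# repo id and that `torch`, `transformers`, and `accelerate` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "NousResearch/Llama-2-13b-hf"  # assumed checkpoint name


def load_llama2_13b(model_id: str = MODEL_ID):
    """Load tokenizer and model, using FP16 to roughly halve memory vs. F32."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # one of the two supported tensor types
        device_map="auto",          # spread layers across available devices
    )
    return tokenizer, model
```

Typical usage after loading is standard Transformers generation: tokenize a prompt with `return_tensors="pt"`, move the tensors to `model.device`, and call `model.generate(...)`.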

Core Capabilities

  • Strong performance in commonsense reasoning (66.9% accuracy)
  • Effective reading comprehension (65.8% accuracy)
  • Enhanced world knowledge capabilities (55.4% accuracy)
  • Improved truthfulness (41.86% on TruthfulQA)
  • Mathematical reasoning capabilities (28.7% accuracy)

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its balance between size and performance, offering significant improvements over its predecessor while keeping deployment requirements practical. It shows particularly strong gains in code generation and mathematical reasoning compared to Llama 1.

Q: What are the recommended use cases?

The model is specifically designed for commercial and research use in English, supporting various natural language generation tasks. It's particularly well-suited for applications requiring strong reasoning capabilities and factual accuracy, though it should be used with appropriate safety testing and tuning for specific applications.
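One practical deployment constraint is the 4k-token context window listed in the spec table: prompts plus generated tokens must fit within it. A small pure-Python sketch (the function name and truncation strategy are illustrative assumptions, operating on token ids already produced by a tokenizer) of keeping the most recent tokens while reserving room for generation:

```python
MAX_CONTEXT = 4096  # 4k-token context window from the spec table


def fit_to_context(token_ids, max_new_tokens=256, max_context=MAX_CONTEXT):
    """Trim a token-id list to the most recent tokens, leaving headroom
    for up to `max_new_tokens` generated tokens."""
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    return token_ids[-budget:]


# Example: a 5,000-token input trimmed to fit the window
ids = list(range(5000))
trimmed = fit_to_context(ids)
assert len(trimmed) == 3840  # 4096 - 256 tokens of input survive
```

Keeping the tail of the sequence is a common default for chat-style inputs, where the most recent turns matter most; other applications may prefer to summarize or drop middle content instead.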
