LLaMA-65B-HF

Maintained by: boboto

  • Parameter Count: 65 Billion
  • Model Type: Auto-regressive language model
  • License: Non-commercial bespoke license
  • Training Data: CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), Stack Exchange (2%)
  • Research Paper: LLaMA Paper

What is LLaMA-65B-HF?

LLaMA-65B-HF is Meta AI's largest language model in the LLaMA series, with 65 billion parameters. Developed by the FAIR team between December 2022 and February 2023, it represents a significant advance in efficient foundation language models. The architecture uses a model dimension of 8192, 64 attention heads, and 80 transformer layers, and the model was trained on 1.4T tokens.
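To make those figures concrete, the sketch below builds a Hugging Face transformers LlamaConfig with the architecture values quoted above. The vocabulary size and any fields not listed here are assumptions or library defaults, not values taken from this card.

```python
from transformers import LlamaConfig

# Architecture figures quoted in this card; remaining fields are assumptions
# (e.g. the vocabulary size) or transformers library defaults.
config = LlamaConfig(
    hidden_size=8192,        # model dimension
    num_attention_heads=64,  # attention heads per layer
    num_hidden_layers=80,    # transformer decoder layers
    vocab_size=32000,        # assumed LLaMA SentencePiece vocabulary size
)
print(config)
```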

Implementation Details

The model is built on the transformer architecture and optimized for research purposes. It was trained on multilingual data covering 20 languages, with a primary focus on English content. A minimal loading sketch with the Hugging Face transformers library follows the hyperparameter list below.

  • Architecture: 8192 dimension, 64 attention heads, 80 layers
  • Learning rate: 1.5E-04
  • Batch size: 4M tokens
  • Training tokens: 1.4T
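Assuming the weights have been converted to the Hugging Face format, a standard way to load and run the model is via transformers (with accelerate installed for multi-GPU sharding). The repo id boboto/LLaMA-65B-HF below is hypothetical; substitute your own local path or hub id, since the original weights are distributed under a non-commercial license.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "boboto/LLaMA-65B-HF"  # hypothetical repo id; point this at your converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
# ~130 GB in fp16: device_map="auto" shards the layers across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "The key idea behind efficient foundation language models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```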

Core Capabilities

  • Strong performance in common sense reasoning and yes/no question answering (85.3% on BoolQ; see the zero-shot sketch after this list)
  • Advanced question answering and natural language understanding
  • Multi-language support across 20 languages
  • Research-focused capabilities for bias evaluation and mitigation
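To illustrate the zero-shot setting behind the BoolQ figure, the sketch below scores a BoolQ-style yes/no question by comparing the model's next-token logits for "yes" and "no". This is an illustrative setup, not the exact evaluation harness used for the reported number; the repo id is again hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "boboto/LLaMA-65B-HF"  # hypothetical repo id; use your converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# BoolQ-style zero-shot prompt: passage, question, and an "Answer:" cue.
passage = "The Amazon is the largest rainforest in the world."
question = "Is the Amazon the largest rainforest in the world?"
prompt = f"{passage}\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # distribution over the next token

# Compare the logits of the first token of " yes" vs. " no".
yes_id = tokenizer.encode(" yes", add_special_tokens=False)[0]
no_id = tokenizer.encode(" no", add_special_tokens=False)[0]
print("yes" if next_token_logits[yes_id] > next_token_logits[no_id] else "no")
```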

Frequently Asked Questions

Q: What makes this model unique?

LLaMA-65B-HF stands out for its efficient architecture and strong performance across a range of benchmarks, particularly reasoning tasks, where it is competitive with much larger models while remaining comparatively economical to train and run.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, including exploring language model capabilities, evaluating biases, and developing improvement techniques. It should not be used in production applications without proper risk evaluation and mitigation strategies.
