LLaMA-65B-HF
Property | Value |
---|---|
Parameter Count | 65 Billion |
Model Type | Auto-regressive language model |
License | Non-commercial bespoke license |
Training Data | CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), Stack Exchange (2%) |
Research Paper | LLaMA Paper |
What is LLaMA-65B-HF?
LLaMA-65B-HF is the largest language model in Meta AI's LLaMA series, with 65 billion parameters. It was developed by the FAIR team between December 2022 and February 2023 as part of an effort to build efficient foundation language models. The architecture uses a hidden dimension of 8192, 64 attention heads, and 80 layers, and the model was trained on 1.4T tokens.
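To make the training-data mixture more concrete, the sketch below converts the percentages from the table into approximate token counts. It assumes the percentages are sampling proportions applied uniformly to the 1.4T training tokens, which ignores any differences in how often each source is repeated. The names `MIXTURE_PERCENT`, `TOTAL_TOKENS`, and `tokens_per_source` are illustrative and not part of any released code.

```python
# Sampling proportions (percent) as listed in the Training Data row of the table.
MIXTURE_PERCENT = {
    "CCNet": 67.0,
    "C4": 15.0,
    "GitHub": 4.5,
    "Wikipedia": 4.5,
    "Books": 4.5,
    "ArXiv": 2.5,
    "Stack Exchange": 2.0,
}

# Total training tokens stated for the 65B model.
TOTAL_TOKENS = 1.4e12


def tokens_per_source(mixture, total_tokens):
    """Approximate tokens drawn from each source.

    Assumption: each percentage is a share of all training tokens.
    """
    return {name: total_tokens * pct / 100.0 for name, pct in mixture.items()}


# Print the estimate in billions of tokens, one line per source.
for name, count in tokens_per_source(MIXTURE_PERCENT, TOTAL_TOKENS).items():
    print(f"{name:15s} ~{count / 1e9:.0f}B tokens")
```

Under this assumption, CCNet alone accounts for roughly 938B of the 1.4T tokens, which shows how heavily the mixture leans on web-crawled text.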
Implementation Details
The model is built on the transformer architecture and is intended for research use. Its training data covers 20 languages, although most of the content is English.
- Architecture: hidden dimension 8192, 64 attention heads, 80 layers
- Learning rate: 1.5E-04
- Batch size: 4M tokens
- Training tokens: 1.4T
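The figures above can be cross-checked with some quick arithmetic. The sketch below derives the per-head dimension, a rough parameter count, and the approximate number of optimizer steps. Two assumptions are worth flagging: the parameter estimate uses the common 12 · layers · dimension² rule of thumb for decoder-only transformers (about 4d² for the attention projections plus 8d² for the feed-forward block, ignoring embeddings and normalization weights), and "4M tokens" is read as exactly 4,000,000. Neither is taken from the model's actual configuration.

```python
# Hyperparameters as listed above.
D_MODEL = 8192
N_HEADS = 64
N_LAYERS = 80
TRAINING_TOKENS = 1.4e12

# Assumption: "4M tokens" is read as exactly 4,000,000 tokens per batch.
BATCH_TOKENS = 4e6

# Each attention head operates on an equal slice of the hidden dimension.
head_dim = D_MODEL // N_HEADS

# Rule-of-thumb estimate, not the exact count: ~4*d^2 for attention
# plus ~8*d^2 for the feed-forward block in each layer.
# Embeddings and normalization weights are ignored.
approx_params = 12 * N_LAYERS * D_MODEL**2

# One optimizer step consumes one batch of tokens.
optimizer_steps = TRAINING_TOKENS / BATCH_TOKENS

print(f"head dimension:    {head_dim}")
print(f"approx parameters: {approx_params / 1e9:.1f}B")
print(f"optimizer steps:   {optimizer_steps:,.0f}")
```

The estimate comes out at about 64.4B parameters, close to the stated 65 billion; the remaining gap is plausibly covered by the embedding and output layers that the rule of thumb leaves out.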
Core Capabilities
- Strong performance in common sense reasoning (85.3% on BoolQ)
- Question answering and natural language understanding
- Training data covering 20 languages, with English dominant
- Suitability for research on bias evaluation and mitigation
Frequently Asked Questions
Q: What makes this model unique?
LLaMA-65B-HF stands out for strong benchmark performance relative to its size, particularly on reasoning tasks such as BoolQ (85.3%). The LLaMA paper reports results competitive with substantially larger models, which is the basis for describing the model as computationally efficient.
Q: What are the recommended use cases?
The model is primarily intended for research, including exploring language model capabilities, evaluating biases, and developing techniques to improve these models. It should not be used in production applications without proper risk evaluation and mitigation.