decapoda-research-llama-7B-hf
| Property | Value |
|---|---|
| Parameter Count | 7 Billion |
| Model Type | Auto-regressive Language Model |
| Architecture | Transformer (32 layers, 32 heads) |
| License | Non-commercial bespoke license |
| Training Data | CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), Stack Exchange (2%) |
What is decapoda-research-llama-7B-hf?
This is a HuggingFace Transformers-compatible conversion of Meta AI's LLaMA 7B model, which the FAIR team trained between December 2022 and February 2023. LLaMA 7B offers strong performance across multiple languages and reasoning tasks while maintaining a relatively compact 7B parameter size.
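As a rough sketch, the checkpoint can be loaded with the Hugging Face `transformers` library. This repository's config predates the final LLaMA class naming in `transformers`, so `LlamaTokenizer` is referenced explicitly rather than via `AutoTokenizer`; the environment-variable guard and the demo prompt below are illustrative assumptions, not part of the model card.

```python
import os

def load_llama(model_id: str = "decapoda-research/llama-7b-hf"):
    """Sketch only: loading pulls roughly 13 GB of weights from the Hub."""
    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer

    # The repo's config names an older tokenizer class, so use
    # LlamaTokenizer directly instead of AutoTokenizer.
    tokenizer = LlamaTokenizer.from_pretrained(model_id)
    model = LlamaForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    return tokenizer, model

# Guarded behind an env check so reading or importing this sketch
# does not trigger the multi-gigabyte download.
if os.environ.get("RUN_LLAMA_DEMO"):
    tokenizer, model = load_llama()
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The half-precision dtype and `device_map="auto"` are common memory-saving choices for a 7B model on a single GPU, not requirements.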
Implementation Details
The model features a dimension size of 4096, 32 attention heads, and 32 layers. It was trained with a learning rate of 3.0E-04 and a batch size of 4M tokens, processing approximately 1T tokens during training. The design trades a smaller parameter count for a larger training token budget, prioritizing efficiency for research use.
- Transformer-based architecture optimized for research applications
- Multi-lingual capability across 20 languages
- Comprehensive training data mix including academic, social, and programming sources
- Competitive performance for its parameter count on a range of reasoning benchmarks
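The stated dimensions (4096 hidden size, 32 layers) roughly account for the 7B parameter count. The sketch below estimates it from those figures; the vocabulary size (32,000) and SwiGLU feed-forward width (11,008) are not stated in this card and are assumed from the LLaMA paper, and small terms such as RMSNorm weights are ignored.

```python
# Back-of-envelope parameter count for the architecture described above.
VOCAB = 32_000      # assumption: LLaMA tokenizer vocabulary size
D_MODEL = 4_096     # hidden dimension (from this card)
N_LAYERS = 32       # transformer layers (from this card)
D_FFN = 11_008      # assumption: SwiGLU feed-forward hidden size

embed = VOCAB * D_MODEL                  # token embedding matrix
attn_per_layer = 4 * D_MODEL * D_MODEL   # Q, K, V, O projections
ffn_per_layer = 3 * D_MODEL * D_FFN      # gate, up, down projections (SwiGLU)
per_layer = attn_per_layer + ffn_per_layer
lm_head = VOCAB * D_MODEL                # untied output projection

total = embed + N_LAYERS * per_layer + lm_head
print(f"~{total / 1e9:.2f}B parameters")  # close to 6.7B, marketed as "7B"
```

The estimate lands near 6.7B, which is why the model is conventionally rounded up to "7B".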
Core Capabilities
- Common sense reasoning (76.5% on BoolQ, 79.8% on PIQA)
- Natural language understanding and reading comprehension
- Question answering and knowledge retrieval
- Multi-lingual text processing
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its efficient architecture and strong performance across various reasoning tasks while maintaining a relatively small parameter count. It achieves impressive results on multiple benchmarks and supports 20 different languages.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, including exploring language model capabilities, evaluating biases, and developing new NLP techniques. It should not be used for production applications without additional risk evaluation and mitigation strategies.