decapoda-research-llama-7B-hf
| Property | Value |
|---|---|
| Parameter Count | 7 Billion |
| Model Type | Auto-regressive Language Model |
| Architecture | Transformer (32 layers, 32 heads) |
| License | Non-commercial bespoke license |
| Training Data | CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), Stack Exchange (2%) |
What is decapoda-research-llama-7B-hf?
This is a HuggingFace Transformers-compatible conversion of Meta AI's LLaMA 7B model, which the FAIR team trained between December 2022 and February 2023. LLaMA 7B offers strong performance across multiple languages and reasoning tasks while maintaining a relatively compact 7B parameter size.
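As a rough sketch, the checkpoint can be loaded with the Hugging Face `transformers` library. This repository's config predates the final LLaMA class naming in `transformers`, so `LlamaTokenizer` is referenced explicitly rather than via `AutoTokenizer`; the environment-variable guard and the demo prompt below are illustrative assumptions, not part of the model card.

```python
import os

def load_llama(model_id: str = "decapoda-research/llama-7b-hf"):
    """Sketch only: loading pulls roughly 13 GB of weights from the Hub."""
    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer

    # The repo's config names an older tokenizer class, so use
    # LlamaTokenizer directly instead of AutoTokenizer.
    tokenizer = LlamaTokenizer.from_pretrained(model_id)
    model = LlamaForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    return tokenizer, model

# Guarded behind an env check so reading or importing this sketch
# does not trigger the multi-gigabyte download.
if os.environ.get("RUN_LLAMA_DEMO"):
    tokenizer, model = load_llama()
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The half-precision dtype and `device_map="auto"` are common memory-saving choices for a 7B model on a single GPU, not requirements.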
Implementation Details
The model features a dimension size of 4096, 32 attention heads, and 32 layers. It was trained with a learning rate of 3.0E-04 and a batch size of 4M tokens, processing approximately 1T tokens during training. The design trades a smaller parameter count for a larger training token budget, prioritizing efficiency for research use.
- Transformer-based architecture optimized for research applications
- Multi-lingual capability across 20 languages
- Comprehensive training data mix including academic, social, and programming sources
- Competitive performance for its parameter count on a range of reasoning benchmarks
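The stated dimensions (4096 hidden size, 32 layers) roughly account for the 7B parameter count. The sketch below estimates it from those figures; the vocabulary size (32,000) and SwiGLU feed-forward width (11,008) are not stated in this card and are assumed from the LLaMA paper, and small terms such as RMSNorm weights are ignored.

```python
# Back-of-envelope parameter count for the architecture described above.
VOCAB = 32_000      # assumption: LLaMA tokenizer vocabulary size
D_MODEL = 4_096     # hidden dimension (from this card)
N_LAYERS = 32       # transformer layers (from this card)
D_FFN = 11_008      # assumption: SwiGLU feed-forward hidden size

embed = VOCAB * D_MODEL                  # token embedding matrix
attn_per_layer = 4 * D_MODEL * D_MODEL   # Q, K, V, O projections
ffn_per_layer = 3 * D_MODEL * D_FFN      # gate, up, down projections (SwiGLU)
per_layer = attn_per_layer + ffn_per_layer
lm_head = VOCAB * D_MODEL                # untied output projection

total = embed + N_LAYERS * per_layer + lm_head
print(f"~{total / 1e9:.2f}B parameters")  # close to 6.7B, marketed as "7B"
```

The estimate lands near 6.7B, which is why the model is conventionally rounded up to "7B".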
Core Capabilities
- Common sense reasoning (76.5% on BoolQ, 79.8% on PIQA)
- Natural language understanding and reading comprehension
- Question answering and knowledge retrieval
- Multi-lingual text processing
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its efficient architecture and strong performance across various reasoning tasks while maintaining a relatively small parameter count. It achieves impressive results on multiple benchmarks and supports 20 different languages.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, including exploring language model capabilities, evaluating biases, and developing new NLP techniques. It should not be used for production applications without additional risk evaluation and mitigation strategies.