Llama-3-8B-Instruct-262k
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Context Length | 262,144 tokens |
| License | Llama 3 |
| Base Model | Meta Llama-3-8B-Instruct |
| Training Data | SlimPajama + UltraChat |
What is Llama-3-8B-Instruct-262k?
Llama-3-8B-Instruct-262k is an enhanced version of Meta's Llama-3-8B-Instruct model, developed by Gradient AI to extend the context window from 8K to 262K tokens while preserving the base model's performance. The model uses NTK-aware interpolation to initialize its RoPE theta schedule and progressive training to reach the full 262K context efficiently.
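As a rough illustration of why raising the RoPE base (theta) extends the usable context, the sketch below compares the rotary inverse frequencies at the stock Llama-3 base (500,000) with the much larger base reported for the 262K phase of this model. This is a minimal sketch of the underlying mechanism, not Gradient's training code; the head dimension and the 500,000 base come from the public Llama-3 configuration, and the 207.1M value is taken from the figures in this card.

```python
import torch

def rope_inverse_frequencies(head_dim: int, theta: float) -> torch.Tensor:
    # Standard RoPE inverse frequencies: a larger theta (base) makes every
    # dimension rotate more slowly, so positions far beyond the original
    # 8K window still map to distinguishable rotations.
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

head_dim = 128                                                   # Llama-3-8B head dimension
base_freqs = rope_inverse_frequencies(head_dim, 500_000.0)       # stock Llama-3 theta
long_freqs = rope_inverse_frequencies(head_dim, 207_100_000.0)   # 262K-phase theta from this card

# How much slower the lowest-frequency dimension rotates after the theta increase:
print((base_freqs[-1] / long_freqs[-1]).item())
```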
Implementation Details
The model was trained in two phases: an initial phase at 65K context length, followed by extension to 262K. Training used a progressively increasing RoPE theta schedule and the EasyContext Blockwise RingAttention library for memory-efficient attention over long sequences; a minimal sketch of the two-phase schedule follows the list below.
- Progressive context length training (65K → 262K tokens)
- Optimized RoPE theta values (15.3M → 207.1M)
- Training performed on NVIDIA L40S GPUs
- Total training tokens: ~164M across both phases
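The two phases can be pictured as a pair of overrides on the base Llama-3 configuration. The snippet below is a hedged sketch that captures only the context lengths and RoPE theta values listed above; the optimizer, data pipeline, and EasyContext integration used by Gradient are not shown, and nothing here should be read as their actual training code.

```python
from transformers import AutoConfig

# Two-phase schedule as reported above: 65K context with theta ~15.3M,
# then 262K context with theta ~207.1M. All other training details omitted.
phases = [
    {"max_position_embeddings": 65_536,  "rope_theta": 15_300_000.0},   # phase 1
    {"max_position_embeddings": 262_144, "rope_theta": 207_100_000.0},  # phase 2
]

for phase in phases:
    config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
    config.max_position_embeddings = phase["max_position_embeddings"]
    config.rope_theta = phase["rope_theta"]
    # ...continue training at this context length (e.g., with EasyContext's
    # Blockwise RingAttention) before advancing to the next phase...
    print(config.max_position_embeddings, config.rope_theta)
```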
Core Capabilities
- Extended context processing up to 262K tokens
- Improved instruction-following abilities
- Maintained base model performance on standard benchmarks
- Efficient processing of long documents and conversations
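For readers who want to exercise these capabilities, the following is a minimal inference sketch using the Hugging Face transformers API. The repository id gradientai/Llama-3-8B-Instruct-262k and the placeholder document path are assumptions to make the example self-contained; a prompt approaching the full 262K tokens requires substantial GPU memory, so shorter inputs are a sensible first test.

```python
import torch
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gradientai/Llama-3-8B-Instruct-262k"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical long input; replace with your own document.
long_document = Path("long_report.txt").read_text()

messages = [
    {"role": "system", "content": "You are a careful analyst of long documents."},
    {"role": "user", "content": long_document + "\n\nSummarize the key findings."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```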
Frequently Asked Questions
Q: What makes this model unique?
The model extends Llama-3's context window 32-fold (8K → 262K tokens) while maintaining performance, achieved through RoPE theta optimization and progressive long-context training.
Q: What are the recommended use cases?
The model excels at tasks requiring long-context processing, including document analysis, extended conversations, and complex instruction-following scenarios that benefit from broader context awareness.