Llama-3-8B-Instruct-262k
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Context Length | 262,144 tokens |
| License | Llama 3 |
| Base Model | Meta Llama-3-8B-Instruct |
| Training Data | SlimPajama + UltraChat |
What is Llama-3-8B-Instruct-262k?
Llama-3-8B-Instruct-262k is an enhanced version of Meta's Llama-3-8B-Instruct model, developed by Gradient AI to extend the context window from 8K to 262K tokens while preserving the base model's performance. The model uses NTK-aware interpolation to initialize its RoPE theta schedule and progressive training to reach the full 262K context efficiently.
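As a rough illustration of why raising the RoPE base (theta) extends the usable context, the sketch below compares the rotary inverse frequencies at the stock Llama-3 base (500,000) with the much larger base reported for the 262K phase of this model. This is a minimal sketch of the underlying mechanism, not Gradient's training code; the head dimension and the 500,000 base come from the public Llama-3 configuration, and the 207.1M value is taken from the figures in this card.

```python
import torch

def rope_inverse_frequencies(head_dim: int, theta: float) -> torch.Tensor:
    # Standard RoPE inverse frequencies: a larger theta (base) makes every
    # dimension rotate more slowly, so positions far beyond the original
    # 8K window still map to distinguishable rotations.
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

head_dim = 128                                                   # Llama-3-8B head dimension
base_freqs = rope_inverse_frequencies(head_dim, 500_000.0)       # stock Llama-3 theta
long_freqs = rope_inverse_frequencies(head_dim, 207_100_000.0)   # 262K-phase theta from this card

# How much slower the lowest-frequency dimension rotates after the theta increase:
print((base_freqs[-1] / long_freqs[-1]).item())
```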
Implementation Details
The model was trained in two phases: an initial phase at 65K context length, followed by extension to 262K. Training used a progressively increasing RoPE theta schedule and the EasyContext Blockwise RingAttention library for memory-efficient attention over long sequences; a minimal sketch of the two-phase schedule follows the list below.
- Progressive context length training (65K → 262K tokens)
- Optimized RoPE theta values (15.3M → 207.1M)
- Training performed on NVIDIA L40S GPUs
- Total training tokens: ~164M across both phases
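The two phases can be pictured as a pair of overrides on the base Llama-3 configuration. The snippet below is a hedged sketch that captures only the context lengths and RoPE theta values listed above; the optimizer, data pipeline, and EasyContext integration used by Gradient are not shown, and nothing here should be read as their actual training code.

```python
from transformers import AutoConfig

# Two-phase schedule as reported above: 65K context with theta ~15.3M,
# then 262K context with theta ~207.1M. All other training details omitted.
phases = [
    {"max_position_embeddings": 65_536,  "rope_theta": 15_300_000.0},   # phase 1
    {"max_position_embeddings": 262_144, "rope_theta": 207_100_000.0},  # phase 2
]

for phase in phases:
    config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
    config.max_position_embeddings = phase["max_position_embeddings"]
    config.rope_theta = phase["rope_theta"]
    # ...continue training at this context length (e.g., with EasyContext's
    # Blockwise RingAttention) before advancing to the next phase...
    print(config.max_position_embeddings, config.rope_theta)
```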
Core Capabilities
- Extended context processing up to 262K tokens
- Improved instruction-following abilities
- Maintained base model performance on standard benchmarks
- Efficient processing of long documents and conversations
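For readers who want to exercise these capabilities, the following is a minimal inference sketch using the Hugging Face transformers API. The repository id gradientai/Llama-3-8B-Instruct-262k and the placeholder document path are assumptions to make the example self-contained; a prompt approaching the full 262K tokens requires substantial GPU memory, so shorter inputs are a sensible first test.

```python
import torch
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gradientai/Llama-3-8B-Instruct-262k"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical long input; replace with your own document.
long_document = Path("long_report.txt").read_text()

messages = [
    {"role": "system", "content": "You are a careful analyst of long documents."},
    {"role": "user", "content": long_document + "\n\nSummarize the key findings."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```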
Frequently Asked Questions
Q: What makes this model unique?
The model extends Llama-3's context window 32-fold (8K → 262K tokens) while maintaining performance, achieved through RoPE theta optimization and progressive long-context training.
Q: What are the recommended use cases?
The model excels at tasks requiring long-context processing, including document analysis, extended conversations, and complex instruction-following scenarios that benefit from broader context awareness.