Llama-3-8B-Instruct-262k

Maintained by: gradientai

Property          Value
Parameter Count   8.03B
Context Length    262,144 tokens
License           Llama 3
Base Model        Meta Llama-3-8B-Instruct
Training Data     SlimPajama + UltraChat

What is Llama-3-8B-Instruct-262k?

Llama-3-8B-Instruct-262k is an extended-context version of Meta's Llama-3-8B-Instruct model, developed by Gradient AI to stretch the context length from 8K to 262,144 tokens while preserving the base model's performance. It relies on NTK-aware interpolation and progressive long-context training to process long inputs efficiently.

Implementation Details

The model was trained using a two-phase approach: initial training at 65K context length, followed by extension to 262K. It employs optimized RoPE theta scheduling and uses the EasyContext Blockwise RingAttention library for efficient training.

  • Progressive context length training (65K → 262K tokens)
  • Optimized RoPE theta values (15.3M → 207.1M; see the config snippet below)
  • Training performed on NVIDIA L40S GPUs
  • Total training tokens: ~164M across both phases
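
A quick way to see the extension reflected in the released weights is to read the RoPE theta and maximum position embeddings straight from the shipped model config. This is a minimal sketch assuming the Hugging Face repo id gradientai/Llama-3-8B-Instruct-262k (inferred from the maintainer and model name on this card) and the standard transformers AutoConfig API:

```python
from transformers import AutoConfig

# Repo id assumed from the maintainer and model name on this card.
config = AutoConfig.from_pretrained("gradientai/Llama-3-8B-Instruct-262k")

# Base Llama-3-8B-Instruct ships rope_theta=500,000 and an 8K window;
# this model raises both (roughly 207.1M and 262,144 per the table above).
print("rope_theta:              ", config.rope_theta)
print("max_position_embeddings: ", config.max_position_embeddings)
```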

Core Capabilities

  • Extended context processing up to 262K tokens
  • Improved instruction-following abilities
  • Maintained base model performance on standard benchmarks
  • Efficient processing of long documents and conversations (see the usage sketch below)
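
The sketch below shows one way to run long-document question answering with this model. It is illustrative rather than prescriptive: it assumes the gradientai/Llama-3-8B-Instruct-262k repo id on the Hugging Face Hub, the standard transformers generation API, and a placeholder report.txt file standing in for a long input; the dtype and device settings are just reasonable defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gradientai/Llama-3-8B-Instruct-262k"  # repo id assumed from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the KV cache manageable at long contexts
    device_map="auto",
)

# "report.txt" is a placeholder for any document that fits in the 262K window.
with open("report.txt") as f:
    long_document = f.read()

messages = [
    {"role": "system", "content": "You answer questions about the provided document."},
    {"role": "user", "content": long_document + "\n\nSummarize the key findings."},
]

# Llama-3 Instruct models ship a chat template in the tokenizer config.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that prompts approaching the full 262K-token window need substantially more GPU memory than the 8K base model, mainly for the KV cache; quantization or multi-GPU device_map settings are common ways to fit them.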

Frequently Asked Questions

Q: What makes this model unique?

This model extends Llama-3's context window 32-fold (8,192 → 262,144 tokens) while preserving short-context performance, achieved through RoPE theta optimization and progressive long-context training.

Q: What are the recommended use cases?

The model excels at tasks requiring long context processing, including document analysis, extended conversations, and complex instruction-following scenarios that benefit from broader context awareness.
