Llama-3-8b-64k-PoSE
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Context Length | 64,000 tokens |
| Base Model | Llama 3 |
| Training Data | RedPajama V1 (300M tokens) |
| Paper | PoSE Paper |
What is Llama-3-8b-64k-PoSE?
Llama-3-8b-64k-PoSE is an enhanced version of Meta's Llama 3 model that extends the context window from 8k to 64k tokens using the PoSE (Positional Skip-wise Training) technique. The model was trained on the RedPajama V1 dataset, focusing on texts between 6k and 8k tokens to optimize for long-range understanding.
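To make the training trick concrete, here is a toy Python sketch of positional skip-wise sampling (an illustration only, not the authors' code; the function name, chunking scheme, and offset distribution are assumptions). The idea is that an 8k-token training batch gets position ids drawn from the full 64k window by splitting the sequence into chunks and inserting random skips between them.

```python
import random
from typing import List

def pose_position_ids(seq_len: int, target_len: int, num_chunks: int = 2) -> List[int]:
    """Toy sketch of PoSE-style positional skip-wise sampling.

    A sequence of the original training length (e.g. 8k tokens) is split into
    chunks; each chunk keeps contiguous position ids, but a non-decreasing
    random offset is added per chunk so the ids seen during training span the
    target window (e.g. 64k) without ever building a 64k-token batch.
    """
    assert target_len >= seq_len
    # Chunk boundaries within the real (short) sequence.
    bounds = [i * seq_len // num_chunks for i in range(num_chunks + 1)]
    # Non-decreasing skip offsets, bounded so no position id exceeds target_len - 1.
    max_skip = target_len - seq_len
    offsets = sorted(random.randint(0, max_skip) for _ in range(num_chunks))

    position_ids: List[int] = []
    for i in range(num_chunks):
        start, end = bounds[i], bounds[i + 1]
        position_ids.extend(p + offsets[i] for p in range(start, end))
    return position_ids

# Example: an 8k-token batch whose position ids fall inside a 64k window.
ids = pose_position_ids(seq_len=8192, target_len=65536, num_chunks=2)
print(len(ids), max(ids) < 65536)  # 8192 True
```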
Implementation Details
The model uses a rank-stabilized LoRA of rank 256 and implements PoSE with a rope_theta value of 500,000 for context extension. After continued pre-training, rope_theta was further increased to 2M to potentially extend the context beyond 64k tokens (see the loading sketch after the list below).
- Built with Axolotl framework
- Uses BF16 tensor type for efficient computation
- Trained on selected portions of the RedPajama V1 dataset
- Applies PoSE positional skip-wise training to extend the effective context window
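The sketch below is one way to load the checkpoint and confirm these settings with Hugging Face transformers; the repo id is a placeholder assumption, and rope_theta / max_position_embeddings are the standard Llama config fields.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo_id = "winglian/Llama-3-8b-64k-PoSE"  # placeholder repo id (assumption)

# Inspect the RoPE base frequency and advertised context window.
config = AutoConfig.from_pretrained(repo_id)
print(config.rope_theta, config.max_position_embeddings)

# Load the weights in BF16, matching the tensor type noted above.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```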
Core Capabilities
- Extended context processing up to 64k tokens
- Enhanced long-range dependency modeling
- Improved text generation and understanding
- Compatible with standard transformer architectures
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its significantly extended context window (64k tokens), achieved through PoSE, while retaining the efficient 8B parameter count of Llama 3.
Q: What are the recommended use cases?
The model is particularly suited for tasks requiring long-context understanding such as document analysis, extended conversations, and processing of lengthy technical or academic texts.
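As a rough illustration of such a long-document workflow, the snippet below reuses the model and tokenizer from the loading sketch above; the file name, prompt wording, and generation settings are illustrative assumptions.

```python
# Long-document summarization sketch (reuses `model` and `tokenizer` from above).
with open("report.txt") as f:               # e.g. a lengthy technical report
    long_document = f.read()
prompt = f"{long_document}\n\nSummarize the key findings of the document above:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
assert inputs.input_ids.shape[-1] < 64_000  # stay inside the extended context window

output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0, inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```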