Llama-3-8b-64k-PoSE

Maintained By: winglian


Property          Value
Parameter Count   8.03B
Context Length    64,000 tokens
Base Model        Llama 3
Training Data     RedPajama V1 (300M tokens)
Paper             PoSE Paper

What is Llama-3-8b-64k-PoSE?

Llama-3-8b-64k-PoSE is an extended-context version of Meta's Llama 3 8B model that stretches the context window from 8k to 64k tokens using PoSE (Positional Skip-wise Training). It was trained on roughly 300M tokens from the RedPajama V1 dataset, focusing on texts between 6k and 8k tokens in length to optimize for long-range understanding.
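Conceptually, PoSE trains on short sequences (here 6k-8k tokens) while relabelling their position IDs so that they are scattered across the full 64k target window, which lets the model learn long-range positions without ever attending over 64k real tokens. The sketch below is a minimal illustration of that idea; the function name and the exact chunking/sampling scheme are a simplification for exposition, not the training code used for this model.

```python
import torch

def pose_position_ids(train_len: int, target_len: int, num_chunks: int = 2) -> torch.Tensor:
    """Illustrative PoSE-style position IDs: split the short training sequence
    into chunks, keep positions contiguous within each chunk, and insert random
    skips before chunks so the IDs range over the much larger target window."""
    # Split the training sequence into roughly equal chunks.
    sizes = [train_len // num_chunks] * num_chunks
    sizes[-1] += train_len - sum(sizes)

    # Total positional "slack" available for skipping.
    slack = target_len - train_len
    # Sample sorted cut points in [0, slack]; their gaps become the per-chunk skips.
    cuts, _ = torch.sort(torch.randint(0, slack + 1, (num_chunks,)))
    skips = torch.diff(torch.cat([torch.zeros(1, dtype=torch.long), cuts]))

    pieces, offset = [], 0
    for size, skip in zip(sizes, skips.tolist()):
        offset += skip                      # jump ahead before this chunk starts
        pieces.append(torch.arange(offset, offset + size))
        offset += size
    return torch.cat(pieces)

# Example: an 8k-token training sample whose position IDs fall inside a 64k window.
ids = pose_position_ids(train_len=8192, target_len=65536)
print(ids.shape, int(ids.max()))  # torch.Size([8192]), max index < 65536
```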

Implementation Details

The model was trained with a rank-stabilized LoRA of rank 256 and applies PoSE with a rope_theta of 500,000 for context extension. After continued pre-training, rope_theta was raised to 2M to potentially extend the context beyond 64k tokens.
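Assuming the checkpoint follows the standard Llama architecture in Hugging Face transformers (and noting that the repo id below is an assumption to verify on the Hub), it loads like any other Llama model, and the RoPE base frequency is visible directly on the config:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "winglian/Llama-3-8b-64k-PoSE"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 tensors
    device_map="auto",           # requires `accelerate` for automatic placement
)

# rope_theta is the RoPE base frequency; the card reports 500k during PoSE
# training and 2M after continued pre-training. Inspect what the checkpoint ships with.
print(model.config.rope_theta, model.config.max_position_embeddings)
```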

  • Built with Axolotl framework
  • Uses BF16 tensor type for efficient computation
  • Trained on select portions of RedPajama v1 dataset
  • Extends rotary position embeddings (rope_theta scaling combined with PoSE training)

Core Capabilities

  • Extended context processing up to 64k tokens
  • Enhanced long-range dependency modeling
  • Improved text generation and understanding
  • Compatible with standard transformer architectures

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its significantly extended context window (64k tokens), achieved through PoSE training while retaining the efficient 8B parameter count of Llama 3.

Q: What are the recommended use cases?

The model is particularly suited for tasks requiring long-context understanding such as document analysis, extended conversations, and processing of lengthy technical or academic texts.
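As a rough usage sketch (keeping in mind this is a base model rather than an instruction-tuned one, so completion-style prompting is more appropriate), an entire document can be fed in as long as the prompt stays under the 64k-token window. The repo id and file path below are placeholders/assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "winglian/Llama-3-8b-64k-PoSE"  # assumed repo id, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder long input; anything up to roughly 64k tokens should fit.
document = open("long_report.txt").read()
prompt = f"{document}\n\nSummary of the document above:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
assert inputs["input_ids"].shape[-1] < 64_000, "prompt exceeds the 64k context window"

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated continuation.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```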
