Llama-3-8b-64k-PoSE
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Context Length | 64,000 tokens |
| Base Model | Llama 3 |
| Training Data | RedPajama V1 (300M tokens) |
| Paper | PoSE Paper |
What is Llama-3-8b-64k-PoSE?
Llama-3-8b-64k-PoSE is an enhanced version of Meta's Llama 3 model that extends the context window from 8k to 64k tokens using the PoSE (Positional Skip-wise Training) technique. The model was trained on the RedPajama V1 dataset, focusing on texts between 6k and 8k tokens to optimize for long-range understanding.
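To make the training trick concrete, here is a toy Python sketch of positional skip-wise sampling (an illustration only, not the authors' code; the function name, chunking scheme, and offset distribution are assumptions). The idea is that an 8k-token training batch gets position ids drawn from the full 64k window by splitting the sequence into chunks and inserting random skips between them.

```python
import random
from typing import List

def pose_position_ids(seq_len: int, target_len: int, num_chunks: int = 2) -> List[int]:
    """Toy sketch of PoSE-style positional skip-wise sampling.

    A sequence of the original training length (e.g. 8k tokens) is split into
    chunks; each chunk keeps contiguous position ids, but a non-decreasing
    random offset is added per chunk so the ids seen during training span the
    target window (e.g. 64k) without ever building a 64k-token batch.
    """
    assert target_len >= seq_len
    # Chunk boundaries within the real (short) sequence.
    bounds = [i * seq_len // num_chunks for i in range(num_chunks + 1)]
    # Non-decreasing skip offsets, bounded so no position id exceeds target_len - 1.
    max_skip = target_len - seq_len
    offsets = sorted(random.randint(0, max_skip) for _ in range(num_chunks))

    position_ids: List[int] = []
    for i in range(num_chunks):
        start, end = bounds[i], bounds[i + 1]
        position_ids.extend(p + offsets[i] for p in range(start, end))
    return position_ids

# Example: an 8k-token batch whose position ids fall inside a 64k window.
ids = pose_position_ids(seq_len=8192, target_len=65536, num_chunks=2)
print(len(ids), max(ids) < 65536)  # 8192 True
```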
Implementation Details
The model uses a rank-stabilized LoRA of rank 256 and implements PoSE with a rope_theta value of 500,000 for context extension. After continued pre-training, rope_theta was further increased to 2M to potentially extend the context beyond 64k tokens (see the loading sketch after the list below).
- Built with Axolotl framework
- Uses BF16 tensor type for efficient computation
- Trained on selected portions of the RedPajama V1 dataset
- Applies PoSE positional skip-wise training to extend the effective context window
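The sketch below is one way to load the checkpoint and confirm these settings with Hugging Face transformers; the repo id is a placeholder assumption, and rope_theta / max_position_embeddings are the standard Llama config fields.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo_id = "winglian/Llama-3-8b-64k-PoSE"  # placeholder repo id (assumption)

# Inspect the RoPE base frequency and advertised context window.
config = AutoConfig.from_pretrained(repo_id)
print(config.rope_theta, config.max_position_embeddings)

# Load the weights in BF16, matching the tensor type noted above.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```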
Core Capabilities
- Extended context processing up to 64k tokens
- Enhanced long-range dependency modeling
- Improved text generation and understanding
- Compatible with standard transformer architectures
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its significantly extended context window (64k tokens), achieved through PoSE, while retaining the efficient 8B parameter count of Llama 3.
Q: What are the recommended use cases?
The model is particularly suited for tasks requiring long-context understanding such as document analysis, extended conversations, and processing of lengthy technical or academic texts.
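As a rough illustration of such a long-document workflow, the snippet below reuses the model and tokenizer from the loading sketch above; the file name, prompt wording, and generation settings are illustrative assumptions.

```python
# Long-document summarization sketch (reuses `model` and `tokenizer` from above).
with open("report.txt") as f:               # e.g. a lengthy technical report
    long_document = f.read()
prompt = f"{long_document}\n\nSummarize the key findings of the document above:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
assert inputs.input_ids.shape[-1] < 64_000  # stay inside the extended context window

output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0, inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```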