Llama-2-7B-32K-Instruct
Property | Value |
---|---|
License | LLaMA2 |
Research Paper | Link |
Context Length | 32K tokens |
Training Data | 19K instructions (50%) + BookSum (25%) + MQA (25%) |
What is Llama-2-7B-32K-Instruct?
Llama-2-7B-32K-Instruct is an open-source language model that extends the base LLaMA-2 7B model with a 32K-token context window. It was built using the Together API and fine-tuned on instruction and chat data, which makes it well suited to long-context applications such as summarization and question answering.
Implementation Details
The training data combines three components: 19,000 single- and multi-round conversations generated with Llama-2-70B-Chat, long-context summarization data from BookSum, and Multi-document Question Answering (MQA) data. Flash Attention V2 is needed for best performance, and the model can be accessed through the Together API or deployed locally.
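As a rough illustration of the 50% / 25% / 25% mix, the sketch below draws examples from three sources in those proportions. The source names, the placeholder examples, and the `sample_mixture` helper are hypothetical. The actual training pipeline may mix data differently, for example by token count rather than by example count.

```python
import random
from collections import Counter

# Mixture weights taken from the training-data summary (50% / 25% / 25%).
MIXTURE = {"instructions": 0.50, "booksum": 0.25, "mqa": 0.25}


def sample_mixture(sources, weights, n, seed=0):
    """Draw n (source_name, example) pairs, choosing the source by weight."""
    rng = random.Random(seed)
    names = list(weights)
    probs = [weights[name] for name in names]
    batch = []
    for _ in range(n):
        name = rng.choices(names, weights=probs, k=1)[0]
        batch.append((name, rng.choice(sources[name])))
    return batch


# Placeholder strings standing in for the real datasets (assumption).
sources = {
    "instructions": ["chat-0", "chat-1"],
    "booksum": ["summary-0"],
    "mqa": ["qa-0"],
}
batch = sample_mixture(sources, MIXTURE, n=10_000)
counts = Counter(name for name, _ in batch)
```

With enough samples, the share of each source in `counts` should land close to its configured weight.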
- Built with fewer than 200 lines of Python code using the Together API
- Incorporates Flash Attention V2 for enhanced performance
- Supports both API-based and local deployment options
- Uses special instruction tokens [INST] and [/INST] for input formatting
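The instruction tokens can be applied with a small helper like the one below. This is a minimal sketch: the `build_prompt` name is hypothetical, and the exact newline placement around `[INST]` and `[/INST]` is an assumption that should be checked against the official model card before use.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a single-turn instruction in [INST] ... [/INST] markers.

    The newline layout here is an assumption, not a confirmed specification.
    """
    return f"[INST]\n{instruction.strip()}\n[/INST]\n\n"


prompt = build_prompt("Summarize the following chapter in three sentences.")
```

The resulting string would then be passed to the tokenizer or sent as the prompt in an API request.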
Core Capabilities
- Extended context window handling up to 32K tokens
- Strong performance in long-document summarization tasks
- Competitive results in multi-document question answering
- Matches or exceeds GPT-3.5-Turbo-16K on various benchmarks
- Achieves a 70.36% win rate on Alpaca Eval
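When working near the 32K limit, it can help to check that a prompt plus the requested generation length fits in the window before sending it. The sketch below assumes that 32K means 32,768 tokens and uses a whitespace word count as a crude stand-in for a tokenizer. Both `rough_token_count` and `fits_in_context` are hypothetical helpers; a real deployment would count tokens with the model's own tokenizer.

```python
CONTEXT_LIMIT = 32 * 1024  # assumed to be 32,768 tokens


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fits_in_context(prompt, max_new_tokens,
                    count_tokens=rough_token_count, limit=CONTEXT_LIMIT):
    """Return True if the prompt plus the generation budget fits the window."""
    return count_tokens(prompt) + max_new_tokens <= limit


short_ok = fits_in_context("word " * 1_000, max_new_tokens=512)
long_ok = fits_in_context("word " * 33_000, max_new_tokens=512)
```

Passing a tokenizer-based function as `count_tokens` keeps the check the same while making the count accurate.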
Frequently Asked Questions
Q: What makes this model unique?
The model's primary distinction lies in its extended 32K token context window combined with instruction-tuning, making it particularly effective for long-form content processing while maintaining strong performance on standard chat and instruction-following tasks.
Q: What are the recommended use cases?
The model excels in long-document summarization, multi-document question answering, and general instruction-following tasks. It's particularly suitable for applications requiring processing of lengthy documents or multiple sources of information simultaneously.