# Phi-3-small-128k-instruct
| Property | Value |
|---|---|
| Parameter Count | 7.39B |
| Context Length | 128,000 tokens |
| License | MIT |
| Author | Microsoft |
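For capacity planning, a back-of-envelope sketch of the memory needed just to hold the 7.39B weights at common precisions. This is my own arithmetic, not a figure from the model card, and it excludes activations and the KV cache (which grows with context length):

```python
# Rough weight-only memory estimate for a 7.39B-parameter model.
# Activations and the KV cache add more on top of this.

PARAMS = 7.39e9  # parameter count from the table above

def weight_gb(bytes_per_param: float) -> float:
    """Gigabytes required to store the weights at a given precision."""
    return PARAMS * bytes_per_param / 1e9

for name, bpp in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name:>9}: ~{weight_gb(bpp):.1f} GB")
```

At fp16 the weights alone come to roughly 15 GB, which is why the A100/A6000/H100 class of GPUs listed below is a natural fit.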
## What is Phi-3-small-128k-instruct?
Phi-3-small-128k-instruct is a lightweight, state-of-the-art language model from Microsoft, designed to deliver strong reasoning performance while remaining efficient to run. The 7.39B-parameter model supports a 128K-token context window and was post-trained with supervised fine-tuning and direct preference optimization to improve instruction following and safety.
## Implementation Details
The model utilizes a dense decoder-only Transformer architecture with alternating dense and blocksparse attention mechanisms. It was trained on 4.8T tokens including high-quality educational data, synthetic textbook-like content, and carefully filtered public documents.
- Supports multiple languages with 10% multilingual training data
- Optimized for memory/compute constrained environments
- Implements Flash Attention 2 and Triton blocksparse attention
- Compatible with NVIDIA A100, A6000, and H100 GPUs
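The alternating dense/blocksparse attention mentioned above can be illustrated with a toy mask builder. The block size, window width, and exact sparsity pattern here are illustrative assumptions for a single blocksparse layer, not the model's actual Triton kernel configuration:

```python
def blocksparse_causal_mask(seq_len: int, block: int, local_blocks: int):
    """Toy causal block-sparse mask: token i may attend to token j only
    if j <= i (causal) and j's block is within `local_blocks` blocks of
    i's block. Returns a seq_len x seq_len list of booleans."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        bi = i // block
        for j in range(i + 1):          # causal constraint: j <= i
            bj = j // block
            if bi - bj < local_blocks:  # local block window
                mask[i][j] = True
    return mask

m = blocksparse_causal_mask(seq_len=8, block=2, local_blocks=2)
```

A dense causal layer is the special case where every earlier block is visible; restricting attention to nearby blocks in alternating layers is what keeps compute and memory tractable at 128K tokens.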
## Core Capabilities
- Strong performance in reasoning tasks, especially code, math, and logic
- Extended context handling up to 128K tokens
- Competitive benchmark accuracy, outperforming some larger models on specific tasks
- Efficient processing in latency-bound scenarios
- Robust safety measures through preference optimization
## Frequently Asked Questions
### Q: What makes this model unique?
The model stands out for its ability to match or exceed the performance of larger models while maintaining a relatively small 7B parameter size. It particularly excels in reasoning tasks and offers an exceptional 128K token context window, making it suitable for processing lengthy documents and complex problems.
### Q: What are the recommended use cases?
The model is ideal for applications requiring strong reasoning capabilities, particularly in code generation, mathematical problem-solving, and logical reasoning. It's especially suitable for deployment in resource-constrained environments or applications requiring low-latency responses.
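When prompting an instruct-tuned model like this one, the chat markup matters. The sketch below assumes the Phi-3-style `<|user|>` / `<|assistant|>` / `<|end|>` tokens; in real use, prefer the tokenizer's `apply_chat_template`, which is authoritative for this model:

```python
def build_phi3_prompt(messages):
    """Format chat messages with Phi-3-style markup (an assumption here;
    the tokenizer's apply_chat_template is the authoritative source)."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to generate its reply
    return "".join(parts)

prompt = build_phi3_prompt([{"role": "user", "content": "Add 2 and 3."}])
print(prompt)
```

The trailing `<|assistant|>\n` leaves the model positioned to produce the answer rather than continue the user's turn.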