# Phi-3-small-8k-instruct
| Property | Value |
|---|---|
| Parameter Count | 7.39B |
| Context Length | 8K tokens |
| License | MIT |
| Author | Microsoft |
| Training Data | 4.8T tokens |
## What is Phi-3-small-8k-instruct?
Phi-3-small-8k-instruct is a state-of-the-art lightweight language model developed by Microsoft, designed for efficient reasoning and instruction following. The 7.39B-parameter model strikes a careful balance between size and performance, trained on a curated dataset of 4.8T tokens selected for high quality and reasoning density.
## Implementation Details
The model uses a dense decoder-only Transformer architecture with alternating dense and blocksparse attention layers. It has undergone both supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to align it with human preferences and safety guidelines. A loading sketch follows the list below.
- Supports 8K token context window
- Implements Flash Attention 2 and Triton blocksparse attention
- Optimized for NVIDIA A100, A6000, and H100 GPUs
- Vocabulary size of 100,352 tokens
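The snippet below is a minimal loading sketch, assuming the Hugging Face `transformers` library, the `microsoft/Phi-3-small-8k-instruct` checkpoint, and an environment with `flash-attn` and `accelerate` installed; exact arguments may differ by library version.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-small-8k-instruct"

# trust_remote_code is required because the blocksparse attention modules
# ship with the checkpoint rather than with the transformers library.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",                       # pick bf16/fp16 on supported GPUs
    device_map="auto",                        # assumption: accelerate is installed
    trust_remote_code=True,
    attn_implementation="flash_attention_2",  # assumption: flash-attn 2 is installed
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```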
## Core Capabilities
- Strong reasoning performance in code, math, and logic tasks
- Efficient operation in memory/compute constrained environments
- Multilingual support with 10% of training data being multilingual
- Optimized for instruction following and chat-based interactions (see the chat sketch below)
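As a brief chat-usage sketch, the example below reuses the `model` and `tokenizer` from the loading sketch above; the prompt and generation settings are illustrative and not prescribed by the model card.

```python
messages = [
    {"role": "user", "content": "Solve 3x + 7 = 22 and explain each step."},
]

# apply_chat_template wraps the messages in the model's chat formatting tokens.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```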
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its exceptional performance-to-size ratio, achieving state-of-the-art results among similar-sized models in benchmarks testing common sense, language understanding, math, and code generation. It's particularly notable for matching or exceeding the performance of larger models in reasoning tasks.
Q: What are the recommended use cases?
The model is ideal for general-purpose AI systems requiring low latency, especially in scenarios involving reasoning tasks, code generation, and mathematical problem-solving. It's particularly well-suited for commercial and research applications where computational resources are constrained.