Phi-3-medium-128k-instruct
Property | Value |
---|---|
Parameter Count | 14B |
Context Length | 128,000 tokens |
License | MIT |
Developer | Microsoft |
Training Data | 4.8T tokens |
What is Phi-3-medium-128k-instruct?
Phi-3-medium-128k-instruct is Microsoft's state-of-the-art language model designed for lightweight yet powerful AI applications. As part of the Phi-3 family, this 14B parameter model stands out for its impressive 128k token context window and specialized training focusing on reasoning capabilities.
Implementation Details
The model utilizes a dense decoder-only Transformer architecture, fine-tuned through Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). It's optimized for BF16 precision and requires modern GPU hardware like NVIDIA A100, A6000, or H100 for optimal performance.
- Trained on 4.8T tokens including 10% multilingual content
- Supports vocabulary size up to 32,064 tokens
- Implements Flash Attention for improved performance
- Compatible with ONNX runtime for cross-platform deployment
Core Capabilities
- Strong reasoning performance in math, code, and logic tasks
- Extensive context handling up to 128k tokens
- Multilingual support with focus on English
- Optimized for memory/compute constrained environments
- Exceptional performance in benchmarks compared to larger models
Frequently Asked Questions
Q: What makes this model unique?
The model combines relatively small size (14B parameters) with exceptional reasoning capabilities and a large 128k context window, making it particularly suitable for resource-constrained commercial applications while maintaining competitive performance against larger models.
Q: What are the recommended use cases?
The model excels in scenarios requiring strong reasoning capabilities, particularly in code generation, mathematical problem-solving, and logical reasoning tasks. It's ideal for applications with memory/compute constraints or latency-sensitive requirements.