Phi-3-medium-128k-instruct

Property	Value
Parameter Count	14B
Context Length	128,000 tokens
License	MIT
Developer	Microsoft
Training Data	4.8T tokens

What is Phi-3-medium-128k-instruct?

Phi-3-medium-128k-instruct is a 14B-parameter, instruction-tuned language model from Microsoft's Phi-3 family, aimed at applications that need strong capability in a relatively lightweight package. It stands out for its 128k-token context window and for training that focuses on reasoning.

Implementation Details

The model uses a dense decoder-only Transformer architecture and was fine-tuned with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). It is optimized for BF16 precision and, for best performance, targets modern NVIDIA GPUs such as the A100, A6000, or H100.

Trained on 4.8T tokens, including 10% multilingual content
Vocabulary size of 32,064 tokens
Uses Flash Attention for faster, more memory-efficient attention
Compatible with ONNX Runtime for cross-platform deployment
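To see why BF16 precision and the GPUs named above matter, it helps to estimate how much memory the weights alone occupy. The sketch below is a back-of-the-envelope calculation, assuming the rounded 14B parameter count from the table and 2 bytes per parameter for BF16 (4 for FP32); it deliberately ignores the KV cache, activations, and framework overhead.

```python
# Rough weight-memory estimate for a dense model at a given precision.
# Assumption: 14e9 parameters (the rounded figure from the table);
# activations, KV cache, and framework overhead are NOT included.

BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return the memory needed to hold the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("fp32", "bf16"):
        print(f"{dtype}: {weight_memory_gib(14e9, dtype):.1f} GiB")
```

At BF16 the weights come to roughly 28 GB (about 26 GiB), half the FP32 footprint, which fits within the memory of each of the GPUs listed above before the KV cache and activations are counted.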

Core Capabilities

Strong reasoning performance in math, code, and logic tasks
Extensive context handling up to 128k tokens
Multilingual support, with English as the primary language
Suited to memory- and compute-constrained environments
Benchmark performance competitive with larger models
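Because the prompt and the generated output share the same context window, a long-document request has to leave room for the answer. The sketch below checks that budget, assuming token counts come from the model's own tokenizer (not shown here) and using the 128,000 figure from the table; the exact limit in a given release should be confirmed from the model's configuration. The function names are illustrative, not part of any library.

```python
# Token-budget check for a single request against the context window.
# Assumption: prompt and generated tokens share one window, and counts
# come from the model's tokenizer (tokenization itself is not shown).

CONTEXT_LENGTH = 128_000  # figure from the table; confirm for your release


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """True if the prompt plus the requested output stays within the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_length


def max_prompt_tokens(max_new_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Largest prompt that still leaves room for max_new_tokens of output."""
    return max(context_length - max_new_tokens, 0)


if __name__ == "__main__":
    print(fits_in_context(120_000, 4_000))
    print(fits_in_context(126_000, 4_000))
    print(max_prompt_tokens(4_000))
```

For example, reserving 4,000 tokens for the answer leaves at most 124,000 tokens for the prompt, so a 126,000-token prompt would need to be truncated or split.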

Frequently Asked Questions

Q: What makes this model unique?

It pairs a relatively small size (14B parameters) with strong reasoning ability and a 128k-token context window. That combination makes it well suited to resource-constrained commercial applications while keeping its performance competitive with larger models.
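A long context window is not free on constrained hardware: the key/value (KV) cache grows linearly with sequence length. The sketch below applies the standard estimate of one key and one value tensor per layer, each holding kv_heads × head_dim values per token. The architecture values (40 layers, 10 key/value heads, head dimension 128) are assumptions for illustration only and are not stated in this document; read the real values from the model's published configuration before relying on the numbers.

```python
# Rough KV-cache size for a decoder-only Transformer, batch size 1.
# ASSUMPTION: the architecture values below are illustrative placeholders,
# not confirmed by this document; take the real ones from the model config.
NUM_LAYERS = 40
NUM_KV_HEADS = 10
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # BF16


def kv_cache_bytes(num_tokens: int,
                   num_layers: int = NUM_LAYERS,
                   num_kv_heads: int = NUM_KV_HEADS,
                   head_dim: int = HEAD_DIM,
                   bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return num_tokens * per_token


if __name__ == "__main__":
    for tokens in (4_000, 32_000, 128_000):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens:>7} tokens: {gib:.1f} GiB")
```

Under these assumptions, one full 128,000-token sequence needs about 24.4 GiB of cache, on the same order as the roughly 28 GB that 14B BF16 weights occupy, so using the whole window roughly doubles the memory budget.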

Q: What are the recommended use cases?

The model is best suited to tasks that demand strong reasoning, particularly code generation, mathematical problem-solving, and logical reasoning. It is a good fit for deployments with memory or compute constraints and for latency-sensitive applications.