# Phi-3-medium-4k-instruct
| Property | Value |
|---|---|
| Parameter Count | 14B |
| Context Length | 4K tokens |
| License | MIT |
| Training Data | 4.8T tokens |
| Architecture | Dense decoder-only Transformer |
## What is Phi-3-medium-4k-instruct?
Phi-3-medium-4k-instruct is a state-of-the-art open language model developed by Microsoft, featuring 14B parameters and a 4K-token context window. It is part of the Phi-3 family and has undergone supervised fine-tuning (SFT) and direct preference optimization (DPO) to improve instruction following and safety.
## Implementation Details
The model uses a dense decoder-only Transformer architecture and was trained on 512 H100-80G GPUs over 42 days. It incorporates Flash Attention for efficient inference and includes multilingual coverage, with 10% of its training data being non-English.
- Built with PyTorch and DeepSpeed for optimal performance
- Supports vocabulary size of 32,064 tokens
- Optimized for chat-based interactions using specific formatting
- Available in multiple formats including ONNX for cross-platform deployment
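The chat formatting mentioned above can be sketched as a small helper. The `<|user|>`, `<|end|>`, and `<|assistant|>` markers follow the published Phi-3 chat format; the helper function itself (`build_phi3_prompt`) is a hypothetical illustration — in practice, the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) produces this string for you.

```python
# Hedged sketch of the Phi-3 chat prompt format. Assumes the
# <|user|>/<|assistant|>/<|end|> special tokens from the model card;
# the helper name is illustrative, not part of any official API.

def build_phi3_prompt(messages):
    """Render a list of {"role", "content"} dicts into Phi-3 chat format."""
    parts = []
    for m in messages:
        # Each turn is "<|role|>\n" + content + "<|end|>\n"
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    # Trailing assistant tag cues the model to generate its reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_phi3_prompt([{"role": "user", "content": "What is 2+2?"}])
print(prompt)
# <|user|>
# What is 2+2?<|end|>
# <|assistant|>
```

Feeding the model raw text in this format (rather than free-form prose) is what the instruct tuning expects, which is why the card emphasizes chat-specific formatting.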
## Core Capabilities
- Strong performance in reasoning tasks, particularly in code and mathematics
- Excels in memory/compute constrained environments
- Achieves benchmark scores competitive with larger models
- Specialized in instruction-following and safety-aware responses
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its efficient architecture and strong performance despite its relatively small size, achieving competitive results against larger models such as GPT-3.5 and Mixtral-8x7B across a range of benchmarks, particularly on reasoning tasks.
Q: What are the recommended use cases?
The model is ideal for general-purpose AI systems, especially in scenarios requiring strong reasoning capabilities, code generation, and mathematical problem-solving. It's particularly well-suited for deployment in resource-constrained environments or latency-sensitive applications.