Phi-3-medium-4k-instruct

Maintained By
microsoft

Phi-3-medium-4k-instruct

PropertyValue
Parameter Count14B
Context Length4K tokens
LicenseMIT
Training Data4.8T tokens
ArchitectureDense decoder-only Transformer

What is Phi-3-medium-4k-instruct?

Phi-3-medium-4k-instruct is a state-of-the-art language model developed by Microsoft, featuring 14B parameters and optimized for a 4K token context window. It's part of the Phi-3 family and has undergone extensive supervised fine-tuning and direct preference optimization to ensure high-quality outputs and safety compliance.

Implementation Details

The model utilizes a dense decoder-only Transformer architecture and was trained on 512 H100-80G GPUs over 42 days. It incorporates advanced features like Flash-Attention and supports multilingual capabilities with 10% of its training data being non-English content.

  • Built with PyTorch and DeepSpeed for optimal performance
  • Supports vocabulary size of 32,064 tokens
  • Optimized for chat-based interactions using specific formatting
  • Available in multiple formats including ONNX for cross-platform deployment

Core Capabilities

  • Strong performance in reasoning tasks, particularly in code and mathematics
  • Excels in memory/compute constrained environments
  • Demonstrated high benchmark scores competing with larger models
  • Specialized in instruction-following and safety-aware responses

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its efficient architecture and strong performance despite its relatively smaller size, achieving competitive results against larger models like GPT-3.5 and Mixtral-8x7B in various benchmarks, particularly in reasoning tasks.

Q: What are the recommended use cases?

The model is ideal for general-purpose AI systems, especially in scenarios requiring strong reasoning capabilities, code generation, and mathematical problem-solving. It's particularly well-suited for deployment in resource-constrained environments or latency-sensitive applications.

The first platform built for prompt engineering