Phi-3-medium-128k-instruct

Maintained By
microsoft

Phi-3-medium-128k-instruct

PropertyValue
Parameter Count14B
Context Length128,000 tokens
LicenseMIT
DeveloperMicrosoft
Training Data4.8T tokens

What is Phi-3-medium-128k-instruct?

Phi-3-medium-128k-instruct is Microsoft's state-of-the-art language model designed for lightweight yet powerful AI applications. As part of the Phi-3 family, this 14B parameter model stands out for its impressive 128k token context window and specialized training focusing on reasoning capabilities.

Implementation Details

The model utilizes a dense decoder-only Transformer architecture, fine-tuned through Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). It's optimized for BF16 precision and requires modern GPU hardware like NVIDIA A100, A6000, or H100 for optimal performance.

  • Trained on 4.8T tokens including 10% multilingual content
  • Supports vocabulary size up to 32,064 tokens
  • Implements Flash Attention for improved performance
  • Compatible with ONNX runtime for cross-platform deployment

Core Capabilities

  • Strong reasoning performance in math, code, and logic tasks
  • Extensive context handling up to 128k tokens
  • Multilingual support with focus on English
  • Optimized for memory/compute constrained environments
  • Exceptional performance in benchmarks compared to larger models

Frequently Asked Questions

Q: What makes this model unique?

The model combines relatively small size (14B parameters) with exceptional reasoning capabilities and a large 128k context window, making it particularly suitable for resource-constrained commercial applications while maintaining competitive performance against larger models.

Q: What are the recommended use cases?

The model excels in scenarios requiring strong reasoning capabilities, particularly in code generation, mathematical problem-solving, and logical reasoning tasks. It's ideal for applications with memory/compute constraints or latency-sensitive requirements.

The first platform built for prompt engineering