Phi-3-mini-128k-instruct

Maintained By
microsoft


Parameter Count: 3.8B
Context Length: 128,000 tokens
License: MIT
Architecture: Dense decoder-only Transformer
Training Data: 4.9T tokens

What is Phi-3-mini-128k-instruct?

Phi-3-mini-128k-instruct is a compact yet capable language model from Microsoft, designed to deliver strong performance in an efficient package. The 3.8B-parameter model supports a 128,000-token context window, making it suitable for processing lengthy documents while maintaining strong performance on reasoning tasks.

Implementation Details

The model uses a dense decoder-only Transformer architecture and has been fine-tuned with both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). It was trained on a diverse dataset of 4.9T tokens, including high-quality educational content, synthetic data, and carefully filtered public documents.

  • Optimized for memory and compute-constrained environments
  • Supports Flash Attention 2 for improved performance
  • Includes extensive safety measures and preference alignment
  • Compatible with multiple platforms through ONNX runtime
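A minimal loading sketch using the Hugging Face transformers API illustrates the points above. The model id is the one Microsoft publishes; the Flash Attention 2 flag is an assumption that a supported GPU and the flash-attn package are available (without them, an "eager" attention implementation can be used instead):

```python
# Loading sketch for Phi-3-mini-128k-instruct (assumes transformers is installed).
model_id = "microsoft/Phi-3-mini-128k-instruct"

load_kwargs = {
    "torch_dtype": "auto",                        # let transformers pick bf16/fp16
    "device_map": "auto",                         # place layers on available devices
    "attn_implementation": "flash_attention_2",   # swap to "eager" without a supported GPU
    "trust_remote_code": True,
}

# The actual load downloads several GB of weights, so it is left commented here:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)
```

The same checkpoint is also published in ONNX form for deployment outside the PyTorch stack.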

Core Capabilities

  • Strong performance in reasoning tasks, particularly in code, math, and logic
  • Competitive benchmark scores against larger models (69.7 on MMLU, 85.3 on GSM8K)
  • Extended context handling for long document processing
  • Multi-turn conversation support with chat format
  • Cross-platform deployment options
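The chat format mentioned above can be sketched as a small helper. The special markers below (`<|user|>`, `<|assistant|>`, `<|end|>`) follow the format shown in the model card; in practice, prefer the tokenizer's own `apply_chat_template`, which this function only approximates:

```python
def build_phi3_prompt(messages):
    """Flatten a multi-turn chat into Phi-3's prompt format.

    `messages` is a list of {"role": ..., "content": ...} dicts, the same
    shape accepted by tokenizer.apply_chat_template in transformers.
    """
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to produce the next turn
    return "".join(parts)

chat = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "And doubled?"},
]
prompt = build_phi3_prompt(chat)
```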

Frequently Asked Questions

Q: What makes this model unique?

Despite its compact size of 3.8B parameters, Phi-3-mini-128k-instruct achieves performance comparable to much larger models, particularly on reasoning tasks. Its 128K-token context window and optimization for efficient deployment make it especially valuable for practical applications.

Q: What are the recommended use cases?

The model excels in scenarios requiring strong reasoning capabilities, including code generation, mathematical problem-solving, and logical analysis. It's particularly well-suited for memory-constrained environments and latency-sensitive applications. The extended context window makes it ideal for long document processing and summarization tasks.
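Before sending a long document, it helps to estimate whether it fits the context window. The sketch below uses a rough heuristic (an assumption of roughly 4 characters per token for English text; the real count comes from the tokenizer):

```python
CONTEXT_TOKENS = 128_000  # Phi-3-mini-128k-instruct context length

def fits_in_context(text, chars_per_token=4, reserve_for_output=1_000):
    """Rough check that a document plus an output budget fits the window.

    chars_per_token is a heuristic assumption; tokenize for an exact count.
    """
    est_tokens = len(text) / chars_per_token
    return est_tokens <= CONTEXT_TOKENS - reserve_for_output
```

For an exact answer, `len(tokenizer(text).input_ids)` with the model's own tokenizer replaces the heuristic.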
