Phi-3-mini-4k-instruct

Maintained by: microsoft

  • Parameter Count: 3.8B
  • Context Length: 4K tokens
  • License: MIT
  • Training Data: 4.9T tokens
  • Author: Microsoft

What is Phi-3-mini-4k-instruct?

Phi-3-mini-4k-instruct is Microsoft's lightweight yet capable language model and a notable step toward efficient AI. As part of the Phi-3 family, this 3.8B parameter model is designed to deliver strong performance on reasoning tasks while maintaining a compact size. It was trained on high-quality, carefully curated data and post-trained with both supervised fine-tuning and direct preference optimization to improve instruction following and safety.

Implementation Details

The model architecture is based on a dense decoder-only Transformer, optimized with Flash Attention for improved performance. It supports a 4K token context window and utilizes a vocabulary size of 32,064 tokens. The training process involved 512 H100-80G GPUs over 10 days, processing 4.9T tokens of carefully curated data.
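
As a quick orientation, here is a minimal loading sketch using Hugging Face transformers; the model id is the public Hugging Face checkpoint, while the dtype and device settings are illustrative assumptions rather than official recommendations.

```python
# Minimal loading sketch (assumed setup, not the official sample code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # assumed dtype; float16 also works on most GPUs
    attn_implementation="flash_attention_2",  # needs the flash-attn package and a supported GPU
    device_map="auto",                        # older transformers releases may also need trust_remote_code=True
)
```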

  • Designed primarily for English-language use
  • Implements a chat format with system, user, and assistant roles (see the sketch after this list)
  • Optimized for instruction-following and structured output
  • Compatible with Flash Attention 2 for enhanced performance
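
The chat format mentioned above is handled by the tokenizer's built-in chat template. The sketch below continues from the loading snippet; the prompt is invented purely for illustration.

```python
# Chat-format sketch: roles are mapped to the model's special tokens by the
# tokenizer's chat template (continues from the loading snippet above).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve 2x + 3 = 7 and explain each step."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```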

Core Capabilities

  • Strong performance in math and logical reasoning tasks
  • Excellent results in common sense and language understanding
  • Code generation capabilities, particularly in Python
  • Structured output generation such as JSON and XML (see the example after this list)
  • Multi-turn conversation support
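
For the structured-output capability noted above, a prompt along these lines tends to work; the schema and example sentence are invented for demonstration and are not part of the model card.

```python
# Structured-output sketch: ask for JSON only and parse the reply.
import json

messages = [
    {"role": "system", "content": "Respond only with valid JSON."},
    {"role": "user", "content": 'Extract {"product": str, "quantity": int} from: '
                                '"Please order three boxes of staples."'},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=False)
reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(json.loads(reply))  # may raise if the model adds extra text; validate in practice
```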

Frequently Asked Questions

Q: What makes this model unique?

This model achieves remarkable performance metrics comparable to much larger models while maintaining a relatively small parameter count of 3.8B. It particularly excels in reasoning tasks, achieving state-of-the-art performance among models under 13B parameters.

Q: What are the recommended use cases?

The model is ideal for memory/compute constrained environments, latency-bound scenarios, and applications requiring strong reasoning capabilities. It's particularly well-suited for commercial and research applications in English, especially those involving math, logic, and structured data processing.
