Phi-3-mini-4k-instruct
| Property | Value |
|---|---|
| Parameter Count | 3.8B |
| Context Length | 4K tokens |
| License | MIT |
| Training Data | 4.9T tokens |
| Author | Microsoft |
What is Phi-3-mini-4k-instruct?
Phi-3-mini-4k-instruct is Microsoft's lightweight yet capable language model and part of the Phi-3 family. At 3.8B parameters, it is designed to deliver strong performance on reasoning tasks while remaining compact. The model was trained on high-quality, carefully curated data and then post-trained with both supervised fine-tuning and direct preference optimization to improve instruction following and safety.
Implementation Details
The model architecture is based on a dense decoder-only Transformer, optimized with Flash Attention for improved performance. It supports a 4K token context window and utilizes a vocabulary size of 32,064 tokens. The training process involved 512 H100-80G GPUs over 10 days, processing 4.9T tokens of carefully curated data.
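The parameter count above translates directly into a deployment footprint. As a back-of-envelope sketch (illustrative arithmetic, not measured figures), the weights alone occupy roughly:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights."""
    return n_params * bytes_per_param / 1024**3

params = 3.8e9                          # 3.8B parameters
fp16_gb = weight_memory_gb(params, 2)   # half precision: ~7.1 GB
int4_gb = weight_memory_gb(params, 0.5) # 4-bit quantized: ~1.8 GB
```

Actual serving memory is higher once the KV cache and activations are included, but this shows why the model fits on a single consumer GPU.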
- Primarily supports English (multilingual capability is limited)
- Implements chat format with system, user, and assistant roles
- Optimized for instruction-following and structured output
- Compatible with Flash Attention 2 for enhanced performance
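The chat format mentioned above wraps each turn in role markers (`<|system|>`, `<|user|>`, `<|assistant|>`) terminated by `<|end|>`. A minimal sketch of that rendering, built by hand for illustration (in practice the tokenizer's `apply_chat_template` should be used, and the exact template should be verified against the released tokenizer):

```python
def build_phi3_prompt(messages):
    """Render a list of {role, content} dicts into Phi-3's chat format.

    Sketch based on the published chat template; prefer
    tokenizer.apply_chat_template in real code.
    """
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # generation prompt for the model's reply
    return "".join(parts)

prompt = build_phi3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
])
```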
Core Capabilities
- Strong performance in math and logical reasoning tasks
- Excellent results in common sense and language understanding
- Code generation capabilities, particularly in Python
- Structured output generation (JSON, XML)
- Multi-turn conversation support
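When using the structured-output capability, model replies often wrap the JSON in prose or a code fence, so a small post-processing step is typical. The helper below is a hypothetical sketch (not part of any Phi-3 API) that extracts and parses the first JSON object from a reply:

```python
import json

def extract_json(reply: str) -> dict:
    """Pull the first JSON object out of a model reply that may wrap it
    in prose or a Markdown code fence (hypothetical helper for
    illustration)."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])

reply = 'Sure! Here is the record:\n```json\n{"name": "Ada", "score": 42}\n```'
record = extract_json(reply)
```

Schema validation (e.g. checking required keys) would normally follow parsing in production use.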
Frequently Asked Questions
Q: What makes this model unique?
A: This model achieves performance comparable to much larger models while maintaining a relatively small parameter count of 3.8B. It particularly excels in reasoning tasks, achieving state-of-the-art performance among models under 13B parameters.
Q: What are the recommended use cases?
A: The model is ideal for memory/compute constrained environments, latency-bound scenarios, and applications requiring strong reasoning capabilities. It's particularly well-suited for commercial and research applications in English, especially those involving math, logic, and structured data processing.