Phi-3.5-mini-instruct

Maintained by: Microsoft


| Property | Value |
|---|---|
| Parameter Count | 3.82B |
| Context Length | 128K tokens |
| License | MIT |
| Paper | Technical Report |
| Supported Languages | 23 languages including English, Chinese, Arabic, German, etc. |

What is Phi-3.5-mini-instruct?

Phi-3.5-mini-instruct is a lightweight, state-of-the-art language model that achieves remarkable performance despite its compact size of 3.82B parameters. Built upon the datasets used for Phi-3, it focuses on high-quality, reasoning-dense data and supports an impressive 128K token context length.

Implementation Details

The model uses a dense decoder-only Transformer architecture and has been enhanced through supervised fine-tuning, proximal policy optimization, and direct preference optimization. For optimal performance it supports flash attention, which has been tested on NVIDIA A100, A6000, and H100 GPUs.

  • Training involved 3.4T tokens across multiple data sources
  • Supports flash attention for improved performance
  • Implements robust safety measures and instruction adherence
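To illustrate the instruction-following setup, here is a minimal sketch of the Phi-3-style chat prompt format. The `<|role|>` / `<|end|>` markers follow the published Phi-3 chat template, but treat this as an assumption and verify against the tokenizer's own `apply_chat_template` before relying on it:

```python
# Sketch of a Phi-3-style chat prompt builder (format assumed from the
# Phi-3 chat template; confirm with the tokenizer's apply_chat_template).
def build_phi_prompt(messages):
    """Join role-tagged messages into a single prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to generate its reply
    return "".join(parts)

prompt = build_phi_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize flash attention in one line."},
])
print(prompt)
```

In practice, prefer the tokenizer's built-in chat template over hand-rolled strings; this sketch only shows the shape of the turns.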

Core Capabilities

  • Multilingual support across 23 languages with competitive performance
  • Strong performance in reasoning tasks, particularly in code, math, and logic
  • Long-context understanding with 128K token support
  • Efficient operation in memory/compute constrained environments
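The memory-constrained claim can be made concrete with back-of-envelope arithmetic on the 3.82B parameter count. This estimates weight memory only; real deployments also need room for the KV cache and activations, which grow with context length:

```python
# Rough weight-memory estimate for a 3.82B-parameter model.
# Bytes-per-parameter values are standard dtype sizes; KV cache and
# activation memory are deliberately ignored here.
PARAMS = 3.82e9

def weight_gib(bytes_per_param):
    """Weight footprint in GiB at a given precision."""
    return PARAMS * bytes_per_param / 2**30

fp16_gib = weight_gib(2)    # fp16/bf16: 2 bytes per parameter
int4_gib = weight_gib(0.5)  # 4-bit quantization: 0.5 bytes per parameter
print(f"fp16: {fp16_gib:.1f} GiB, int4: {int4_gib:.1f} GiB")
```

At roughly 7 GiB in fp16 and under 2 GiB at 4-bit, the weights fit on a single consumer GPU, which is what makes the model practical for latency-sensitive and edge scenarios.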

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to achieve performance comparable to much larger models (7B-12B parameters) while maintaining a compact size of 3.82B parameters makes it unique. It also offers extensive multilingual capabilities and long context support, making it versatile for various applications.

Q: What are the recommended use cases?

The model is ideal for scenarios requiring: 1) Memory/compute constrained environments, 2) Latency-sensitive applications, 3) Strong reasoning capabilities in code and math, and 4) Multilingual support. It's particularly suitable for commercial and research applications needing efficient language processing.
