Phi-3-small-128k-instruct


  • Parameter Count: 7.39B
  • Context Length: 128,000 tokens
  • License: MIT
  • Author: Microsoft

What is Phi-3-small-128k-instruct?

Phi-3-small-128k-instruct is a lightweight, state-of-the-art open model developed by Microsoft, built to deliver strong reasoning performance while remaining efficient to run. The 7B-parameter model supports a 128K-token context window and was post-trained with supervised fine-tuning and direct preference optimization to improve instruction following and safety.

Implementation Details

The model uses a dense decoder-only Transformer architecture with alternating layers of dense and blocksparse attention. It was trained on 4.8T tokens, including high-quality educational data, synthetic textbook-like content, and carefully filtered public documents.

  • Supports multiple languages with 10% multilingual training data
  • Optimized for memory/compute constrained environments
  • Implements Flash Attention 2 and Triton-based blocksparse attention kernels
  • Runs on NVIDIA A100, A6000, and H100 GPUs (see the loading sketch below)
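
As a sketch of how loading typically looks with Hugging Face transformers (the dtype and device settings are assumptions; trust_remote_code=True is required because the blocksparse attention implementation ships as custom code with the checkpoint):

```python
# Minimal loading sketch; adjust dtype/device for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-small-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; the kernels target recent NVIDIA GPUs
    device_map="auto",           # place weights automatically across available GPUs
    trust_remote_code=True,      # required: custom blocksparse attention code
)
```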

Core Capabilities

  • Strong performance in reasoning tasks, especially code, math, and logic (see the generation sketch after this list)
  • Extended context handling up to 128K tokens
  • Competitive benchmark accuracy, outperforming some larger models on reasoning-heavy tasks
  • Efficient processing in latency-bound scenarios
  • Robust safety measures through preference optimization
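
Continuing from the loading sketch above, instruction-style prompts are formatted with the tokenizer's chat template. The prompt and decoding settings below are illustrative assumptions, not tuned recommendations:

```python
# Generation sketch using the chat template (illustrative prompt and settings).
messages = [
    {"role": "user", "content": "If 3x + 5 = 20, what is x? Show your steps."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```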

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for matching or exceeding the performance of some larger models while keeping a relatively small 7B-parameter footprint. It particularly excels in reasoning tasks and offers a 128K-token context window, making it suitable for processing lengthy documents and complex problems.
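
A practical corollary: before sending a lengthy document, you can count its tokens against the context limit. A tiny sketch reusing the tokenizer from above, assuming a hypothetical local file and the 128,000-token figure from the table at the top:

```python
# Context-budget sketch (long_report.txt is a hypothetical document).
CONTEXT_LIMIT = 128_000  # per the table above

with open("long_report.txt") as f:
    long_text = f.read()

n_tokens = len(tokenizer(long_text).input_ids)
print(f"{n_tokens} tokens; fits in context: {n_tokens <= CONTEXT_LIMIT}")
```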

Q: What are the recommended use cases?

The model is ideal for applications requiring strong reasoning capabilities, particularly in code generation, mathematical problem-solving, and logical reasoning. It's especially suitable for deployment in resource-constrained environments or applications requiring low-latency responses.
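
For memory-constrained deployments, one common option is 4-bit quantization via bitsandbytes. This is a hedged sketch that assumes the quantized path is compatible with this checkpoint's custom attention code; verify quality and latency on your own hardware:

```python
# Quantized-loading sketch (assumes bitsandbytes compatibility with this checkpoint).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-small-128k-instruct",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
```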
