Phi-3-small-8k-instruct

Maintained By
microsoft

Parameter Count: 7.39B
Context Length: 8K tokens
License: MIT
Author: Microsoft
Training Data: 4.8T tokens

What is Phi-3-small-8k-instruct?

Phi-3-small-8k-instruct is a state-of-the-art lightweight language model developed by Microsoft, designed for efficient reasoning and instruction following. This 7.39B-parameter model strikes a careful balance between size and performance, trained on a curated dataset of 4.8T tokens selected for high quality and reasoning density.

Implementation Details

The model uses a dense decoder-only Transformer architecture with alternating dense and blocksparse attention layers. It has undergone both supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.

  • Supports 8K token context window
  • Implements Flash Attention 2 and Triton blocksparse attention
  • Optimized for NVIDIA A100, A6000, and H100 GPUs
  • Vocabulary size of 100,352 tokens
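The details above can be sketched as a loading routine with Hugging Face transformers. This is a hedged sketch, not the official quickstart: the model id `microsoft/Phi-3-small-8k-instruct` is the published repository name, but the exact dtype and device choices are assumptions, and running it requires the `transformers` and `flash-attn` packages plus a supported GPU (e.g. A100/H100).

```python
def load_phi3_small(model_id: str = "microsoft/Phi-3-small-8k-instruct"):
    """Load Phi-3-small-8k-instruct with Flash Attention 2.

    Requires `transformers`, `flash-attn`, and a CUDA GPU; imports are
    done lazily so the sketch can be read without that environment.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,               # assumed dtype choice
        attn_implementation="flash_attention_2",  # per the bullets above
        trust_remote_code=True,  # custom blocksparse attention code in the repo
        device_map="cuda",
    )
    return model, tokenizer
```

`trust_remote_code=True` is needed because the blocksparse attention is shipped as custom modeling code in the model repository rather than in the transformers library itself.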

Core Capabilities

  • Strong reasoning performance in code, math, and logic tasks
  • Efficient operation in memory/compute constrained environments
  • Multilingual support (roughly 10% of the training data is multilingual)
  • Optimized for instruction-following and chat-based interactions
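For the chat-based interactions mentioned above, Phi-3 models use a turn format built from `<|user|>`, `<|assistant|>`, and `<|end|>` markers. The helper below is a minimal sketch of that format for illustration; in practice you should call `tokenizer.apply_chat_template`, which applies the canonical template shipped with the model.

```python
def build_phi3_prompt(turns):
    """Render [{"role": ..., "content": ...}] messages in the Phi-3 chat
    format, ending with an <|assistant|> cue so the model responds next."""
    parts = []
    for turn in turns:
        parts.append(f"<|{turn['role']}|>\n{turn['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to generate a reply
    return "".join(parts)

prompt = build_phi3_prompt([{"role": "user", "content": "Solve 2x + 3 = 7."}])
```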

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its exceptional performance-to-size ratio, achieving state-of-the-art results among similar-sized models in benchmarks testing common sense, language understanding, math, and code generation. It's particularly notable for matching or exceeding the performance of larger models in reasoning tasks.

Q: What are the recommended use cases?

The model is ideal for general-purpose AI systems requiring low latency, especially in scenarios involving reasoning tasks, code generation, and mathematical problem-solving. It's particularly well-suited for commercial and research applications where computational resources are constrained.
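A back-of-envelope calculation shows why the model suits constrained deployments: at 7.39B parameters, the weights alone need roughly 14.8 GB in fp16/bf16, or about 3.7 GB with 4-bit quantization. This estimate covers weights only, not the KV cache, activations, or framework overhead.

```python
PARAMS = 7.39e9  # parameter count from the model card

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in gigabytes
    (excludes KV cache, activations, and framework overhead)."""
    return params * bytes_per_param / 1e9

fp16_gb = weight_memory_gb(PARAMS, 2)    # fp16/bf16: ~14.8 GB
int4_gb = weight_memory_gb(PARAMS, 0.5)  # 4-bit quantized: ~3.7 GB
```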
