Qwen2-72B-Instruct

Maintained By: Qwen

  • Parameter Count: 72.7B
  • Context Length: 131,072 tokens
  • License: tongyi-qianwen
  • Paper: YARN
  • Tensor Type: BF16

What is Qwen2-72B-Instruct?

Qwen2-72B-Instruct is the instruction-tuned version of the largest model in the Qwen2 series. It combines massive scale with careful engineering: 72.7 billion parameters and a context window of 131,072 tokens, built on an enhanced Transformer architecture that incorporates SwiGLU activation, attention QKV bias, and grouped query attention (GQA).

Implementation Details

The model uses YARN to extend its context window and requires transformers>=4.37.0 for deployment. It performs strongly across benchmarks, particularly in language understanding, coding, and mathematical reasoning.
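
As a concrete starting point, here is a minimal generation sketch following the standard transformers chat-template workflow; the prompt text and generation settings are illustrative rather than prescriptive.

```python
# Minimal generation sketch using Hugging Face transformers (>=4.37.0).
# device_map="auto" additionally requires the `accelerate` package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-72B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # loads the published BF16 weights
    device_map="auto",    # shards the 72.7B parameters across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Strip the prompt tokens before decoding so only the reply is printed.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```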

  • Enhanced tokenizer optimized for multiple languages and code
  • Supports long-context inputs through its YARN implementation (see the configuration sketch below)
  • Trained with supervised finetuning and direct preference optimization (DPO)
  • Deployable through vLLM for production environments
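
To use the full 131,072-token window when serving with vLLM, a rope_scaling entry is typically added to the model's config.json. The sketch below assumes the standard Qwen2 YaRN settings (a 32,768-token native window scaled by a factor of 4.0 to reach 131,072 tokens) and a hypothetical local checkout path:

```python
# Sketch: enable static YaRN scaling in a local copy of the model's
# config.json before serving with vLLM. The scaling values assume the
# standard Qwen2 setup: 32,768-token base window x 4.0 = 131,072 tokens.
import json
from pathlib import Path

config_path = Path("Qwen2-72B-Instruct/config.json")  # hypothetical local path
config = json.loads(config_path.read_text())
config["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
config_path.write_text(json.dumps(config, indent=2))
```

The model can then be served through vLLM's OpenAI-compatible server, e.g. `python -m vllm.entrypoints.openai.api_server --model ./Qwen2-72B-Instruct`. Note that this static YaRN scaling is applied to all inputs regardless of length, which can reduce quality on short texts, so it is usually added only when long-context processing is actually needed.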

Core Capabilities

  • Superior performance in MMLU (82.3%) and MMLU-Pro (64.4%)
  • Exceptional coding capabilities with 86% accuracy on HumanEval
  • Strong mathematical reasoning with 91.1% accuracy on GSM8K
  • Advanced multilingual support with 83.8% on C-Eval for Chinese
  • High-quality conversational abilities, scoring 9.12 on MT-Bench

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its combination of massive scale (72.7B parameters) and advanced architecture features, particularly its ability to handle extremely long contexts of up to 131K tokens using YARN technology. It shows superior performance across diverse tasks, often outperforming both open-source and proprietary models.

Q: What are the recommended use cases?

The model excels in various applications including complex reasoning tasks, coding assignments, mathematical problem-solving, and multilingual content generation. It's particularly well-suited for applications requiring long-context understanding and generation, making it ideal for document analysis, technical documentation, and sophisticated conversation systems.
