Qwen2-72B

Maintained By
Qwen

  • Parameter Count: 72.7B
  • Model Type: Dense Transformer
  • License: tongyi-qianwen
  • Tensor Type: BF16

What is Qwen2-72B?

Qwen2-72B is a 72.7-billion-parameter dense transformer and the largest base language model in the Qwen2 series. It performs strongly across a wide range of benchmarks, particularly in multilingual tasks, coding, and mathematical reasoning, scoring 84.2% on MMLU and 89.5% on GSM8K.

Implementation Details

The model is built on a Transformer architecture featuring SwiGLU activation, attention QKV bias, and grouped query attention (GQA). Loading it requires transformers>=4.37.0 (earlier versions fail with KeyError: 'qwen2'), and the released weights are stored in BF16.

  • Improved tokenizer adaptive to multiple natural languages and code
  • Dense (non-mixture-of-experts) architecture
  • Supports multiple natural languages and coding tasks
  • Requires Hugging Face transformers 4.37.0 or later (see the loading sketch below)
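
As a minimal loading sketch (not official usage guidance): the snippet below loads the model with Hugging Face transformers and generates a short continuation. The Hub model ID Qwen/Qwen2-72B is assumed, as is enough GPU memory for a 72.7B-parameter model in BF16.

```python
# Minimal sketch: loading Qwen2-72B with Hugging Face transformers (>=4.37.0).
# Assumes the Hub model ID "Qwen/Qwen2-72B" and sufficient GPU memory for
# ~145 GB of BF16 weights; prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-72B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # picks up BF16 from the model config
    device_map="auto",    # shards across available GPUs (requires accelerate)
)

# As a base model, Qwen2-72B does plain text continuation, not chat.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```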

Core Capabilities

  • Exceptional performance on English-language tasks (MMLU: 84.2%)
  • Strong coding capabilities (HumanEval: 64.6%, MBPP: 76.9%)
  • Superior mathematical reasoning (GSM8K: 89.5%, MATH: 51.1%)
  • Outstanding Chinese-language performance (C-Eval: 91.0%, CMMLU: 90.1%)
  • Robust multi-task capabilities across various domains

Frequently Asked Questions

Q: What makes this model unique?

Qwen2-72B stands out for its balanced performance across diverse tasks, particularly excelling in multilingual capabilities and mathematical reasoning. It achieves state-of-the-art results in many benchmarks, surpassing both open-source and some proprietary models.

Q: What are the recommended use cases?

The model is designed primarily as a base language model for further fine-tuning. It is recommended for post-training workflows such as SFT, RLHF, or continued pretraining rather than for direct, out-of-the-box text generation; a fine-tuning sketch follows below. It is particularly suitable for applications that demand strong multilingual understanding, coding, or mathematical reasoning.
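
As a hedged illustration of that post-training workflow (not an official recipe), the sketch below wires up supervised fine-tuning with the generic transformers Trainer. The dataset file, sequence length, and hyperparameters are placeholders, and a model of this size would in practice need multi-node training or parameter-efficient methods such as LoRA.

```python
# Hedged SFT sketch using the generic transformers Trainer. File names and
# hyperparameters are illustrative placeholders; a 72B model realistically
# requires multi-node training or parameter-efficient methods (e.g. LoRA).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2-72B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Hypothetical plain-text corpus; substitute your own SFT data.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2-72b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        bf16=True,                # train in BF16, matching the released weights
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```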
