Qwen-72B-Chat
| Property | Value |
|---|---|
| Parameter Count | 72.3B |
| Context Length | 32,768 tokens |
| License | Tongyi Qianwen License |
| Paper | Technical Report |
What is Qwen-72B-Chat?
Qwen-72B-Chat is a large language model developed by Alibaba Cloud, featuring 72.3 billion parameters and trained on over 3 trillion tokens. It is designed as a versatile AI assistant that supports multiple languages, excels particularly in Chinese and English, and shows strong capabilities in code generation and mathematical reasoning.
Implementation Details
The model is built on a Transformer architecture with 80 layers, 64 attention heads, and a model dimension of 8192. It implements modern architectural choices including RoPE positional encoding, SwiGLU activation functions, and RMSNorm. The tokenizer uses a 151,851-token vocabulary optimized for multilingual processing.
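As a rough consistency check, the 72.3B figure follows from these dimensions. The sketch below assumes an effective SwiGLU hidden width of about 24,576 and untied input/output embeddings, neither of which is stated above, so treat it as a back-of-envelope estimate rather than an official breakdown.

```python
# Back-of-envelope parameter estimate from the stated architecture.
# Assumptions (not in the card): SwiGLU hidden width ~24,576, untied embeddings.
# Biases and normalization parameters are negligible and ignored here.
layers, d_model, vocab = 80, 8192, 151_851
ffn_width = 24_576  # assumed

attn = 4 * d_model * d_model       # Q, K, V, O projections per layer
mlp = 3 * d_model * ffn_width      # gate, up, down projections (SwiGLU)
embeddings = 2 * vocab * d_model   # input + output embedding matrices

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.1f}B parameters")  # ~72.3B, matching the table above
```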
- Supports multiple precision options: BF16, Int8, and Int4 quantization
- Requires at least 144GB of GPU memory for BF16/FP16 inference, or about 48GB for Int4
- Compatible with both Hugging Face Transformers and vLLM for deployment (a loading sketch follows this list)
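For reference, loading the chat model with Hugging Face Transformers might look like the sketch below. It follows the usage pattern documented in the Qwen repository (the custom `chat()` helper requires `trust_remote_code=True`); exact keyword arguments can vary across Transformers versions, so verify against the official model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is needed because Qwen ships custom modeling/tokenizer code.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B-Chat",
    device_map="auto",    # shard across available GPUs (>=144GB total for BF16)
    torch_dtype="auto",
    trust_remote_code=True,
).eval()

# Qwen's custom chat() helper tracks multi-turn history.
response, history = model.chat(
    tokenizer,
    "Give me a short introduction to large language models.",
    history=None,
)
print(response)
```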
Core Capabilities
- Achieves 80.1% accuracy on C-Eval and 74.3% on MMLU (zero-shot)
- 64.6% pass rate on HumanEval coding tasks
- 76.4% accuracy on GSM8K mathematical reasoning
- Handles 32k context length with strong performance on long-context tasks
- Supports system prompts for role-playing and task customization (see the example below)
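To illustrate the system-prompt support, the sketch below reuses the `model` and `tokenizer` loaded in the Transformers example above. The `system` keyword follows the signature of Qwen's custom `chat()` helper; check the model card for the exact interface in your version.

```python
# The system argument sets a persona and standing instructions for the conversation.
system_prompt = "You are a meticulous senior Python engineer. Answer with short, tested code."

response, history = model.chat(
    tokenizer,
    "Write a function that parses an ISO 8601 date string.",
    history=None,
    system=system_prompt,
)
print(response)

# Follow-up turns pass the returned history back in to keep context.
response, history = model.chat(
    tokenizer,
    "Now add error handling.",
    history=history,
    system=system_prompt,
)
```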
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its comprehensive multilingual vocabulary, extensive training data (3+ trillion tokens), and strong performance across diverse tasks while maintaining efficient deployment options through quantization.
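For example, the officially published Int4 checkpoint can be loaded much like the BF16 weights. The sketch below assumes the `Qwen/Qwen-72B-Chat-Int4` repository and an installed `auto-gptq`/`optimum` stack, as described in the Qwen model card for the quantized weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Int4 (GPTQ) checkpoint: roughly 48GB of GPU memory instead of ~144GB for BF16.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True,
).eval()

response, _ = model.chat(tokenizer, "Summarize the benefits of Int4 quantization.", history=None)
print(response)
```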
Q: What are the recommended use cases?
Qwen-72B-Chat excels in multilingual conversations, complex reasoning, code generation, and mathematical problem-solving. It's particularly suitable for applications requiring long context understanding and detailed technical discussions.
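As noted under Implementation Details, the model can also be served with vLLM, which suits high-throughput and long-context workloads. The sketch below uses vLLM's offline Python API; the tensor-parallel size is an illustrative assumption, and production chat use would additionally apply Qwen's chat prompt format or run vLLM's OpenAI-compatible server.

```python
from vllm import LLM, SamplingParams

# Shard the 72B model across several GPUs; trust_remote_code is needed for Qwen's tokenizer.
llm = LLM(
    model="Qwen/Qwen-72B-Chat",
    trust_remote_code=True,
    tensor_parallel_size=8,  # illustrative; size this to your hardware
)

params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.generate(["Explain rotary position embeddings in two sentences."], params)
print(outputs[0].outputs[0].text)
```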