Qwen-72B-Chat
| Property | Value |
|---|---|
| Parameter Count | 72.3B |
| Context Length | 32,768 tokens |
| License | Tongyi Qianwen License |
| Paper | Technical Report |
What is Qwen-72B-Chat?
Qwen-72B-Chat is a large language model developed by Alibaba Cloud, featuring 72.3 billion parameters and trained on over 3 trillion tokens. It is designed as a versatile AI assistant that supports multiple languages, excels particularly in Chinese and English, and shows strong capabilities in code generation and mathematical reasoning.
Implementation Details
The model is built on a Transformer architecture with 80 layers, 64 attention heads, and a model dimension of 8192. It implements modern architectural choices including RoPE positional encoding, SwiGLU activation functions, and RMSNorm. The tokenizer uses a 151,851-token vocabulary optimized for multilingual processing.
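As a rough consistency check, the 72.3B figure follows from these dimensions. The sketch below assumes an effective SwiGLU hidden width of about 24,576 and untied input/output embeddings, neither of which is stated above, so treat it as a back-of-envelope estimate rather than an official breakdown.

```python
# Back-of-envelope parameter estimate from the stated architecture.
# Assumptions (not in the card): SwiGLU hidden width ~24,576, untied embeddings.
# Biases and normalization parameters are negligible and ignored here.
layers, d_model, vocab = 80, 8192, 151_851
ffn_width = 24_576  # assumed

attn = 4 * d_model * d_model       # Q, K, V, O projections per layer
mlp = 3 * d_model * ffn_width      # gate, up, down projections (SwiGLU)
embeddings = 2 * vocab * d_model   # input + output embedding matrices

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.1f}B parameters")  # ~72.3B, matching the table above
```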
- Supports multiple precision options: BF16, Int8, and Int4 quantization
- Requires at least 144GB of GPU memory for BF16/FP16 inference, or about 48GB for Int4
- Compatible with both Hugging Face Transformers and vLLM for deployment (a loading sketch follows this list)
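For reference, loading the chat model with Hugging Face Transformers might look like the sketch below. It follows the usage pattern documented in the Qwen repository (the custom `chat()` helper requires `trust_remote_code=True`); exact keyword arguments can vary across Transformers versions, so verify against the official model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is needed because Qwen ships custom modeling/tokenizer code.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B-Chat",
    device_map="auto",    # shard across available GPUs (>=144GB total for BF16)
    torch_dtype="auto",
    trust_remote_code=True,
).eval()

# Qwen's custom chat() helper tracks multi-turn history.
response, history = model.chat(
    tokenizer,
    "Give me a short introduction to large language models.",
    history=None,
)
print(response)
```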
Core Capabilities
- Achieves 80.1% accuracy on C-Eval and 74.3% on MMLU (zero-shot)
- 64.6% pass rate on HumanEval coding tasks
- 76.4% accuracy on GSM8K mathematical reasoning
- Handles 32k context length with strong performance on long-context tasks
- Supports system prompts for role-playing and task customization (see the example below)
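To illustrate the system-prompt support, the sketch below reuses the `model` and `tokenizer` loaded in the Transformers example above. The `system` keyword follows the signature of Qwen's custom `chat()` helper; check the model card for the exact interface in your version.

```python
# The system argument sets a persona and standing instructions for the conversation.
system_prompt = "You are a meticulous senior Python engineer. Answer with short, tested code."

response, history = model.chat(
    tokenizer,
    "Write a function that parses an ISO 8601 date string.",
    history=None,
    system=system_prompt,
)
print(response)

# Follow-up turns pass the returned history back in to keep context.
response, history = model.chat(
    tokenizer,
    "Now add error handling.",
    history=history,
    system=system_prompt,
)
```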
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its comprehensive multilingual vocabulary, extensive training data (3+ trillion tokens), and strong performance across diverse tasks while maintaining efficient deployment options through quantization.
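For example, the officially published Int4 checkpoint can be loaded much like the BF16 weights. The sketch below assumes the `Qwen/Qwen-72B-Chat-Int4` repository and an installed `auto-gptq`/`optimum` stack, as described in the Qwen model card for the quantized weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Int4 (GPTQ) checkpoint: roughly 48GB of GPU memory instead of ~144GB for BF16.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True,
).eval()

response, _ = model.chat(tokenizer, "Summarize the benefits of Int4 quantization.", history=None)
print(response)
```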
Q: What are the recommended use cases?
Qwen-72B-Chat excels in multilingual conversations, complex reasoning, code generation, and mathematical problem-solving. It's particularly suitable for applications requiring long context understanding and detailed technical discussions.
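As noted under Implementation Details, the model can also be served with vLLM, which suits high-throughput and long-context workloads. The sketch below uses vLLM's offline Python API; the tensor-parallel size is an illustrative assumption, and production chat use would additionally apply Qwen's chat prompt format or run vLLM's OpenAI-compatible server.

```python
from vllm import LLM, SamplingParams

# Shard the 72B model across several GPUs; trust_remote_code is needed for Qwen's tokenizer.
llm = LLM(
    model="Qwen/Qwen-72B-Chat",
    trust_remote_code=True,
    tensor_parallel_size=8,  # illustrative; size this to your hardware
)

params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.generate(["Explain rotary position embeddings in two sentences."], params)
print(outputs[0].outputs[0].text)
```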