Qwen1.5-72B
| Property | Value |
|---|---|
| Parameter Count | 72.3B |
| Tensor Type | BF16 |
| License | tongyi-qianwen |
| Research Paper | arXiv:2309.16609 |
What is Qwen1.5-72B?
Qwen1.5-72B is the largest dense model in the Qwen1.5 series, the beta release of Qwen2. It is a transformer-based, decoder-only language model with 72.3 billion parameters.
Implementation Details
The model uses a transformer decoder architecture with SwiGLU activation, attention QKV bias, and grouped query attention. It supports a 32K context length and requires transformers>=4.37.0, which includes native support for the Qwen2 architecture (see the loading sketch after the list below).
- Advanced transformer architecture with SwiGLU activation
- Improved tokenizer for multiple natural languages and code
- 32K context length support
- Does not require trust_remote_code
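For reference, here is a minimal loading sketch under the stated requirements. It assumes the Hugging Face model ID Qwen/Qwen1.5-72B, a PyTorch install, and the accelerate package for multi-GPU placement; the prompt and generation settings are illustrative only.

```python
# Minimal loading sketch (assumes transformers>=4.37.0, torch, and accelerate are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-72B"  # Hugging Face model ID assumed for this example

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # picks up the released BF16 weights
    device_map="auto",    # shards the 72.3B parameters across available GPUs (needs accelerate)
)

# No trust_remote_code flag is needed; the architecture ships with transformers>=4.37.0.
inputs = tokenizer("Qwen1.5 is a series of", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```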
Core Capabilities
- Extensive multilingual support for both base and chat models
- Enhanced performance in chat model variants (see the chat-prompting sketch after this list)
- Stable long-context processing
- Versatile application in post-training scenarios
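For the chat variants mentioned above, a hedged prompting sketch using the tokenizer's built-in chat template is shown below; the model ID Qwen/Qwen1.5-72B-Chat and the message contents are assumptions made for illustration.

```python
# Chat-template prompting sketch for the chat variant (ID assumed: Qwen/Qwen1.5-72B-Chat).
from transformers import AutoModelForCausalLM, AutoTokenizer

chat_id = "Qwen/Qwen1.5-72B-Chat"
tokenizer = AutoTokenizer.from_pretrained(chat_id)
model = AutoModelForCausalLM.from_pretrained(chat_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Qwen1.5 series in one sentence."},
]
# apply_chat_template renders the messages into the model's expected prompt format.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens so only the newly generated reply is printed.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```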
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its scale (72.3B parameters), improved architecture and tokenizer, and its ability to handle a 32K context length while maintaining stable performance across multiple languages and tasks.
Q: What are the recommended use cases?
The model is primarily designed for post-training applications such as SFT (Supervised Fine-Tuning), RLHF (Reinforcement Learning from Human Feedback), and continued pretraining. Direct use for text generation is not recommended without additional fine-tuning.
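To make the post-training path concrete, below is a minimal LoRA-based supervised fine-tuning sketch using the peft and datasets libraries; the data file sft_data.jsonl, the adapter rank, target modules, and training hyperparameters are placeholders chosen for illustration, not recommendations from the model authors.

```python
# LoRA SFT sketch (assumes peft, datasets, accelerate, and hardware able to hold the 72B weights).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen1.5-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # common fallback so batches can be padded

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Train low-rank adapters only; r, alpha, and target modules are illustrative values.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder corpus: any instruction-style dataset with a "text" column works here.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")
tokenized = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen1.5-72b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM labels
)
trainer.train()
```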