Qwen1.5-72B
| Property | Value |
|---|---|
| Parameter Count | 72.3B |
| Tensor Type | BF16 |
| License | tongyi-qianwen |
| Research Paper | arXiv:2309.16609 |
What is Qwen1.5-72B?
Qwen1.5-72B is the largest dense model in the Qwen1.5 series, the beta release of Qwen2. It is a transformer-based, decoder-only language model with 72.3 billion parameters.
Implementation Details
The model uses a transformer decoder architecture with SwiGLU activation, attention QKV bias, and grouped query attention. It supports a 32K context length and requires transformers>=4.37.0, which includes native support for the Qwen2 architecture (see the loading sketch after the list below).
- Advanced transformer architecture with SwiGLU activation
- Improved tokenizer for multiple natural languages and code
- 32K context length support
- Does not require trust_remote_code
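For reference, here is a minimal loading sketch under the stated requirements. It assumes the Hugging Face model ID Qwen/Qwen1.5-72B, a PyTorch install, and the accelerate package for multi-GPU placement; the prompt and generation settings are illustrative only.

```python
# Minimal loading sketch (assumes transformers>=4.37.0, torch, and accelerate are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-72B"  # Hugging Face model ID assumed for this example

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # picks up the released BF16 weights
    device_map="auto",    # shards the 72.3B parameters across available GPUs (needs accelerate)
)

# No trust_remote_code flag is needed; the architecture ships with transformers>=4.37.0.
inputs = tokenizer("Qwen1.5 is a series of", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```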
Core Capabilities
- Extensive multilingual support for both base and chat models
- Enhanced performance in chat model variants (see the chat-prompting sketch after this list)
- Stable long-context processing
- Versatile application in post-training scenarios
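For the chat variants mentioned above, a hedged prompting sketch using the tokenizer's built-in chat template is shown below; the model ID Qwen/Qwen1.5-72B-Chat and the message contents are assumptions made for illustration.

```python
# Chat-template prompting sketch for the chat variant (ID assumed: Qwen/Qwen1.5-72B-Chat).
from transformers import AutoModelForCausalLM, AutoTokenizer

chat_id = "Qwen/Qwen1.5-72B-Chat"
tokenizer = AutoTokenizer.from_pretrained(chat_id)
model = AutoModelForCausalLM.from_pretrained(chat_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Qwen1.5 series in one sentence."},
]
# apply_chat_template renders the messages into the model's expected prompt format.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens so only the newly generated reply is printed.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```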
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its scale (72.3B parameters), improved architecture and tokenizer, and its ability to handle a 32K context length while maintaining stable performance across multiple languages and tasks.
Q: What are the recommended use cases?
The model is primarily designed for post-training applications such as SFT (Supervised Fine-Tuning), RLHF (Reinforcement Learning from Human Feedback), and continued pretraining. Direct use for text generation is not recommended without additional fine-tuning.
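To make the post-training path concrete, below is a minimal LoRA-based supervised fine-tuning sketch using the peft and datasets libraries; the data file sft_data.jsonl, the adapter rank, target modules, and training hyperparameters are placeholders chosen for illustration, not recommendations from the model authors.

```python
# LoRA SFT sketch (assumes peft, datasets, accelerate, and hardware able to hold the 72B weights).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen1.5-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # common fallback so batches can be padded

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Train low-rank adapters only; r, alpha, and target modules are illustrative values.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder corpus: any instruction-style dataset with a "text" column works here.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")
tokenized = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen1.5-72b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM labels
)
trainer.train()
```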