Qwen1.5-110B
| Property | Value |
|---|---|
| Parameter Count | 111B parameters |
| Model Type | Decoder-only Transformer |
| License | tongyi-qianwen |
| Context Length | 32K tokens |
| Paper | Research Paper |
What is Qwen1.5-110B?
Qwen1.5-110B is a state-of-the-art language model that serves as the beta release of Qwen2. As the largest model in the Qwen1.5 series at 111B parameters, it delivers significant improvements in multilingual capability and chat performance. Architecturally, it is a decoder-only Transformer with SwiGLU activation and a mixture of sliding-window and full attention.
Implementation Details
The model uses a decoder-only architecture enhanced with several technical refinements, including attention QKV bias and grouped-query attention (GQA). It requires transformers>=4.37.0 to load correctly, and the released weights are stored in BF16 for efficient computation; a minimal loading sketch follows the list below.
- Advanced transformer architecture with SwiGLU activation
- Improved tokenizer for multiple natural languages and code
- Stable 32K context length support
- No requirement for trust_remote_code
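To make the requirements above concrete, here is a minimal loading sketch. It assumes the Hugging Face model id Qwen/Qwen1.5-110B and a multi-GPU host with enough memory to shard a 111B-parameter checkpoint; adjust device_map and dtype to your setup.

```python
# Minimal loading sketch (assumption: model id "Qwen/Qwen1.5-110B" on the
# Hugging Face Hub; hardware with enough GPU memory to shard a 111B model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B"

# No trust_remote_code needed: the architecture ships with transformers>=4.37.0.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # shard across available GPUs
)
```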
Core Capabilities
- Multilingual text processing and generation (a short tokenizer sketch follows this list)
- Enhanced chat model performance
- Extensive context handling (32K tokens)
- Suitable for post-training applications (SFT, RLHF)
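As a quick illustration of the multilingual and code handling, the sketch below tokenizes English, Chinese, and Python snippets with the stock AutoTokenizer. The model id is the same assumption as above, and the sample strings are illustrative only.

```python
# Tokenizer-only sketch; this downloads just the tokenizer files, not the weights.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-110B")

samples = [
    "The quick brown fox jumps over the lazy dog.",        # English
    "今天天气怎么样？",                                      # Chinese
    "def add(a: int, b: int) -> int:\n    return a + b",   # code
]
for text in samples:
    ids = tokenizer.encode(text)
    print(len(ids), tokenizer.convert_ids_to_tokens(ids))
```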
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its massive scale (111B parameters), improved multilingual capabilities, and enhanced chat performance. It's part of a comprehensive series that includes 9 different model sizes, offering flexibility for various applications.
Q: What are the recommended use cases?
The base model is primarily intended for post-training applications such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and continued pretraining. Direct text generation using the base model is not recommended.
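For readers looking for a starting point for post-training, below is a hedged sketch of supervised fine-tuning with LoRA adapters via the peft library. The toy dataset, hyperparameters, target module names, and output directory are illustrative assumptions rather than official Qwen recommendations, and at this scale a real run would also use a distributed training setup such as DeepSpeed or FSDP.

```python
# Hedged SFT sketch with LoRA adapters via peft; all hyperparameters and the
# toy dataset below are illustrative assumptions, not official recommendations.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen1.5-110B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # a real 111B training run needs DeepSpeed/FSDP instead
)

# Wrap the base model with small trainable LoRA adapters so only a fraction
# of the 111B parameters is updated (module names assume the Qwen2 layout).
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

# Toy instruction-following corpus standing in for a real SFT dataset.
examples = Dataset.from_dict({
    "text": ["### Instruction: Say hello.\n### Response: Hello!"]
})
tokenized = examples.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen1.5-110b-sft",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```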