Qwen1.5-110B
| Property | Value |
|---|---|
| Parameter Count | 111B parameters |
| Model Type | Decoder-only Transformer |
| License | tongyi-qianwen |
| Context Length | 32K tokens |
| Paper | Research Paper |
What is Qwen1.5-110B?
Qwen1.5-110B is a state-of-the-art language model that serves as the beta release of Qwen2. As the largest model in the Qwen1.5 series at 111B parameters, it delivers significant improvements in multilingual capability and chat performance. Architecturally, it is a decoder-only Transformer with SwiGLU activation and a mixture of sliding-window and full attention.
Implementation Details
The model uses a decoder-only architecture enhanced with several technical refinements, including attention QKV bias and grouped-query attention (GQA). It requires transformers>=4.37.0 to load correctly, and the released weights are stored in BF16 for efficient computation; a minimal loading sketch follows the list below.
- Advanced transformer architecture with SwiGLU activation
- Improved tokenizer for multiple natural languages and code
- Stable 32K context length support
- No requirement for trust_remote_code
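To make the requirements above concrete, here is a minimal loading sketch. It assumes the Hugging Face model id Qwen/Qwen1.5-110B and a multi-GPU host with enough memory to shard a 111B-parameter checkpoint; adjust device_map and dtype to your setup.

```python
# Minimal loading sketch (assumption: model id "Qwen/Qwen1.5-110B" on the
# Hugging Face Hub; hardware with enough GPU memory to shard a 111B model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B"

# No trust_remote_code needed: the architecture ships with transformers>=4.37.0.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # shard across available GPUs
)
```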
Core Capabilities
- Multilingual text processing and generation (a short tokenizer sketch follows this list)
- Enhanced chat model performance
- Extensive context handling (32K tokens)
- Suitable for post-training applications (SFT, RLHF)
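As a quick illustration of the multilingual and code handling, the sketch below tokenizes English, Chinese, and Python snippets with the stock AutoTokenizer. The model id is the same assumption as above, and the sample strings are illustrative only.

```python
# Tokenizer-only sketch; this downloads just the tokenizer files, not the weights.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-110B")

samples = [
    "The quick brown fox jumps over the lazy dog.",        # English
    "今天天气怎么样？",                                      # Chinese
    "def add(a: int, b: int) -> int:\n    return a + b",   # code
]
for text in samples:
    ids = tokenizer.encode(text)
    print(len(ids), tokenizer.convert_ids_to_tokens(ids))
```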
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its massive scale (111B parameters), improved multilingual capabilities, and enhanced chat performance. It's part of a comprehensive series that includes 9 different model sizes, offering flexibility for various applications.
Q: What are the recommended use cases?
The base model is primarily intended for post-training applications such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and continued pretraining. Direct text generation using the base model is not recommended.
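For readers looking for a starting point for post-training, below is a hedged sketch of supervised fine-tuning with LoRA adapters via the peft library. The toy dataset, hyperparameters, target module names, and output directory are illustrative assumptions rather than official Qwen recommendations, and at this scale a real run would also use a distributed training setup such as DeepSpeed or FSDP.

```python
# Hedged SFT sketch with LoRA adapters via peft; all hyperparameters and the
# toy dataset below are illustrative assumptions, not official recommendations.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen1.5-110B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # a real 111B training run needs DeepSpeed/FSDP instead
)

# Wrap the base model with small trainable LoRA adapters so only a fraction
# of the 111B parameters is updated (module names assume the Qwen2 layout).
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

# Toy instruction-following corpus standing in for a real SFT dataset.
examples = Dataset.from_dict({
    "text": ["### Instruction: Say hello.\n### Response: Hello!"]
})
tokenized = examples.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen1.5-110b-sft",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```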