Qwen1.5-110B

Maintained By: Qwen

Parameter Count: 111B
Model Type: Decoder-only Transformer
License: tongyi-qianwen
Context Length: 32K tokens
Paper: Research Paper

What is Qwen1.5-110B?

Qwen1.5-110B is the largest model in the Qwen1.5 series, which serves as the beta release of Qwen2. With 111B parameters, it delivers marked improvements in multilingual capability and chat performance over earlier Qwen releases. Architecturally, it is a decoder-only Transformer with SwiGLU activation and grouped query attention (GQA).

Implementation Details

The model uses a decoder-only architecture with attention QKV bias and grouped query attention (GQA). It requires transformers>=4.37.0 to load correctly and ships its weights in BF16 for efficient computation; a minimal loading sketch follows the list below.

  • Advanced transformer architecture with SwiGLU activation
  • Improved tokenizer for multiple natural languages and code
  • Stable 32K context length support
  • No requirement for trust_remote_code
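
A minimal loading sketch using the standard Transformers API, assuming the Hugging Face Hub repository id Qwen/Qwen1.5-110B and a multi-GPU host with accelerate installed for device_map="auto"; adjust to your own setup:

```python
# Minimal sketch: loading the base checkpoint with Hugging Face Transformers.
# Assumes the Hub repo id "Qwen/Qwen1.5-110B"; requires transformers>=4.37.0
# and accelerate for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B"  # base model; trust_remote_code is not needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # picks up the BF16 weights
    device_map="auto",    # shards the 111B parameters across available GPUs
)
```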

Core Capabilities

  • Multilingual text processing and generation
  • Enhanced chat model performance (see the chat sketch after this list)
  • Extensive context handling (32K tokens)
  • Suitable for post-training applications (SFT, RLHF)
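
For the chat use case, a hedged sketch of chat-style generation, assuming the companion instruction-tuned checkpoint Qwen/Qwen1.5-110B-Chat and the chat template bundled with its tokenizer:

```python
# Hedged sketch: chat-style generation with the companion instruction-tuned
# checkpoint (assumed Hub repo id "Qwen/Qwen1.5-110B-Chat").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# The tokenizer ships a chat template, so no manual prompt formatting is needed.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```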

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its massive scale (111B parameters), improved multilingual capabilities, and enhanced chat performance. It's part of a comprehensive series that includes 9 different model sizes, offering flexibility for various applications.

Q: What are the recommended use cases?

The base model is primarily intended for post-training applications such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and continued pretraining. Direct text generation using the base model is not recommended.
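
As one illustration of the SFT route, the sketch below attaches LoRA adapters with the peft library before handing the model to a training loop. The projection-module names (q_proj, k_proj, v_proj, o_proj) and hyperparameters are assumptions for a Qwen2-style attention block, not values taken from this model card:

```python
# Hedged sketch: preparing the base model for parameter-efficient SFT with LoRA.
# Assumes the peft library; target module names follow Qwen2-style attention
# projections and are illustrative, not prescribed by the model card.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-110B", torch_dtype="auto", device_map="auto"
)

lora_config = LoraConfig(
    r=16,                      # adapter rank (illustrative value)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# From here, hand the wrapped model to your usual SFT training loop.
```

Adapter-based tuning keeps only a small fraction of the 111B parameters trainable, which is usually the practical way to fine-tune a model of this size.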
