Qwen-7B-Chat-Int4

Maintained by: Qwen


  • Parameter Count: 2.11B (as stored for the 4-bit checkpoint; the base model has ~7B parameters)
  • Model Type: Quantized Chat Model
  • Architecture: 32 layers, 32 heads, 4096 d_model
  • License: Tongyi Qianwen License Agreement
  • Supported Languages: Chinese, English, Multi-lingual

What is Qwen-7B-Chat-Int4?

Qwen-7B-Chat-Int4 is a 4-bit quantized version of the Qwen-7B-Chat model, designed for efficient deployment while maintaining impressive performance. The model is built on a Transformer architecture and has been trained on diverse datasets including web texts, professional books, and code repositories. This quantized version significantly reduces memory usage while preserving most of the original model's capabilities.
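Loading the quantized checkpoint follows the usage pattern documented for Qwen chat models: the weights ship with remote code that exposes a `model.chat` helper, and the int4 variant additionally needs an AutoGPTQ-capable environment. A minimal sketch (package requirements in the comment are assumptions about a typical setup):

```python
# Assumes: pip install transformers auto-gptq optimum (and a CUDA GPU).
MODEL_ID = "Qwen/Qwen-7B-Chat-Int4"

def chat_once(query: str) -> str:
    """Load the int4 model and run one round of chat."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code is required: the chat interface lives in the
    # model repository's custom code, not in transformers itself.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", trust_remote_code=True
    ).eval()

    # model.chat returns (response, updated_history).
    response, _history = model.chat(tokenizer, query, history=None)
    return response
```

Because the quantized weights are loaded as-is, no calibration or conversion step is needed at inference time.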

Implementation Details

The model implements modern architectural features including RoPE relative position encoding, SwiGLU activation functions, and RMSNorm. Its vocabulary of approximately 150K tokens is built on the cl100k_base BPE vocabulary (the tiktoken vocabulary used by GPT-4) and extended for Chinese, English, and code.

  • Architecture: 32 layers, 32 attention heads, 4096 dimension model
  • Context Length: 8192 tokens
  • Memory Usage: 8.21GB for encoding 2048 tokens
  • Inference Speed: 50.09 tokens/s for 2048 tokens with Flash Attention v2
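The headline memory saving of 4-bit quantization can be estimated with simple arithmetic: weight storage scales linearly with bits per weight, so int4 is roughly a quarter of fp16. The sketch below uses an assumed ~7.7B weight count for illustration; real peak usage is higher (as in the 8.21GB figure above) because activations and the KV cache are not quantized:

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate in GiB; ignores activations and KV cache."""
    return n_params * bits_per_weight / 8 / 1024**3

N_PARAMS = 7.7e9  # assumption: approximate weight count of a "7B" model

fp16_gib = weight_footprint_gib(N_PARAMS, 16)  # ~14.3 GiB
int4_gib = weight_footprint_gib(N_PARAMS, 4)   # ~3.6 GiB
```

This back-of-the-envelope ratio (4x smaller weights) is why the int4 variant fits comfortably on a single consumer GPU.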

Core Capabilities

  • Strong performance in Chinese (59.7% on C-Eval) and English (55.8% on MMLU) evaluations
  • Code generation capabilities with 37.2% Pass@1 on HumanEval
  • Mathematical reasoning with 50.3% accuracy on GSM8K
  • Tool usage and ReAct prompting support
  • Efficient inference with reduced memory footprint
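The ReAct prompting support mentioned above interleaves model reasoning with tool calls in a Thought/Action/Observation loop. A minimal prompt builder illustrating the pattern (the template wording and the `calculator` tool are illustrative, not Qwen's exact internal format):

```python
# Generic ReAct-style prompt template (illustrative, not Qwen-specific).
REACT_TEMPLATE = (
    "Answer the question using the tools below.\n\n"
    "{tool_descs}\n\n"
    "Use the following format:\n"
    "Question: the input question\n"
    "Thought: reasoning about what to do next\n"
    "Action: the tool to use, one of [{tool_names}]\n"
    "Action Input: the input to the tool\n"
    "Observation: the result of the tool call\n"
    "... (Thought/Action/Action Input/Observation can repeat)\n"
    "Final Answer: the answer to the original question\n\n"
    "Question: {query}"
)

def build_react_prompt(query: str, tools: dict[str, str]) -> str:
    """Render a ReAct prompt from a query and a {name: description} tool map."""
    tool_descs = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    return REACT_TEMPLATE.format(
        tool_descs=tool_descs, tool_names=", ".join(tools), query=query
    )

prompt = build_react_prompt(
    "What is 23 * 17?", {"calculator": "evaluates arithmetic expressions"}
)
```

At inference time, the caller parses the generated `Action`/`Action Input` lines, runs the tool, appends the result as an `Observation`, and re-prompts until the model emits a `Final Answer`.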

Frequently Asked Questions

Q: What makes this model unique?

The model combines efficient 4-bit quantization with strong multi-lingual capabilities and tool usage abilities, making it particularly suitable for deployment in resource-constrained environments while maintaining high performance.

Q: What are the recommended use cases?

The model excels in multi-lingual chat applications, code generation, mathematical problem-solving, and tool-augmented tasks. It's particularly suitable for deployment scenarios where memory efficiency is crucial.
