Qwen2.5-72B-Instruct-GPTQ-Int4

Maintained By
Qwen

Parameter Count: 72.7B (70.0B Non-Embedding)
Model Type: Causal Language Model
Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
License: Qwen License
Context Length: 131,072 tokens
Quantization: GPTQ 4-bit

What is Qwen2.5-72B-Instruct-GPTQ-Int4?

Qwen2.5-72B-Instruct-GPTQ-Int4 is a GPTQ 4-bit quantized version of the Qwen2.5-72B-Instruct large language model. The quantization substantially reduces the memory footprint for deployment while preserving most of the full-precision model's capabilities.

Implementation Details

The model uses a transformer architecture with 80 layers, 64 attention heads for queries, and 8 heads for keys and values (grouped-query attention). It incorporates RoPE positional embeddings, SwiGLU activations, and RMSNorm, balancing performance and efficiency.

  • Advanced GQA (Grouped Query Attention) implementation
  • YaRN-powered context length extension up to 131,072 tokens
  • 4-bit precision quantization using GPTQ
  • Support for generating up to 8,192 tokens
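Inputs longer than 32,768 tokens rely on the YaRN rope-scaling mechanism listed above. Per the Qwen2.5 model card, this is enabled by adding a `rope_scaling` entry to the model's `config.json`; the values below follow the card's example and should be treated as a starting point:

```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

Note that this scaling applies statically once enabled, so it is best left off for workloads that stay within 32K tokens.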

Core Capabilities

  • Enhanced knowledge base and improved capabilities in coding and mathematics
  • Superior instruction following and long-text generation
  • Structured data understanding and JSON output generation
  • Multi-lingual support for 29+ languages
  • Improved role-play implementation and chatbot condition-setting
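The structured-output capability can be exercised through the Hugging Face transformers chat template. The sketch below is illustrative, not an official recipe: the model id comes from this card, while the system prompt, helper names, and generation parameters are assumptions.

```python
# Sketch: prompting the quantized model for JSON-only output via the
# transformers chat template. Helper names and prompts are illustrative.

MODEL_ID = "Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4"

def build_messages(task: str) -> list:
    """Chat messages that ask the model to answer with valid JSON only."""
    return [
        {"role": "system",
         "content": "You are a helpful assistant. Respond with valid JSON only."},
        {"role": "user", "content": task},
    ]

def generate_json(task: str, max_new_tokens: int = 512) -> str:
    # Imports kept local: loading the 72B GPTQ weights requires a large GPU.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    # Render the chat into the model's prompt format, ending with the
    # assistant turn so generation continues from there.
    text = tokenizer.apply_chat_template(
        build_messages(task), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens, returning only the newly generated text.
    return tokenizer.decode(
        output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
```

In practice the returned string should still be validated (e.g. with `json.loads`) before downstream use, since JSON-only prompting constrains but does not guarantee the output format.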

Frequently Asked Questions

Q: What makes this model unique?

The model combines large scale (72B parameters) with efficient 4-bit GPTQ quantization while supporting a 131,072-token (128K) context window. It is particularly notable for its improved capabilities in specialized domains like coding and mathematics, along with enhanced multilingual support.

Q: What are the recommended use cases?

This model is ideal for applications requiring sophisticated language understanding and generation, including code development, mathematical problem-solving, multilingual applications, and long-form content generation. It's particularly well-suited for deployment scenarios where efficiency is crucial but high performance must be maintained.
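For the efficiency-sensitive deployment scenarios mentioned above, one common route (an assumption here, not part of this card) is serving the checkpoint with an inference engine that supports GPTQ, such as vLLM's OpenAI-compatible server. A minimal sketch, with flags to be adjusted for the available hardware:

```
# Sketch: serving the GPTQ checkpoint with vLLM (requires GPTQ support and
# enough GPU memory for the 4-bit 72B weights; multi-GPU may be needed).
vllm serve Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 \
    --max-model-len 32768   # raise only with YaRN rope scaling configured
```

The served endpoint can then be queried with any OpenAI-compatible client.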
