gemma-3-27b-it-GPTQ-4b-128g
| Property | Value |
|---|---|
| Original Model | Gemma-3-27B-IT |
| Quantization | INT4 (GPTQ) |
| Group Size | 128 |
| Author | ISTA-DASLab |
| Model URL | HuggingFace Repository |
What is gemma-3-27b-it-GPTQ-4b-128g?
This is a quantized version of the Gemma-3-27B-IT model, optimized to reduce its memory and compute footprint while maintaining performance. The model uses GPTQ quantization to compress the weights from 16-bit to 4-bit precision, cutting the disk space and GPU memory required for the quantized weights by roughly 75%.
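The 75% figure follows directly from the bit widths. A rough back-of-the-envelope estimate, assuming the weights dominate memory use and ignoring the components kept at full precision:

```python
# Rough memory estimate for 27B parameters (weights only; the vision
# tower and other full-precision components are ignored here).
params = 27e9

bf16_bytes = params * 2              # 16-bit baseline: 2 bytes per weight
int4_bytes = params * 0.5            # 4-bit packed: 0.5 bytes per weight
scale_bytes = (params / 128) * 2     # one 16-bit scale per group of 128

print(f"bf16 weights:  {bf16_bytes / 1e9:.1f} GB")
print(f"int4 + scales: {(int4_bytes + scale_bytes) / 1e9:.1f} GB")
print(f"reduction:     {1 - (int4_bytes + scale_bytes) / bf16_bytes:.0%}")
```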
Implementation Details
The quantization process targets the linear operators within the language-model transformer blocks while keeping the vision model and multimodal projection components at their original precision. The scheme is symmetric per-group quantization with a group size of 128, with the integer weights chosen by the GPTQ algorithm; a toy sketch of the scheme follows the feature list below. The model checkpoint is stored in the compressed_tensors format for efficient storage and loading.
- Selective quantization of transformer blocks only
- Preservation of vision and multimodal components in original precision
- Symmetric per-group quantization scheme
- 4-bit precision with a group size of 128
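To make "symmetric per-group" concrete, here is a toy round-to-nearest quantizer over the same grid. This is an illustration only: the actual GPTQ algorithm chooses the integers by minimizing layer output error using second-order information, rather than rounding each weight independently.

```python
import numpy as np

def quantize_symmetric_per_group(w, group_size=128, bits=4):
    """Toy round-to-nearest symmetric per-group quantizer.

    Shows the quantization grid only; GPTQ picks the integers
    more carefully than simple rounding.
    """
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit symmetric
    groups = w.reshape(-1, group_size)          # one scale per 128 weights
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax)
    dequant = (q * scales).reshape(w.shape)     # what the kernel reconstructs
    return q.astype(np.int8), scales, dequant

w = np.random.randn(4096 * 128).astype(np.float32)
q, scales, w_hat = quantize_symmetric_per_group(w)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```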
Core Capabilities
- Multimodal processing (text and image)
- Reduced memory footprint (~75% reduction for quantized weights)
- Maintained model quality despite compression
- Compatible with the standard transformers library (see the loading sketch after this list)
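A minimal loading sketch, assuming the repo id ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g (inferred from the card title, so verify against the actual HuggingFace repository), a recent transformers release with Gemma 3 support, and the compressed-tensors package installed:

```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

# Repo id inferred from the card title; check the actual repository.
model_id = "ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g"

processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # non-quantized parts (vision tower) run in bf16
    device_map="auto",
)

inputs = processor(text="Explain GPTQ in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```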
Frequently Asked Questions
Q: What makes this model unique?
A: This model stands out for its efficient compression of the Gemma architecture while maintaining multimodal capabilities. The selective quantization approach ensures that critical vision-related components remain at full precision while achieving significant memory savings.
Q: What are the recommended use cases?
A: The model is ideal for deployment scenarios where GPU memory is limited but full multimodal capabilities are required. It's particularly suitable for applications involving both text and image processing, such as image description, visual question answering, and multimodal chat interactions.
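A hypothetical visual-question-answering sketch using the processor's chat template; the image URL is a placeholder and the repo id is the same inferred one as in the loading sketch above:

```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g"  # inferred repo id
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One user turn mixing an image and a question, as in a multimodal chat.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
        {"type": "text", "text": "What is happening in this image?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```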