wizardLM-7B-GPTQ

Maintained By
TheBloke

| Property | Value |
|---|---|
| Parameter Count | 1.13B |
| Model Type | LLaMA |
| License | Other |
| Quantization | 4-bit GPTQ |

What is wizardLM-7B-GPTQ?

WizardLM-7B-GPTQ is a quantized version of the WizardLM language model, optimized for efficient GPU inference. This release provides multiple GPTQ parameter options, letting users trade off model quality, VRAM usage, and inference speed. The model supports both the AutoGPTQ and GPTQ-for-LLaMA frameworks, making it versatile for different deployment scenarios.

Implementation Details

The model comes in multiple quantization variants, including 4-bit and 8-bit versions with different group sizes (32g, 64g, 128g) and act-order configurations. The main branch offers a 4-bit/128g version that provides optimal compatibility and good inference speed, while other branches offer different trade-offs between quality and resource usage.

  • Multiple GPTQ configurations available through different branches
  • Supports SafeTensors format for secure model loading
  • Compatible with ExLlama for 4-bit variants
  • Includes automatic parameter configuration through quantize_config.json
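Each branch's `quantize_config.json` records the GPTQ parameters so that loaders such as AutoGPTQ can configure themselves without manual flags. A minimal sketch of reading such a file, with field names following AutoGPTQ's convention and values illustrative of a 4-bit/128g, no act-order branch (not copied from the repository):

```python
import json
from io import StringIO

# Illustrative quantize_config.json contents; the actual file in each
# branch may carry additional fields (e.g. damp_percent, sym).
sample = StringIO("""
{
  "bits": 4,
  "group_size": 128,
  "desc_act": false
}
""")

config = json.load(sample)

# A loader can use these fields to pick the matching dequantization kernel.
print(f"{config['bits']}-bit, group size {config['group_size']}, "
      f"act-order={'yes' if config['desc_act'] else 'no'}")
```

In practice you rarely read this file yourself; it is shown here only to make the branch differences concrete.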

Core Capabilities

  • Efficient GPU inference with reduced memory footprint
  • Multiple quantization options for different hardware configurations
  • Compatible with popular frameworks including AutoGPTQ and GPTQ-for-LLaMA
  • Supports text generation with customizable parameters
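To see why 4-bit quantization matters for GPU inference, a rough back-of-envelope estimate compares fp16 with 4-bit weight storage for the nominal 7B-parameter base model (weights only, ignoring activations, KV cache, and per-group scale/zero-point overhead):

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return n_params * bits_per_weight / 8 / 2**30

n = 7e9  # nominal parameter count of the base 7B model

fp16 = weight_gib(n, 16)  # ~13.0 GiB
int4 = weight_gib(n, 4)   # ~3.3 GiB

print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB "
      f"({fp16 / int4:.0f}x smaller)")
```

This is why the 4-bit variants fit comfortably on consumer GPUs that could not hold the fp16 model at all.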

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its variety of quantization options, allowing users to choose the optimal configuration for their specific hardware and use case. The availability of multiple GPTQ parameter permutations makes it highly versatile for different deployment scenarios.

Q: What are the recommended use cases?

The model is ideal for GPU-based inference where memory efficiency is crucial. It is particularly well suited to text generation tasks using the WizardLM prompt template, and integrates easily with popular tooling such as text-generation-webui or the Hugging Face Transformers library.
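The WizardLM family uses a simple instruction/response template. A sketch of building such a prompt, assuming the commonly documented form for this release (verify against the model card, since template details vary between WizardLM versions):

```python
def wizardlm_prompt(instruction: str) -> str:
    # Assumed WizardLM-style template: the instruction followed by a
    # "### Response:" marker that cues the model to answer.
    return f"{instruction}\n\n### Response:"

prompt = wizardlm_prompt("Write a haiku about quantization.")
print(prompt)
```

The resulting string is what you pass to the tokenizer and `model.generate`; front-ends like text-generation-webui apply the template automatically once the matching instruction template is selected.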
