wizardLM-7B-GPTQ

Maintained By
TheBloke

| Property | Value |
|---|---|
| Parameter Count | 1.13B |
| Model Type | LLaMA |
| License | Other |
| Quantization | 4-bit GPTQ |

What is wizardLM-7B-GPTQ?

WizardLM-7B-GPTQ is a quantized version of the WizardLM language model, optimized for efficient GPU inference. This release provides multiple GPTQ parameter options, letting users trade off model quality, VRAM usage, and inference speed. The model supports both the AutoGPTQ and GPTQ-for-LLaMA frameworks, making it versatile for different deployment scenarios.

Implementation Details

The model comes in multiple quantization variants, including 4-bit and 8-bit versions with different group sizes (32g, 64g, 128g) and act-order configurations. The main branch offers a 4-bit/128g version that provides optimal compatibility and good inference speed, while other branches offer different trade-offs between quality and resource usage.

  • Multiple GPTQ configurations available through different branches
  • Supports SafeTensors format for secure model loading
  • Compatible with ExLlama for 4-bit variants
  • Includes automatic parameter configuration through quantize_config.json
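Each branch's `quantize_config.json` records the GPTQ parameters so that loaders such as AutoGPTQ can configure themselves without manual flags. A minimal sketch of reading such a file, with field names following AutoGPTQ's convention and values illustrative of a 4-bit/128g, no act-order branch (not copied from the repository):

```python
import json
from io import StringIO

# Illustrative quantize_config.json contents; the actual file in each
# branch may carry additional fields (e.g. damp_percent, sym).
sample = StringIO("""
{
  "bits": 4,
  "group_size": 128,
  "desc_act": false
}
""")

config = json.load(sample)

# A loader can use these fields to pick the matching dequantization kernel.
print(f"{config['bits']}-bit, group size {config['group_size']}, "
      f"act-order={'yes' if config['desc_act'] else 'no'}")
```

In practice you rarely read this file yourself; it is shown here only to make the branch differences concrete.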

Core Capabilities

  • Efficient GPU inference with reduced memory footprint
  • Multiple quantization options for different hardware configurations
  • Compatible with popular frameworks including AutoGPTQ and GPTQ-for-LLaMA
  • Supports text generation with customizable parameters
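To see why 4-bit quantization matters for GPU inference, a rough back-of-envelope estimate compares fp16 with 4-bit weight storage for the nominal 7B-parameter base model (weights only, ignoring activations, KV cache, and per-group scale/zero-point overhead):

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return n_params * bits_per_weight / 8 / 2**30

n = 7e9  # nominal parameter count of the base 7B model

fp16 = weight_gib(n, 16)  # ~13.0 GiB
int4 = weight_gib(n, 4)   # ~3.3 GiB

print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB "
      f"({fp16 / int4:.0f}x smaller)")
```

This is why the 4-bit variants fit comfortably on consumer GPUs that could not hold the fp16 model at all.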

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its variety of quantization options, allowing users to choose the optimal configuration for their specific hardware and use case. The availability of multiple GPTQ parameter permutations makes it highly versatile for different deployment scenarios.

Q: What are the recommended use cases?

The model is ideal for GPU-based inference where memory efficiency is crucial. It is particularly well suited to text generation tasks using the WizardLM prompt template, and integrates easily with popular tooling such as text-generation-webui or the Hugging Face Transformers library.
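The WizardLM family uses a simple instruction/response template. A sketch of building such a prompt, assuming the commonly documented form for this release (verify against the model card, since template details vary between WizardLM versions):

```python
def wizardlm_prompt(instruction: str) -> str:
    # Assumed WizardLM-style template: the instruction followed by a
    # "### Response:" marker that cues the model to answer.
    return f"{instruction}\n\n### Response:"

prompt = wizardlm_prompt("Write a haiku about quantization.")
print(prompt)
```

The resulting string is what you pass to the tokenizer and `model.generate`; front-ends like text-generation-webui apply the template automatically once the matching instruction template is selected.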
