wizard-vicuna-13B-GPTQ

Maintained by: TheBloke

Property          Value
Parameter Count   2.03B (quantized; the model name indicates a 13B base model)
Model Type        LLaMA-based Dialogue Model
Quantization      4-bit GPTQ
License           Other (LLaMA model terms)

What is wizard-vicuna-13B-GPTQ?

Wizard-Vicuna-13B-GPTQ is a quantized version of the original Wizard-Vicuna model, built for efficient deployment while preserving dialogue quality. The underlying model combines WizardLM's dataset approach with Vicuna's multi-turn conversational format, for a reported improvement of approximately 7% over VicunaLM.

Implementation Details

The model uses 4-bit GPTQ quantization with a group size of 128. The C4 dataset, at a sequence length of 2048, served as calibration data for the quantization step, not as training data. The model is designed for GPU inference and requires AutoGPTQ 0.4.2 or later for optimal performance.

  • Quantization: 4-bit precision with a group size of 128
  • Model Size: 7.26 GB after quantization
  • Compatibility: ExLlama, AutoGPTQ, and Text Generation Inference
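
To make "4-bit precision with a group size of 128" concrete, the sketch below quantizes a row of toy weights in groups of 128, where each group shares one scale and one offset. It uses plain round-to-nearest rather than the GPTQ algorithm itself (GPTQ additionally uses calibration data to compensate for rounding error), and the function names and toy weights are illustrative assumptions, not part of any library.

```python
import random

GROUP_SIZE = 128           # weights that share one scale and one offset
BITS = 4                   # 4-bit codes, so integer levels 0..15
LEVELS = (1 << BITS) - 1   # highest integer code (15)


def quantize_group(weights):
    """Map one group of floats to 4-bit integer codes plus (scale, offset)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # fall back to 1.0 for a constant group
    codes = [min(LEVELS, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Quantize a weight row one group at a time."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(512)]  # toy weights, not real model data
groups = quantize_row(row)
restored = [w for g in groups for w in dequantize_group(*g)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(f"{len(groups)} groups, max reconstruction error {max_error:.6f}")
```

A smaller group size gives each scale fewer weights to cover, which tends to lower the rounding error at the cost of storing more scales.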

Core Capabilities

  • Enhanced conversational abilities with multi-round dialogue support
  • Improved context understanding through WizardLM's dataset approach
  • Efficient memory usage through GPTQ quantization
  • Support for various deployment scenarios including text-generation-webui
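
The memory saving can be estimated with simple arithmetic. The sketch below assumes 13 billion parameters (taken from the "13B" in the model name) and assumes each group of 128 weights stores one 16-bit scale and one 4-bit zero point; both storage figures are assumptions for illustration, not values from the model card.

```python
PARAMS = 13.0e9        # assumed from the "13B" in the model name
BITS_PER_WEIGHT = 4    # 4-bit GPTQ
GROUP_SIZE = 128       # weights per quantization group
SCALE_BITS = 16        # assumed: one 16-bit scale per group
ZERO_BITS = 4          # assumed: one 4-bit zero point per group


def estimate_gb(params, bits_per_weight):
    """Storage in gigabytes (10^9 bytes) for a given bit width."""
    return params * bits_per_weight / 8 / 1e9


# Per-group metadata is spread over the 128 weights in the group.
effective_bits = BITS_PER_WEIGHT + (SCALE_BITS + ZERO_BITS) / GROUP_SIZE
fp16_gb = estimate_gb(PARAMS, 16)
gptq_gb = estimate_gb(PARAMS, effective_bits)

print(f"effective bits per weight: {effective_bits}")
print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f"4-bit GPTQ estimate: {gptq_gb:.2f} GB")
```

The estimate comes out somewhat below the stated 7.26 GB. A plausible reason is that some tensors, such as embeddings, are kept at 16-bit precision, though the source does not break the file size down.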

Frequently Asked Questions

Q: What makes this model unique?

This model combines WizardLM's dataset approach with Vicuna's multi-turn conversation capabilities, and GPTQ quantization reduces the hardware needed to deploy it.
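
Multi-turn use means earlier rounds are fed back to the model as part of the prompt. The sketch below assembles such a prompt using a Vicuna-style "USER: ... ASSISTANT:" template. The template and system text are assumptions here, since this page does not state the prompt format; check the model card before relying on them. `build_prompt` is a hypothetical helper.

```python
# Assumed Vicuna-style system text; not stated in this document.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(history, user_message, system=SYSTEM):
    """Build a multi-turn prompt.

    history is a list of (user_text, assistant_text) pairs from earlier rounds.
    """
    parts = [system]
    for user_text, assistant_text in history:
        parts.append(f"USER: {user_text} ASSISTANT: {assistant_text}")
    # Leave the final assistant turn open so the model completes it.
    parts.append(f"USER: {user_message} ASSISTANT:")
    return " ".join(parts)


history = [("Hi", "Hello! How can I help?")]
prompt = build_prompt(history, "What is GPTQ?")
print(prompt)
```

Because the whole history is resent each round, long conversations eventually have to be truncated to fit the model's context length.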

Q: What are the recommended use cases?

The model is suited to interactive dialogue applications, content generation, and scenarios that call for detailed, helpful responses, while quantization keeps hardware requirements modest.
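
For an interactive dialogue application, loading and generation might look like the sketch below, which goes through Hugging Face Transformers. It assumes that the `transformers`, `optimum`, and `auto-gptq` packages are installed, that a CUDA GPU is available, and that the model is published under the Hub id shown (inferred from the maintainer and model name, not confirmed by this page). `load_model` and `generate_reply` are hypothetical helper names.

```python
# Assumed Hub id, built from the maintainer and model name on this page.
MODEL_ID = "TheBloke/wizard-vicuna-13B-GPTQ"


def load_model(model_id=MODEL_ID):
    """Load the tokenizer and quantized model (downloads weights, needs a GPU)."""
    # Imported lazily so the helpers can be defined without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" places the quantized weights on the available GPU(s).
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model


def generate_reply(tokenizer, model, prompt, max_new_tokens=256):
    """Generate a completion and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the reply is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example use (not run here because it downloads several gigabytes):
# tokenizer, model = load_model()
# print(generate_reply(tokenizer, model, "USER: Hello! ASSISTANT:"))
```

The other backends named above (ExLlama, Text Generation Inference, text-generation-webui) have their own loading paths, which this sketch does not cover.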
