wizard-vicuna-13B-GPTQ

Maintained by: TheBloke

Property         Value
Parameter Count  2.03B (Quantized)
Model Type       LLaMA-based Dialogue Model
Quantization     4-bit GPTQ
License          Other (LLaMA model terms)

What is wizard-vicuna-13B-GPTQ?

Wizard-Vicuna-13B-GPTQ is a quantized version of the original Wizard-Vicuna 13B model, optimized for efficient deployment while preserving its dialogue quality. The underlying model combines WizardLM's dataset approach with Vicuna's multi-turn conversational format, a combination reported to yield roughly a 7% performance improvement over VicunaLM.

Implementation Details

The model uses 4-bit quantization with a group size of 128. The GPTQ quantization was calibrated on the C4 dataset with a sequence length of 2048; this calibration step is separate from the training of the underlying model. It is designed for GPU inference and requires AutoGPTQ 0.4.2 or later for optimal performance.

  • Quantization: 4-bit precision with 128 group size
  • Model Size: 7.26GB after quantization
  • Compatibility: ExLlama, AutoGPTQ, and Text Generation Inference
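
As a rough check on the figures above, the storage cost of group-wise 4-bit quantization can be estimated with simple arithmetic. The sketch below assumes one 16-bit scale and one 4-bit zero point per group of 128 weights, and a nominal 13 billion quantized weights. These are illustrative assumptions, not values read from the model's configuration, and the function names are hypothetical.

```python
def effective_bits_per_weight(bits: int, group_size: int,
                              scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Bits stored per weight, including the per-group scale and zero point.

    scale_bits and zero_bits are assumed storage widths, not confirmed values.
    """
    return bits + (scale_bits + zero_bits) / group_size


def estimated_size_gb(n_weights: float, bits: int, group_size: int) -> float:
    """Approximate size of the quantized weights in gigabytes (10^9 bytes)."""
    total_bits = n_weights * effective_bits_per_weight(bits, group_size)
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    n_weights = 13e9  # assumed nominal weight count for a 13B model
    print(f"{effective_bits_per_weight(4, 128):.5f} bits per weight")
    print(f"~{estimated_size_gb(n_weights, 4, 128):.2f} GB for the quantized weights")
```

Under these assumptions the quantized weights alone come to about 6.75 GB. The estimate ignores any layers kept at higher precision, so it should be read as a lower bound rather than a reproduction of the listed model size.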

Core Capabilities

  • Enhanced conversational abilities with multi-round dialogue support
  • Improved context understanding through WizardLM's dataset approach
  • Efficient memory usage through GPTQ quantization
  • Support for various deployment scenarios including text-generation-webui
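
Multi-round dialogue with a model like this is usually handled by flattening earlier turns into a single prompt. The sketch below assumes a Vicuna-style layout with `USER:` and `ASSISTANT:` labels and a generic system preamble; the exact template is not stated here and should be checked against the model card. `build_prompt` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple

# Assumed system preamble and role labels (Vicuna-style); verify before relying on them.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(history: List[Tuple[str, str]], user_message: str,
                 system: str = SYSTEM) -> str:
    """Flatten earlier (user, assistant) turns plus a new user message into one prompt."""
    parts = [system]
    for user_turn, assistant_turn in history:
        parts.append(f"USER: {user_turn}")
        parts.append(f"ASSISTANT: {assistant_turn}")
    parts.append(f"USER: {user_message}")
    parts.append("ASSISTANT:")  # left open for the model to complete
    return "\n".join(parts)


if __name__ == "__main__":
    history = [("Hi", "Hello! How can I help?")]
    print(build_prompt(history, "What does GPTQ do?"))
```

Because the whole history is resent on every turn, long conversations eventually need truncating to stay within the model's context window.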

Frequently Asked Questions

Q: What makes this model unique?

It combines WizardLM's dataset approach with Vicuna's multi-turn conversation capabilities, and it is quantized with GPTQ so that it can be deployed efficiently.

Q: What are the recommended use cases?

The model excels in interactive dialogue applications, content generation, and scenarios requiring detailed, helpful responses while maintaining reasonable hardware requirements through quantization.
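
For a dialogue application, a single reply could be generated along the following lines. This is a minimal sketch, assuming the `auto_gptq` and `transformers` packages are installed, a CUDA GPU is available, the checkpoint is stored as safetensors, and the repository id is `TheBloke/wizard-vicuna-13B-GPTQ`. The repository id and the `generate_reply` helper are assumptions made for illustration.

```python
def generate_reply(prompt: str,
                   model_id: str = "TheBloke/wizard-vicuna-13B-GPTQ",  # assumed repo id
                   max_new_tokens: int = 256) -> str:
    """Load the quantized model and return the text generated for one prompt."""
    # Imported lazily so this file can be imported without the packages installed.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    model = AutoGPTQForCausalLM.from_quantized(
        model_id,
        device="cuda:0",       # the model is intended for GPU inference
        use_safetensors=True,  # assumes a safetensors checkpoint
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Loading the model on every call is wasteful; a real application would load it once and reuse it across turns.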
