Wizard-Vicuna-30B-Uncensored-GGML

Maintained by: TheBloke

Author: TheBloke
Base Model: Wizard-Vicuna-30B-Uncensored
License: Other
Format: GGML (various quantizations)

What is Wizard-Vicuna-30B-Uncensored-GGML?

This is a GGML-quantized version of Eric Hartford's Wizard-Vicuna-30B-Uncensored model, packaged for CPU inference with optional GPU offloading via llama.cpp. It is provided in multiple quantization formats ranging from 2-bit to 8-bit, each offering a different trade-off between file size, inference speed, and output quality.

Implementation Details

The model is available in a range of quantization methods, covering both the original llama.cpp formats (q4_0, q4_1, q5_0, q5_1, q8_0) and the newer k-quant formats (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 13.60GB (q2_K) to 34.56GB (q8_0).

  • Original llama.cpp quantization methods maintain broad compatibility
  • New k-quant methods offer improved efficiency but require newer llama.cpp versions
  • Multiple quantization options for different hardware configurations (a loading sketch follows this list)
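
To make the choice concrete, the snippet below loads one of these files with the llama-cpp-python bindings. This is a minimal sketch under stated assumptions, not an official recipe: GGML files require an older llama-cpp-python release (0.1.78 or earlier is the commonly cited cutoff; later versions read GGUF only), and the q4_K_M file name follows this repository's naming convention and is assumed to have been downloaded locally.

```python
from llama_cpp import Llama

# Load the q4_K_M quant. n_gpu_layers controls how many transformer
# layers are offloaded to the GPU; set it to 0 for CPU-only inference.
# Requires an older, GGML-capable llama-cpp-python release (assumption:
# 0.1.78 or earlier); later versions read GGUF files only.
llm = Llama(
    model_path="Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin",  # assumed local path
    n_ctx=2048,       # LLaMA-1 context window
    n_gpu_layers=32,  # GPU offload; 0 = pure CPU
    n_threads=8,      # CPU threads for the non-offloaded layers
)
```

Smaller quants (q2_K, q3_K_*) trade output quality for lower memory use; larger ones (q6_K, q8_0) need proportionally more RAM or VRAM.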

Core Capabilities

  • CPU + GPU inference support via llama.cpp (see the generation sketch after this list)
  • Compatible with popular interfaces like text-generation-webui and KoboldCpp
  • Uncensored responses without built-in alignment
  • Flexible deployment options with various quantization levels
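
As a usage illustration, the sketch below generates a completion with a Vicuna-style prompt. Wizard-Vicuna models are commonly run with the USER:/ASSISTANT: turn format; the exact system line here is an assumption based on common practice, not something stated in this card, and the same llama-cpp-python version caveats as above apply.

```python
from llama_cpp import Llama

llm = Llama(model_path="Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin")

# Vicuna-style turn format; the system line is an assumption and can
# be adjusted to taste.
prompt = (
    "You are a helpful AI assistant.\n\n"
    "USER: Summarize the trade-offs between q2_K and q8_0 quantization.\n"
    "ASSISTANT:"
)

output = llm(prompt, max_tokens=256, temperature=0.7, stop=["USER:"])
print(output["choices"][0]["text"])
```

The stop=["USER:"] argument cuts generation off before the model starts writing a new user turn.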

Frequently Asked Questions

Q: What makes this model unique?

This model combines a high parameter count (30B) with a wide range of quantization options, from the 13.60GB q2_K file up to the near-full-precision 34.56GB q8_0, so it can run on very different hardware configurations while preserving as much output quality as the chosen quantization allows.

Q: What are the recommended use cases?

The model is suited to general language tasks that require unrestricted outputs. Note that, as an uncensored model, it ships without built-in guardrails; users are responsible for the outputs they generate with it.
