# Wizard-Vicuna-30B-Uncensored-GGML
| Property | Value |
|---|---|
| Author | TheBloke |
| Base Model | Wizard-Vicuna-30B-Uncensored |
| License | Other |
| Format | GGML (various quantizations) |
## What is Wizard-Vicuna-30B-Uncensored-GGML?
This is a GGML-quantized version of Eric Hartford's Wizard-Vicuna-30B-Uncensored model, packaged for CPU inference with optional GPU layer offloading via llama.cpp. The model is distributed in multiple quantization formats, from 2-bit to 8-bit, each trading off file size, inference speed, and output quality differently.
## Implementation Details
The model is available in a range of quantization methods: the original llama.cpp methods (q4_0, q4_1, q5_0, q5_1, q8_0) and the newer k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 13.60 GB (q2_K) to 34.56 GB (q8_0). A short loading example follows the list below.
- Original llama.cpp quantization methods maintain broad compatibility
- New k-quant methods offer improved efficiency but require newer llama.cpp versions
- Multiple quantization options for different hardware configurations
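As a concrete illustration, the sketch below loads one of the quantized files with the llama-cpp-python bindings. This assumes a version of the bindings old enough to read GGML files (newer releases expect the GGUF format instead); the file name follows TheBloke's usual naming pattern, and the context size, thread count, and layer count are placeholder values to tune for your hardware.

```python
from llama_cpp import Llama

# Hypothetical local path, following TheBloke's naming pattern:
# Wizard-Vicuna-30B-Uncensored.ggmlv3.<quant>.bin
MODEL_PATH = "./Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin"

llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=2048,       # LLaMA-1 context window
    n_threads=8,      # CPU threads; tune for your machine
    n_gpu_layers=32,  # layers offloaded to GPU (requires a cuBLAS/Metal build); 0 = CPU only
)
```

Choosing a smaller quant (e.g. q2_K) lowers the memory footprint at some cost in output quality, while q8_0 stays closest to the unquantized weights but needs the most RAM.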
## Core Capabilities
- CPU + GPU inference support via llama.cpp (an inference example follows this list)
- Compatible with popular interfaces like text-generation-webui and KoboldCpp
- Uncensored responses without built-in alignment
- Flexible deployment options with various quantization levels
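The end-to-end sketch below generates a completion through llama-cpp-python. The USER:/ASSISTANT: template is the Vicuna-style format this model family is typically prompted with; the model path and sampling settings are assumptions, not values from the original card.

```python
from llama_cpp import Llama

# Assumed local file; see the loading sketch above for parameter notes.
llm = Llama(
    model_path="./Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin",
    n_gpu_layers=32,
)

# Vicuna-style single-turn prompt template.
prompt = "USER: Summarize the trade-offs between q2_K and q8_0 quantization.\nASSISTANT:"

output = llm(
    prompt,
    max_tokens=256,   # cap on generated tokens
    temperature=0.7,  # mild sampling randomness
    stop=["USER:"],   # stop before the model begins a new turn
)
print(output["choices"][0]["text"].strip())
```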
## Frequently Asked Questions
Q: What makes this model unique?
This model combines a high parameter count (30B) with a wide range of quantization options, making it deployable on very different hardware configurations while preserving most of the full-size model's quality.
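A rough way to reason about that hardware trade-off, using only the two file sizes quoted above: peak RAM for CPU-only inference is approximately the file size plus a small runtime overhead (TheBloke's GGML cards conventionally add about 2 GB). The sketch below is an illustrative estimate under that assumption, not a measurement.

```python
# Illustrative RAM estimate: file size plus ~2 GB runtime overhead,
# a convention from TheBloke's GGML model cards (assumption, not a benchmark).
FILE_SIZES_GB = {"q2_K": 13.60, "q8_0": 34.56}

for quant, size_gb in FILE_SIZES_GB.items():
    print(f"{quant}: ~{size_gb + 2.0:.2f} GB RAM for CPU-only inference")
```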
Q: What are the recommended use cases?
The model is suited to general language tasks that require unrestricted outputs. As an uncensored model it ships without built-in guardrails, so users are responsible for how it is used and for the content it generates.