# Wizard-Vicuna-30B-Uncensored-GGML
| Property | Value |
|---|---|
| Author | TheBloke |
| Base Model | Wizard-Vicuna-30B-Uncensored |
| License | Other |
| Format | GGML (various quantizations) |
## What is Wizard-Vicuna-30B-Uncensored-GGML?
This is a GGML-quantized version of Eric Hartford's Wizard-Vicuna-30B-Uncensored model, packaged for CPU inference with optional GPU layer offloading via llama.cpp. The model is distributed in multiple quantization formats, from 2-bit to 8-bit, each trading off file size, inference speed, and output quality differently.
## Implementation Details
The model is available in a range of quantization methods: the original llama.cpp methods (q4_0, q4_1, q5_0, q5_1, q8_0) and the newer k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 13.60 GB (q2_K) to 34.56 GB (q8_0). A short loading example follows the list below.
- Original llama.cpp quantization methods maintain broad compatibility
- New k-quant methods offer improved efficiency but require newer llama.cpp versions
- Multiple quantization options for different hardware configurations
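As a concrete illustration, the sketch below loads one of the quantized files with the llama-cpp-python bindings. This assumes a version of the bindings old enough to read GGML files (newer releases expect the GGUF format instead); the file name follows TheBloke's usual naming pattern, and the context size, thread count, and layer count are placeholder values to tune for your hardware.

```python
from llama_cpp import Llama

# Hypothetical local path, following TheBloke's naming pattern:
# Wizard-Vicuna-30B-Uncensored.ggmlv3.<quant>.bin
MODEL_PATH = "./Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin"

llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=2048,       # LLaMA-1 context window
    n_threads=8,      # CPU threads; tune for your machine
    n_gpu_layers=32,  # layers offloaded to GPU (requires a cuBLAS/Metal build); 0 = CPU only
)
```

Choosing a smaller quant (e.g. q2_K) lowers the memory footprint at some cost in output quality, while q8_0 stays closest to the unquantized weights but needs the most RAM.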
## Core Capabilities
- CPU + GPU inference support via llama.cpp (an inference example follows this list)
- Compatible with popular interfaces like text-generation-webui and KoboldCpp
- Uncensored responses without built-in alignment
- Flexible deployment options with various quantization levels
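The end-to-end sketch below generates a completion through llama-cpp-python. The USER:/ASSISTANT: template is the Vicuna-style format this model family is typically prompted with; the model path and sampling settings are assumptions, not values from the original card.

```python
from llama_cpp import Llama

# Assumed local file; see the loading sketch above for parameter notes.
llm = Llama(
    model_path="./Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin",
    n_gpu_layers=32,
)

# Vicuna-style single-turn prompt template.
prompt = "USER: Summarize the trade-offs between q2_K and q8_0 quantization.\nASSISTANT:"

output = llm(
    prompt,
    max_tokens=256,   # cap on generated tokens
    temperature=0.7,  # mild sampling randomness
    stop=["USER:"],   # stop before the model begins a new turn
)
print(output["choices"][0]["text"].strip())
```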
## Frequently Asked Questions
Q: What makes this model unique?
This model combines a high parameter count (30B) with a wide range of quantization options, making it deployable on very different hardware configurations while preserving most of the full-size model's quality.
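A rough way to reason about that hardware trade-off, using only the two file sizes quoted above: peak RAM for CPU-only inference is approximately the file size plus a small runtime overhead (TheBloke's GGML cards conventionally add about 2 GB). The sketch below is an illustrative estimate under that assumption, not a measurement.

```python
# Illustrative RAM estimate: file size plus ~2 GB runtime overhead,
# a convention from TheBloke's GGML model cards (assumption, not a benchmark).
FILE_SIZES_GB = {"q2_K": 13.60, "q8_0": 34.56}

for quant, size_gb in FILE_SIZES_GB.items():
    print(f"{quant}: ~{size_gb + 2.0:.2f} GB RAM for CPU-only inference")
```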
Q: What are the recommended use cases?
The model is suited to general language tasks that require unrestricted outputs. As an uncensored model it ships without built-in guardrails, so users are responsible for how it is used and for the content it generates.