Wizard-Vicuna-13B-Uncensored-GGUF

Property	Value
Parameter Count	13B
Model Type	LLaMA-based
License	Other
Author	Eric Hartford (Original), TheBloke (Quantization)

What is Wizard-Vicuna-13B-Uncensored-GGUF?

Wizard-Vicuna-13B-Uncensored-GGUF is a variant of the Wizard-Vicuna language model that ships without built-in alignment constraints. This GGUF version is offered at quantization levels from 2-bit to 8-bit precision, so it can be matched to different hardware configurations and use cases.

Implementation Details

The model is distributed in the GGUF file format, which supersedes the older GGML format. It is available at multiple quantization levels, from Q2_K (5.43 GB) to Q8_0 (13.83 GB); smaller files need less memory but give up more output quality.

Uses the Vicuna prompt template
Supports a context length of 2048 tokens
Compatible with llama.cpp and various third-party UIs
Supports GPU acceleration through layer offloading
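The points above can be sketched in Python. This is a minimal illustration, not the author's reference code: it assumes the llama-cpp-python bindings are installed, that the system sentence below matches the Vicuna template published with this model, and that the model path is supplied by the caller. The helper names (build_vicuna_prompt, max_new_tokens, generate) and the 256-token reply cap are hypothetical choices made for this example.

```python
# Assumed Vicuna-style system preamble; check it against the model card you download from.
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

N_CTX = 2048  # context length stated for this model


def build_vicuna_prompt(user_message: str, system: str = VICUNA_SYSTEM) -> str:
    """Wrap a single user message in the Vicuna USER/ASSISTANT layout."""
    return f"{system} USER: {user_message.strip()} ASSISTANT:"


def max_new_tokens(prompt_tokens: int, n_ctx: int = N_CTX) -> int:
    """Tokens left for the reply once the prompt has been counted."""
    return max(0, n_ctx - prompt_tokens)


def generate(model_path: str, user_message: str, n_gpu_layers: int = 0) -> str:
    """Load a GGUF file and answer one message (hypothetical helper).

    n_gpu_layers=0 keeps everything on the CPU; a larger value offloads
    that many layers to the GPU if llama.cpp was built with GPU support.
    """
    from llama_cpp import Llama  # imported lazily so the helpers above work without it

    llm = Llama(model_path=model_path, n_ctx=N_CTX, n_gpu_layers=n_gpu_layers)
    prompt = build_vicuna_prompt(user_message)
    prompt_tokens = len(llm.tokenize(prompt.encode("utf-8")))
    budget = max_new_tokens(prompt_tokens)
    if budget == 0:
        raise ValueError("prompt already fills the 2048-token context")
    out = llm(prompt, max_tokens=min(256, budget), stop=["USER:"])
    return out["choices"][0]["text"].strip()
```

Stopping on "USER:" is one simple way to keep the model from writing the next user turn itself; other stop strings may suit other front ends better.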

Core Capabilities

Unrestricted creative text generation
Flexible deployment options (CPU/GPU)
Multiple quantization options for different hardware constraints
Works with tools such as text-generation-webui and LangChain

Frequently Asked Questions

Q: What makes this model unique?

The model has no built-in alignment constraints, so custom alignment can be added separately, for example with an RLHF LoRA. It is also available at a wide range of quantization levels to suit different deployment scenarios.

Q: What are the recommended use cases?

The model is suited to applications that need unrestricted creative responses and to scenarios where custom alignment is implemented separately. The Q4_K_M quantization (7.87 GB) is recommended as a balance between output quality and resource usage.
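As a rough illustration of choosing a quantization for a given machine, the sketch below picks the largest file that fits in a memory budget. It only knows the three file sizes quoted in this document; the 2.5 GB runtime overhead is an assumption for illustration, and pick_quant is a hypothetical helper, not part of any library.

```python
# File sizes in GB, taken from the figures quoted in this document.
QUANT_SIZES_GB = {"Q2_K": 5.43, "Q4_K_M": 7.87, "Q8_0": 13.83}


def pick_quant(available_gb, overhead_gb=2.5, sizes=QUANT_SIZES_GB):
    """Return the largest quantization whose file size plus an assumed
    runtime overhead fits in available_gb, or None if nothing fits."""
    fitting = {
        name: size
        for name, size in sizes.items()
        if size + overhead_gb <= available_gb
    }
    if not fitting:
        return None
    # A larger file means less aggressive quantization.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for ram in (4, 8, 12, 32):
        print(ram, "GB ->", pick_quant(ram))
```

Offloading layers to a GPU shifts part of this memory from RAM to VRAM, so the budget passed in should reflect wherever the weights will actually live.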