# Wizard-Vicuna-13B-Uncensored-GGUF
| Property | Value |
|---|---|
| Parameter Count | 13B |
| Model Type | LLaMA-based |
| License | Other |
| Author | Eric Hartford (original model), TheBloke (quantization) |
## What is Wizard-Vicuna-13B-Uncensored-GGUF?
Wizard-Vicuna-13B-Uncensored-GGUF is a variant of the Wizard-Vicuna language model trained without built-in alignment constraints. This GGUF release provides multiple quantization options, from 2-bit to 8-bit precision, making it adaptable to a range of hardware configurations and use cases.
## Implementation Details
The model is distributed in the GGUF format, which supersedes the older GGML standard. It is available in multiple quantization levels, from Q2_K (5.43 GB) to Q8_0 (13.83 GB), each offering a different trade-off between file size and output quality.
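As a minimal sketch, a single quantization file can be fetched with the `huggingface_hub` client. The repo id matches this model card; the exact filename is an assumption based on TheBloke's usual naming scheme, so check the repo's file list first.

```python
from huggingface_hub import hf_hub_download

# Download one quantization level; the filename follows TheBloke's usual
# naming convention and may differ -- verify it against the repo's file list.
model_path = hf_hub_download(
    repo_id="TheBloke/Wizard-Vicuna-13B-Uncensored-GGUF",
    filename="Wizard-Vicuna-13B-Uncensored.Q4_K_M.gguf",  # assumed filename
)
print(model_path)  # local path to the cached .gguf file
```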
- Uses the Vicuna prompt template for best results (see the loading sketch after this list)
- Supports a context length of 2048 tokens
- Compatible with llama.cpp and various third-party UIs
- Offers GPU acceleration via layer offloading
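A minimal loading sketch using the `llama-cpp-python` bindings, assuming the file downloaded above sits in the working directory; the question in the prompt is illustrative, and the template follows the standard Vicuna `USER:`/`ASSISTANT:` convention:

```python
from llama_cpp import Llama

# Load the quantized model; n_gpu_layers offloads layers to the GPU
# (set to 0 for CPU-only inference).
llm = Llama(
    model_path="Wizard-Vicuna-13B-Uncensored.Q4_K_M.gguf",  # assumed local path
    n_ctx=2048,       # matches the model's 2048-token context window
    n_gpu_layers=35,  # tune to your VRAM; 0 disables offloading
)

# Vicuna-style prompt template (standard USER:/ASSISTANT: convention).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "USER: Write a haiku about quantization. ASSISTANT:"
)

output = llm(prompt, max_tokens=128, stop=["USER:"])
print(output["choices"][0]["text"])
```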
## Core Capabilities
- Unrestricted creative text generation
- Flexible deployment options (CPU/GPU)
- Multiple quantization options for different hardware constraints
- Supports various interfaces, including text-generation-webui and LangChain (see the sketch below)
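As one example of third-party integration, here is a sketch using LangChain's `LlamaCpp` wrapper. The import path assumes the `langchain-community` package is installed; the model path is the file downloaded earlier:

```python
from langchain_community.llms import LlamaCpp

# Wrap the local GGUF file as a LangChain LLM; parameters mirror llama.cpp's.
llm = LlamaCpp(
    model_path="Wizard-Vicuna-13B-Uncensored.Q4_K_M.gguf",  # assumed local path
    n_ctx=2048,
    n_gpu_layers=35,
    temperature=0.7,
)

print(llm.invoke("USER: Summarize what GGUF is in one sentence. ASSISTANT:"))
```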
## Frequently Asked Questions
### Q: What makes this model unique?
This model is distinguished by its lack of built-in alignment constraints, which allows custom alignment to be layered on separately, for example through an RLHF-trained LoRA. It also provides extensive quantization options for different deployment scenarios.
### Q: What are the recommended use cases?
The model suits applications that require unrestricted creative responses, as well as scenarios where custom alignment is implemented separately. The Q4_K_M quantization (7.87 GB) is recommended as a balance between output quality and resource usage; a generation sketch using it follows.
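As a closing sketch, streaming generation with the recommended Q4_K_M file via `llama-cpp-python` (the local path is an assumption; adjust `n_gpu_layers` to your hardware):

```python
from llama_cpp import Llama

# Load the recommended Q4_K_M quantization (assumed local path).
llm = Llama(
    model_path="Wizard-Vicuna-13B-Uncensored.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=35,  # set to 0 for CPU-only inference
)

# Stream tokens as they are generated, stopping before the next user turn.
for chunk in llm(
    "USER: Write a short story about a lighthouse keeper. ASSISTANT:",
    max_tokens=256,
    stop=["USER:"],
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)
```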