# Llama2 7B Chat Uncensored GPTQ
| Property | Value |
|---|---|
| Parameter Count | 7 Billion |
| Model Type | Llama2 |
| License | Other + Meta Llama 2 License |
| Quantization | 4-bit GPTQ |
## What is llama2_7b_chat_uncensored-GPTQ?
This is a quantized version of George Sung's Llama2 7B Chat Uncensored model, optimized by TheBloke for efficient deployment. The model was fine-tuned using QLoRA on the uncensored Wizard-Vicuna conversation dataset, making it suitable for open-ended dialogue generation without traditional content restrictions.
## Implementation Details
The model is published in multiple GPTQ quantization variants, each optimized for different hardware configurations and performance requirements. All variants use 4-bit precision, with varying group sizes (32g, 64g, 128g) and optional Act Order; a loading sketch follows the list below.
- Multiple branch options with different quantization parameters
- Compatible with AutoGPTQ, Transformers, and ExLlama
- Supports sequence lengths up to 4096 tokens
- Includes optimized configurations for different VRAM requirements
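As a minimal sketch of how one variant might be loaded with Transformers (assuming the Hugging Face repository ID `TheBloke/llama2_7b_chat_uncensored-GPTQ` and a branch name in the style TheBloke typically uses; check the repository's branch list for the exact names):

```python
# Minimal loading sketch (requires: pip install transformers optimum auto-gptq).
# The revision below is an assumed branch name; consult the repository's
# branch list for the variants actually published. "main" is the default.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/llama2_7b_chat_uncensored-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spread layers across available GPUs
    revision="gptq-4bit-128g-actorder_True",  # assumed branch name
)
```

As a rule of thumb when choosing a branch, smaller group sizes (32g) trade a little extra VRAM for slightly better accuracy, while 128g is the usual default.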
## Core Capabilities
- Efficient GPU inference with reduced memory footprint
- Maintains model quality while reducing size to ~4GB
- Supports both direct Transformers integration and pipeline usage
- Optimized for chat-based applications with a specific prompt template (see the sketch after this list)
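Since output quality depends on matching the training format, here is a sketch of pipeline usage with the `### HUMAN:` / `### RESPONSE:` template that George Sung's base model documents (verify it against the model card before relying on it):

```python
# Sketch of chat-style generation via the Transformers pipeline.
# The "### HUMAN:" / "### RESPONSE:" prompt format is taken from the base
# model's documentation; the sampling parameters are illustrative defaults.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="TheBloke/llama2_7b_chat_uncensored-GPTQ",  # assumed repo ID
    device_map="auto",
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)

prompt = "### HUMAN:\nExplain GPTQ quantization in one paragraph.\n\n### RESPONSE:\n"
print(pipe(prompt)[0]["generated_text"])
```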
## Frequently Asked Questions
### Q: What makes this model unique?
A: This model combines the capabilities of Llama 2 with uncensored training data, while offering multiple quantization options for efficient deployment. The GPTQ quantization maintains model quality while significantly reducing resource requirements.
### Q: What are the recommended use cases?
A: The model is best suited for applications requiring open-ended dialogue generation, particularly where traditional content restrictions might be limiting. It's especially useful in scenarios where GPU memory is constrained but model quality needs to be maintained.