Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GPTQ

Maintained by TheBloke

  • Parameter Count: 2.03B
  • Model Type: GPTQ-quantized LLM
  • License: Other
  • Context Length: up to 8K tokens
  • Quantization: 4-bit precision

What is Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GPTQ?

This model combines the Wizard-Vicuna 13B architecture with SuperHOT's 8K context extension. It is a GPTQ-quantized version that maintains output quality while reducing hardware requirements through 4-bit precision. The model stands out for its extended context length of up to 8K tokens and its uncensored training approach.
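
The memory savings from 4-bit quantization can be sketched with back-of-the-envelope arithmetic. This is a rough estimate only, assuming fp16 scales and zero-points shared per 128-weight group; activations and the KV cache require additional memory:

```python
def gptq_weight_gib(n_params: float, bits: int = 4, group_size: int = 128) -> float:
    """Rough GiB footprint of GPTQ-packed weights.

    Each weight is stored in `bits` bits; every `group_size` weights are
    assumed to share one fp16 scale and one fp16 zero-point (2 bytes each).
    """
    weight_bytes = n_params * bits / 8
    overhead_bytes = (n_params / group_size) * 2 * 2  # scale + zero-point
    return (weight_bytes + overhead_bytes) / 1024**3

fp16_gib = 13e9 * 2 / 1024**3     # ~24.2 GiB for a 13B model in fp16
gptq_gib = gptq_weight_gib(13e9)  # ~6.4 GiB at 4-bit, group size 128
print(f"fp16: {fp16_gib:.1f} GiB, 4-bit GPTQ: {gptq_gib:.1f} GiB")
```

This roughly matches why a 13B model that would not fit on a 24 GB consumer GPU in fp16 becomes comfortably deployable after 4-bit GPTQ quantization.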

Implementation Details

The model uses GPTQ quantization with a group size of 128 for improved inference accuracy. It is compatible with ExLlama and AutoGPTQ; the SuperHOT extended-context handling is enabled by loading the model's custom code with trust_remote_code. The model can be deployed through text-generation-webui or from Python code with AutoGPTQ.

  • 4-bit quantization with 128 group size
  • Supports context lengths of 4096 or 8192 tokens
  • Compatible with ExLlama and AutoGPTQ frameworks
  • No act-order (activation-order) optimization, trading a small amount of accuracy for broader compatibility
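
A minimal Python loading sketch with AutoGPTQ. The prompt template is an assumption based on the common Vicuna 1.1 chat format, and the generation settings are illustrative only:

```python
def vicuna_prompt(user_msg: str) -> str:
    """Build a Vicuna-1.1-style prompt (assumed format for Wizard-Vicuna)."""
    return f"USER: {user_msg}\nASSISTANT:"

if __name__ == "__main__":
    # Heavy dependencies are only needed when actually running inference.
    from transformers import AutoTokenizer     # pip install transformers
    from auto_gptq import AutoGPTQForCausalLM  # pip install auto-gptq

    model_id = "TheBloke/Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GPTQ"
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    # trust_remote_code loads the SuperHOT RoPE-scaling code for 8K context.
    model = AutoGPTQForCausalLM.from_quantized(
        model_id,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
    )
    inputs = tokenizer(
        vicuna_prompt("Summarize GPTQ quantization in one sentence."),
        return_tensors="pt",
    ).to("cuda:0")
    output = model.generate(**inputs, max_new_tokens=128, temperature=0.7)
    print(tokenizer.decode(output[0]))
```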

Core Capabilities

  • Extended context processing up to 8K tokens
  • Uncensored text generation without built-in alignment
  • Efficient GPU inference with reduced memory footprint
  • Flexible deployment options through various frameworks
  • Improved accuracy through optimized quantization parameters
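
The SuperHOT context extension is commonly explained as RoPE position interpolation: position indices for an 8K sequence are compressed back into the 2K range the base model was trained on, the factor of 4 corresponding to the compress_pos_emb = 4 setting typically used with ExLlama. A toy sketch of the idea:

```python
def interpolated_positions(seq_len: int, trained_ctx: int = 2048,
                           target_ctx: int = 8192) -> list[float]:
    """Scale position indices down by target_ctx / trained_ctx (here 4x),
    so an 8K-token sequence stays inside the trained RoPE range."""
    factor = target_ctx / trained_ctx  # compress_pos_emb = 4
    return [p / factor for p in range(seq_len)]

positions = interpolated_positions(8192)
print(positions[:5])          # [0.0, 0.25, 0.5, 0.75, 1.0]
print(max(positions) < 2048)  # True: all positions fit the trained window
```

For 4096-token use, the same idea applies with a smaller effective range; the interpolation factor stays fixed by the model's fine-tuning.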

Frequently Asked Questions

Q: What makes this model unique?

This model uniquely combines three key features: extended context length (8K), uncensored training, and efficient 4-bit quantization, making it particularly suitable for applications requiring long-context understanding without built-in restrictions.

Q: What are the recommended use cases?

The model is ideal for applications requiring extended context processing, creative writing, and general text generation tasks. However, users should note that as an uncensored model, it requires responsible implementation of custom alignment or filtering systems for production use.
