Luna-AI-Llama2-Uncensored-GPTQ

Maintained By
TheBloke

  • Parameter Count: 1.13B
  • License: cc-by-sa-4.0
  • Model Type: Llama2
  • Quantization: GPTQ

What is Luna-AI-Llama2-Uncensored-GPTQ?

Luna-AI-Llama2-Uncensored-GPTQ is a GPTQ-quantized version of the original Luna AI model, optimized for efficient deployment while preserving performance. The base model was created by Tap and quantized by TheBloke, and was fine-tuned on over 40,000 long-form chat discussions, making it particularly effective for conversational AI applications.

Implementation Details

The model is available in multiple quantization variants, including 4-bit and 8-bit versions with different group sizes and Act Order configurations. Quantization was calibrated on the WikiText dataset with a sequence length of 4096 tokens.

  • Multiple GPTQ parameter options for different hardware requirements
  • Supports both GPU and CPU inference
  • Follows the Vicuna 1.1/OpenChat format for prompting
  • Compatible with ExLlama, AutoGPTQ, and Hugging Face's TGI

Core Capabilities

  • ARC Challenge: 0.5512
  • MMLU accuracy: 0.46521
  • TruthfulQA: 0.4716
  • Flexible deployment options with various quantization parameters

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its uncensored nature and multiple quantization options, allowing users to balance between performance and resource usage. The various GPTQ configurations make it highly adaptable to different hardware setups.

Q: What are the recommended use cases?

The model is particularly well-suited for conversational AI applications, chat-based systems, and scenarios where efficient deployment is crucial. Its uncensored nature makes it suitable for open-domain conversations.
