Luna-AI-Llama2-Uncensored-GPTQ
| Property | Value |
|---|---|
| Parameter Count | 1.13B |
| License | cc-by-sa-4.0 |
| Model Type | Llama2 |
| Quantization | GPTQ |
What is Luna-AI-Llama2-Uncensored-GPTQ?
Luna-AI-Llama2-Uncensored-GPTQ is a GPTQ-quantized version of the original Luna AI Llama2 model, packaged for efficient deployment while preserving most of the original's performance. Created by Tap and quantized by TheBloke, the base model was fine-tuned on over 40,000 long-form chat discussions, making it particularly effective for conversational AI applications.
Implementation Details
The model offers multiple quantization options, including 4-bit and 8-bit versions with various group sizes and Act Order configurations. The quantization process utilized the WikiText dataset with a sequence length of 4096 tokens.
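The trade-off between these options can be estimated from first principles: packed quantized weights cost `bits/8` bytes per parameter, plus a small per-group overhead for scales and zero-points. The sketch below is a rough back-of-the-envelope estimator, not a measurement; the configuration names are illustrative, and the overhead model (one fp16 scale and one fp16 zero-point per group) is an assumption that ignores file metadata and unquantized layers.

```python
# Rough size estimate for GPTQ-quantized weights. Assumption: each
# quantization group carries one fp16 scale and one fp16 zero-point
# (2 bytes each); real files add metadata and unquantized tensors.

def gptq_weight_bytes(n_params: int, bits: int, group_size: int) -> int:
    """Packed integer weights plus per-group scale/zero overhead."""
    packed = n_params * bits // 8
    overhead = (n_params // group_size) * 2 * 2  # two fp16 values per group
    return packed + overhead

# Illustrative configurations: (bits, group_size, act_order).
# These names are examples, not the model's actual branch list.
configs = {
    "4bit-128g": (4, 128, True),
    "4bit-32g": (4, 32, True),
    "8bit-128g": (8, 128, False),
}

for name, (bits, gs, act_order) in configs.items():
    mib = gptq_weight_bytes(1_130_000_000, bits, gs) / 2**20
    print(f"{name}: ~{mib:.0f} MiB of weights (act_order={act_order})")
```

Smaller group sizes improve accuracy slightly but add overhead, which is why 32g variants are larger than 128g variants at the same bit width.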
- Multiple GPTQ parameter options for different hardware requirements
- Supports both GPU and CPU inference
- Follows the Vicuna 1.1/OpenChat format for prompting
- Compatible with ExLlama, AutoGPTQ, and Hugging Face's TGI
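Since the model expects Vicuna 1.1-style prompting, inputs should be wrapped in the matching turn markers before generation. A minimal sketch, assuming the standard `USER:`/`ASSISTANT:` markers with an optional leading system line (the helper name is hypothetical):

```python
def build_prompt(user_message: str, system: str = "") -> str:
    """Assemble a Vicuna 1.1-style prompt: an optional system line,
    followed by USER:/ASSISTANT: turn markers. The model's reply is
    generated after the trailing 'ASSISTANT:'."""
    parts = []
    if system:
        parts.append(system)
    parts.append(f"USER: {user_message}")
    parts.append("ASSISTANT:")
    return "\n".join(parts)

print(build_prompt("What is GPTQ quantization?"))
```

The resulting string can be passed to any of the supported backends (ExLlama, AutoGPTQ, or TGI) as the raw prompt; generation should stop when the model emits the next `USER:` marker.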
Core Capabilities
- ARC Challenge accuracy: 0.5512
- MMLU accuracy: 0.46521
- TruthfulQA accuracy: 0.4716
- Flexible deployment options with various quantization parameters
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its uncensored nature and multiple quantization options, allowing users to balance between performance and resource usage. The various GPTQ configurations make it highly adaptable to different hardware setups.
Q: What are the recommended use cases?
The model is particularly well-suited for conversational AI applications, chat-based systems, and scenarios where efficient deployment is crucial. Its uncensored nature makes it suitable for open-domain conversations.