Luna-AI-Llama2-Uncensored-GPTQ
| Property | Value |
|---|---|
| Parameter Count | 1.13B |
| License | cc-by-sa-4.0 |
| Model Type | Llama2 |
| Quantization | GPTQ |
What is Luna-AI-Llama2-Uncensored-GPTQ?
Luna-AI-Llama2-Uncensored-GPTQ is a GPTQ-quantized version of the original Luna AI Llama2 model, packaged for efficient deployment while preserving most of the original's performance. Created by Tap and quantized by TheBloke, the base model was fine-tuned on over 40,000 long-form chat discussions, making it particularly effective for conversational AI applications.
Implementation Details
The model offers multiple quantization options, including 4-bit and 8-bit versions with various group sizes and Act Order configurations. The quantization process utilized the WikiText dataset with a sequence length of 4096 tokens.
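The trade-off between these options can be estimated from first principles: packed quantized weights cost `bits/8` bytes per parameter, plus a small per-group overhead for scales and zero-points. The sketch below is a rough back-of-the-envelope estimator, not a measurement; the configuration names are illustrative, and the overhead model (one fp16 scale and one fp16 zero-point per group) is an assumption that ignores file metadata and unquantized layers.

```python
# Rough size estimate for GPTQ-quantized weights. Assumption: each
# quantization group carries one fp16 scale and one fp16 zero-point
# (2 bytes each); real files add metadata and unquantized tensors.

def gptq_weight_bytes(n_params: int, bits: int, group_size: int) -> int:
    """Packed integer weights plus per-group scale/zero overhead."""
    packed = n_params * bits // 8
    overhead = (n_params // group_size) * 2 * 2  # two fp16 values per group
    return packed + overhead

# Illustrative configurations: (bits, group_size, act_order).
# These names are examples, not the model's actual branch list.
configs = {
    "4bit-128g": (4, 128, True),
    "4bit-32g": (4, 32, True),
    "8bit-128g": (8, 128, False),
}

for name, (bits, gs, act_order) in configs.items():
    mib = gptq_weight_bytes(1_130_000_000, bits, gs) / 2**20
    print(f"{name}: ~{mib:.0f} MiB of weights (act_order={act_order})")
```

Smaller group sizes improve accuracy slightly but add overhead, which is why 32g variants are larger than 128g variants at the same bit width.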
- Multiple GPTQ parameter options for different hardware requirements
- Supports both GPU and CPU inference
- Follows the Vicuna 1.1/OpenChat format for prompting
- Compatible with ExLlama, AutoGPTQ, and Hugging Face's TGI
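Since the model expects Vicuna 1.1-style prompting, inputs should be wrapped in the matching turn markers before generation. A minimal sketch, assuming the standard `USER:`/`ASSISTANT:` markers with an optional leading system line (the helper name is hypothetical):

```python
def build_prompt(user_message: str, system: str = "") -> str:
    """Assemble a Vicuna 1.1-style prompt: an optional system line,
    followed by USER:/ASSISTANT: turn markers. The model's reply is
    generated after the trailing 'ASSISTANT:'."""
    parts = []
    if system:
        parts.append(system)
    parts.append(f"USER: {user_message}")
    parts.append("ASSISTANT:")
    return "\n".join(parts)

print(build_prompt("What is GPTQ quantization?"))
```

The resulting string can be passed to any of the supported backends (ExLlama, AutoGPTQ, or TGI) as the raw prompt; generation should stop when the model emits the next `USER:` marker.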
Core Capabilities
- ARC Challenge accuracy: 0.5512
- MMLU accuracy: 0.46521
- TruthfulQA accuracy: 0.4716
- Flexible deployment options with various quantization parameters
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its uncensored nature and multiple quantization options, allowing users to balance between performance and resource usage. The various GPTQ configurations make it highly adaptable to different hardware setups.
Q: What are the recommended use cases?
The model is particularly well-suited for conversational AI applications, chat-based systems, and scenarios where efficient deployment is crucial. Its uncensored nature makes it suitable for open-domain conversations.