Luna AI Llama2 Uncensored GGML
| Property | Value |
|---|---|
| License | Llama 2 |
| Model Type | Llama 2 architecture |
| Author | TheBloke (quantized) / Tap (original) |
| Format | GGML (CPU/GPU optimized) |
What is Luna-AI-Llama2-Uncensored-GGML?
Luna AI Llama2 Uncensored GGML is a conversion of the original Luna AI model into the GGML format, optimized for CPU and GPU inference. The underlying model was fine-tuned on over 40,000 long-form chat discussions, and this release offers quantization options ranging from 2-bit to 8-bit precision, letting users balance output quality against resource usage according to their needs.
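As a minimal download sketch, a single quantized file can be fetched with `huggingface_hub`. The repo id matches this model, but the exact filename below is an assumption based on TheBloke's usual GGML naming convention and should be checked against the repository's file list:

```python
from huggingface_hub import hf_hub_download

# Download one quantized file rather than the whole repo.
# NOTE: the filename is assumed from TheBloke's naming convention;
# verify it against the repository's actual file list.
model_path = hf_hub_download(
    repo_id="TheBloke/Luna-AI-Llama2-Uncensored-GGML",
    filename="luna-ai-llama2-uncensored.ggmlv3.q4_K_M.bin",
)
print(model_path)  # local cache path of the downloaded .bin file
```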
Implementation Details
The model ships in multiple quantizations, including the newer k-quant methods, with file sizes ranging from 2.87 GB (q2_K) to 7.16 GB (q8_0). It uses a simple `USER:` / `ASSISTANT:` prompt format and requires between 5.37 GB and 9.66 GB of RAM depending on the chosen quantization level (a loading sketch follows the list below).
- Multiple quantization options (q2_K through q8_0)
- GPU layer offloading support
- Optimized for both CPU and GPU inference
- Context window support up to 4096 tokens
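Here is a minimal loading sketch using llama-cpp-python, assuming a GGML-era release (versions up to 0.1.78 read GGML files; later releases expect GGUF). The path and layer count are illustrative:

```python
from llama_cpp import Llama

# GGML files load with llama-cpp-python <= 0.1.78; newer releases
# expect the GGUF format instead.
llm = Llama(
    model_path="luna-ai-llama2-uncensored.ggmlv3.q4_K_M.bin",
    n_ctx=4096,       # full Llama 2 context window
    n_gpu_layers=32,  # offload some layers to GPU; 0 = CPU only
    n_threads=8,      # tune to the physical cores available
)
```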
Core Capabilities
- Long-form chat interactions
- Flexible deployment options across different hardware configurations
- Uncensored responses while maintaining coherence
- Support for multiple inference frameworks, including text-generation-webui and KoboldCpp (see the generation sketch below)
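To illustrate the chat interactions listed above, the following sketch continues from the `llm` object created earlier; the prompt template comes from the original model card, while the stop strings and sampling settings are illustrative assumptions:

```python
# The model expects the simple USER/ASSISTANT template from the
# original model card.
prompt = "USER: Write a haiku about quantization.\nASSISTANT:"

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,   # illustrative sampling setting
    stop=["USER:"],    # stop before the model starts a new turn
)
print(output["choices"][0]["text"].strip())
```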
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its range of quantization options and its optimization for CPU/GPU deployment, while preserving the original Luna AI model's uncensored conversational behavior. The k-quant variants in particular offer strong compression with comparatively little loss in model quality.
Q: What are the recommended use cases?
The model is ideal for desktop deployment of chat applications, particularly where hardware resources are limited. The various quantization options make it suitable for everything from low-resource environments (using q2_K) to high-quality inference scenarios (using q8_0).