Luna AI Llama2 Uncensored GGML
| Property | Value |
|---|---|
| License | Llama 2 |
| Model Type | Llama 2 architecture |
| Author | TheBloke (quantized) / Tap (original) |
| Format | GGML (CPU/GPU optimized) |
What is Luna-AI-Llama2-Uncensored-GGML?
Luna AI Llama2 Uncensored GGML is a conversion of the original Luna AI model into the GGML format, optimized for CPU and GPU inference. The underlying model was fine-tuned on over 40,000 long-form chat discussions, and this release offers quantization options ranging from 2-bit to 8-bit precision, letting users balance output quality against resource usage according to their needs.
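As a minimal download sketch, a single quantized file can be fetched with `huggingface_hub`. The repo id matches this model, but the exact filename below is an assumption based on TheBloke's usual GGML naming convention and should be checked against the repository's file list:

```python
from huggingface_hub import hf_hub_download

# Download one quantized file rather than the whole repo.
# NOTE: the filename is assumed from TheBloke's naming convention;
# verify it against the repository's actual file list.
model_path = hf_hub_download(
    repo_id="TheBloke/Luna-AI-Llama2-Uncensored-GGML",
    filename="luna-ai-llama2-uncensored.ggmlv3.q4_K_M.bin",
)
print(model_path)  # local cache path of the downloaded .bin file
```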
Implementation Details
The model ships in multiple quantizations, including the newer k-quant methods, with file sizes ranging from 2.87 GB (q2_K) to 7.16 GB (q8_0). It uses a simple `USER:` / `ASSISTANT:` prompt format and requires between 5.37 GB and 9.66 GB of RAM depending on the chosen quantization level (a loading sketch follows the list below).
- Multiple quantization options (q2_K through q8_0)
- GPU layer offloading support
- Optimized for both CPU and GPU inference
- Context window support up to 4096 tokens
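Here is a minimal loading sketch using llama-cpp-python, assuming a GGML-era release (versions up to 0.1.78 read GGML files; later releases expect GGUF). The path and layer count are illustrative:

```python
from llama_cpp import Llama

# GGML files load with llama-cpp-python <= 0.1.78; newer releases
# expect the GGUF format instead.
llm = Llama(
    model_path="luna-ai-llama2-uncensored.ggmlv3.q4_K_M.bin",
    n_ctx=4096,       # full Llama 2 context window
    n_gpu_layers=32,  # offload some layers to GPU; 0 = CPU only
    n_threads=8,      # tune to the physical cores available
)
```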
Core Capabilities
- Long-form chat interactions
- Flexible deployment options across different hardware configurations
- Uncensored responses while maintaining coherence
- Support for multiple inference frameworks, including text-generation-webui and KoboldCpp (see the generation sketch below)
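To illustrate the chat interactions listed above, the following sketch continues from the `llm` object created earlier; the prompt template comes from the original model card, while the stop strings and sampling settings are illustrative assumptions:

```python
# The model expects the simple USER/ASSISTANT template from the
# original model card.
prompt = "USER: Write a haiku about quantization.\nASSISTANT:"

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,   # illustrative sampling setting
    stop=["USER:"],    # stop before the model starts a new turn
)
print(output["choices"][0]["text"].strip())
```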
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its range of quantization options and its optimization for CPU/GPU deployment, while preserving the original Luna AI model's uncensored conversational behavior. The k-quant variants in particular offer strong compression with comparatively little loss in model quality.
Q: What are the recommended use cases?
The model is ideal for desktop deployment of chat applications, particularly where hardware resources are limited. The various quantization options make it suitable for everything from low-resource environments (using q2_K) to high-quality inference scenarios (using q8_0).