Phi-3.5-mini-instruct_Uncensored-GGUF
| Property | Value |
|---|---|
| Parameter Count | 3.82B |
| License | Apache 2.0 |
| Format | GGUF |
| Author | bartowski |
What is Phi-3.5-mini-instruct_Uncensored-GGUF?
This is a comprehensive collection of GGUF quantizations of the Phi-3.5-mini-instruct uncensored model. It spans quantization levels from full F16 precision (7.64GB) down to highly compressed versions (1.32GB), so the model can be matched to a wide range of hardware configurations and use cases.
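As a rough sketch of how one of these quantized files might be fetched programmatically, the snippet below uses the huggingface_hub Python client; the repository id and the exact .gguf filename here are assumptions and should be checked against the repository's actual file listing.

```python
from huggingface_hub import hf_hub_download

# Assumed repo id and filename -- verify against the repository's file list,
# since exact quant filenames vary between variants (e.g. Q4_K_M vs Q4_K_L).
repo_id = "bartowski/Phi-3.5-mini-instruct_Uncensored-GGUF"
filename = "Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf"

# Downloads the single quantized file and returns its local path.
model_path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir="models")
print(model_path)
```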
Implementation Details
The model uses a specific prompt format: `<s><|system|> {system_prompt}<|end|><|user|> {prompt}<|end|><|assistant|><|end|>`. The repository provides multiple quantization types, including Q8_0, Q6_K, Q5_K, Q4_K, and the newer I-quant (IQ) variants.
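To make that template concrete, here is a minimal Python sketch that fills in the system and user turns; the variable names are arbitrary, and the template string is copied from the format above (with the final `<|end|>` left off so the assistant turn remains open for generation).

```python
# Build a prompt in the format documented above. The trailing <|end|> shown in
# the card is omitted here so that the model generates the assistant turn.
PROMPT_TEMPLATE = (
    "<s><|system|> {system_prompt}<|end|>"
    "<|user|> {prompt}<|end|>"
    "<|assistant|>"
)

def build_prompt(system_prompt: str, prompt: str) -> str:
    return PROMPT_TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)

print(build_prompt("You are a helpful assistant.", "Explain GGUF quantization."))
```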
- Multiple quantization options optimized using imatrix
- Specialized versions with Q8_0 embeddings for enhanced quality
- Compatible with LM Studio and various inference engines
- Advanced compression techniques maintaining model quality
Core Capabilities
- Flexible deployment options for different hardware configurations
- RAM and VRAM optimization through various quantization levels
- Support for both CPU and GPU inference (see the loading sketch after this list)
- Compatible with cuBLAS, rocBLAS, and CPU backends
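As a hedged sketch of the CPU/GPU split, the snippet below loads a downloaded quant with the llama-cpp-python bindings; the model path is a placeholder, and `n_gpu_layers` controls how many layers are offloaded to the GPU backend (cuBLAS/rocBLAS builds) versus kept on the CPU.

```python
from llama_cpp import Llama

# Placeholder path to a previously downloaded quant file.
model_path = "models/Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf"

# n_gpu_layers=-1 offloads every layer to the GPU (requires a cuBLAS or rocBLAS
# build of llama.cpp); n_gpu_layers=0 keeps inference entirely on the CPU.
llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096)

output = llm(
    "<s><|system|> You are a helpful assistant.<|end|>"
    "<|user|> Summarize what GGUF is.<|end|>"
    "<|assistant|>",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```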
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its comprehensive range of quantization options, allowing users to balance between model size and performance based on their hardware capabilities. The inclusion of both traditional K-quants and newer I-quants provides flexibility for different use cases.
Q: What are the recommended use cases?
For maximum performance, choose a quantization size 1-2GB smaller than your GPU's VRAM. For optimal quality, select a version that fits within your combined system RAM and GPU VRAM. Q5_K_M and Q4_K_M are recommended for most general use cases.
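To make the sizing rule concrete, here is a small illustrative helper that picks the largest quant leaving 1-2 GB of headroom below a given VRAM budget; only the F16 and smallest sizes come from the card, while the other file sizes and quant names are placeholder estimates that should be replaced with the values listed in the repository.

```python
# Approximate, illustrative file sizes in GB -- check the repo for exact values.
QUANT_SIZES_GB = {
    "F16": 7.64,     # full precision, from the card
    "Q8_0": 4.06,    # placeholder estimate
    "Q6_K": 3.14,    # placeholder estimate
    "Q5_K_M": 2.82,  # placeholder estimate
    "Q4_K_M": 2.39,  # placeholder estimate
    "IQ2_M": 1.32,   # smallest size mentioned on the card; quant name assumed
}

def pick_quant(vram_gb: float, headroom_gb: float = 1.5) -> str:
    """Return the largest quant whose file size leaves `headroom_gb` of free VRAM."""
    budget = vram_gb - headroom_gb
    candidates = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not candidates:
        raise ValueError("No quant fits in VRAM; consider CPU offload or more RAM.")
    return max(candidates, key=candidates.get)

print(pick_quant(6.0))  # e.g. a 6 GB GPU -> a mid-size quant
```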