Phi-3.5-mini-instruct_Uncensored-GGUF
| Property | Value |
|---|---|
| Parameter Count | 3.82B |
| License | Apache 2.0 |
| Format | GGUF |
| Author | bartowski |
What is Phi-3.5-mini-instruct_Uncensored-GGUF?
This is a comprehensive collection of GGUF quantizations of the Phi-3.5-mini-instruct uncensored model. It spans quantization levels from full F16 precision (7.64GB) down to highly compressed versions (1.32GB), so the model can be matched to a wide range of hardware configurations and use cases.
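As a rough sketch of how one of these quantized files might be fetched programmatically, the snippet below uses the huggingface_hub Python client; the repository id and the exact .gguf filename here are assumptions and should be checked against the repository's actual file listing.

```python
from huggingface_hub import hf_hub_download

# Assumed repo id and filename -- verify against the repository's file list,
# since exact quant filenames vary between variants (e.g. Q4_K_M vs Q4_K_L).
repo_id = "bartowski/Phi-3.5-mini-instruct_Uncensored-GGUF"
filename = "Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf"

# Downloads the single quantized file and returns its local path.
model_path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir="models")
print(model_path)
```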
Implementation Details
The model uses a specific prompt format: `<s><|system|> {system_prompt}<|end|><|user|> {prompt}<|end|><|assistant|><|end|>`. The repository provides multiple quantization types, including Q8_0, Q6_K, Q5_K, Q4_K, and the newer I-quant (IQ) variants.
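To make that template concrete, here is a minimal Python sketch that fills in the system and user turns; the variable names are arbitrary, and the template string is copied from the format above (with the final `<|end|>` left off so the assistant turn remains open for generation).

```python
# Build a prompt in the format documented above. The trailing <|end|> shown in
# the card is omitted here so that the model generates the assistant turn.
PROMPT_TEMPLATE = (
    "<s><|system|> {system_prompt}<|end|>"
    "<|user|> {prompt}<|end|>"
    "<|assistant|>"
)

def build_prompt(system_prompt: str, prompt: str) -> str:
    return PROMPT_TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)

print(build_prompt("You are a helpful assistant.", "Explain GGUF quantization."))
```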
- Multiple quantization options optimized using imatrix
- Specialized versions with Q8_0 embeddings for enhanced quality
- Compatible with LM Studio and various inference engines
- Advanced compression techniques maintaining model quality
Core Capabilities
- Flexible deployment options for different hardware configurations
- RAM and VRAM optimization through various quantization levels
- Support for both CPU and GPU inference (see the loading sketch after this list)
- Compatible with cuBLAS, rocBLAS, and CPU backends
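As a hedged sketch of the CPU/GPU split, the snippet below loads a downloaded quant with the llama-cpp-python bindings; the model path is a placeholder, and `n_gpu_layers` controls how many layers are offloaded to the GPU backend (cuBLAS/rocBLAS builds) versus kept on the CPU.

```python
from llama_cpp import Llama

# Placeholder path to a previously downloaded quant file.
model_path = "models/Phi-3.5-mini-instruct_Uncensored-Q4_K_M.gguf"

# n_gpu_layers=-1 offloads every layer to the GPU (requires a cuBLAS or rocBLAS
# build of llama.cpp); n_gpu_layers=0 keeps inference entirely on the CPU.
llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096)

output = llm(
    "<s><|system|> You are a helpful assistant.<|end|>"
    "<|user|> Summarize what GGUF is.<|end|>"
    "<|assistant|>",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```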
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its comprehensive range of quantization options, allowing users to balance between model size and performance based on their hardware capabilities. The inclusion of both traditional K-quants and newer I-quants provides flexibility for different use cases.
Q: What are the recommended use cases?
For maximum performance, choose a quantization size 1-2GB smaller than your GPU's VRAM. For optimal quality, select a version that fits within your combined system RAM and GPU VRAM. Q5_K_M and Q4_K_M are recommended for most general use cases.
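To make the sizing rule concrete, here is a small illustrative helper that picks the largest quant leaving 1-2 GB of headroom below a given VRAM budget; only the F16 and smallest sizes come from the card, while the other file sizes and quant names are placeholder estimates that should be replaced with the values listed in the repository.

```python
# Approximate, illustrative file sizes in GB -- check the repo for exact values.
QUANT_SIZES_GB = {
    "F16": 7.64,     # full precision, from the card
    "Q8_0": 4.06,    # placeholder estimate
    "Q6_K": 3.14,    # placeholder estimate
    "Q5_K_M": 2.82,  # placeholder estimate
    "Q4_K_M": 2.39,  # placeholder estimate
    "IQ2_M": 1.32,   # smallest size mentioned on the card; quant name assumed
}

def pick_quant(vram_gb: float, headroom_gb: float = 1.5) -> str:
    """Return the largest quant whose file size leaves `headroom_gb` of free VRAM."""
    budget = vram_gb - headroom_gb
    candidates = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not candidates:
        raise ValueError("No quant fits in VRAM; consider CPU offload or more RAM.")
    return max(candidates, key=candidates.get)

print(pick_quant(6.0))  # e.g. a 6 GB GPU -> a mid-size quant
```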