Gryphe Pantheon-RP 1.8 24B Small
| Property | Value |
|---|---|
| Base Model Size | 24B Parameters |
| Original Author | Gryphe |
| Quantization Author | bartowski |
| Model Hub | Hugging Face |
| Format | GGUF (llama.cpp compatible) |
What is Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF?
This is a comprehensive collection of GGUF quantized versions of the Pantheon-RP model, designed to make a powerful 24B parameter language model accessible across different hardware configurations. The quantizations range from full BF16 precision (47.15GB) down to highly compressed IQ2_XS (7.21GB), offering various trade-offs between model size and performance.
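If the repository follows the usual naming convention, a single quantization can be fetched with the huggingface_hub library. The repo id and file pattern below are assumptions; check the model page's file listing for the exact names.

```python
# Sketch: download one quantization from the Hugging Face repo.
# Repo id and file pattern are assumptions -- verify them on the model page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="bartowski/Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF",  # assumed repo id
    allow_patterns=["*Q4_K_M*"],  # fetch only the Q4_K_M variant
)
print(local_dir)
```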
Implementation Details
The quantizations were produced with llama.cpp's imatrix (importance matrix) calibration, yielding multiple options optimized for different use cases. The model expects a specific prompt format with system and user markers, and certain variants use higher-precision embed/output weights (a runnable sketch follows the list below).
- Multiple quantization levels (Q2 to Q8) with different size-quality trade-offs
- Special IQ (i-quant) variants offering better quality at small file sizes
- Optimized versions for ARM and AVX architectures
- Enhanced embed/output weight handling in specific variants
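As a rough illustration, the sketch below loads one of the quantized files with the llama-cpp-python bindings and relies on the chat template embedded in the GGUF to apply the system/user markers, so the exact prompt format does not have to be reproduced by hand. The file path, context size, and messages are placeholders, not values from the model card.

```python
# Sketch: load a downloaded GGUF file and chat with it via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Gryphe_Pantheon-RP-1.8-24b-Small-3.1-Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,        # context window; adjust to available memory
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful roleplay assistant."},
        {"role": "user", "content": "Introduce yourself in one sentence."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```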
Core Capabilities
- Flexible deployment options from 7GB to 47GB model sizes
- Optimized performance on both CPU and GPU configurations (see the configuration sketch after this list)
- Support for online weight repacking on compatible hardware
- Special quantizations for ARM and AVX systems
- Compatible with LM Studio and other llama.cpp-based projects
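For hardware that cannot hold the chosen file entirely in VRAM, here is a minimal sketch of CPU-only and partial-offload configurations, again with llama-cpp-python; the paths, thread count, and layer count are illustrative assumptions, not recommendations from the model card.

```python
# Sketch: CPU-only and partial-GPU configurations for smaller hardware.
from llama_cpp import Llama

# CPU-only: relies on the ARM/AVX-optimized kernels and online weight repacking.
cpu_llm = Llama(model_path="path/to/model-Q4_K_M.gguf", n_threads=8, n_gpu_layers=0)

# Partial offload: keep as many layers on the GPU as VRAM allows;
# the remaining layers run on the CPU from system RAM.
hybrid_llm = Llama(model_path="path/to/model-Q6_K_L.gguf", n_gpu_layers=20)
```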
Frequently Asked Questions
Q: What makes this model unique?
This release stands out for its comprehensive range of quantization options, letting users pick an appropriate balance between file size and output quality for their hardware constraints. Offering both K-quants and I-quants adds further flexibility across different use cases and hardware configurations.
Q: What are the recommended use cases?
For maximum speed, choose a file 1-2 GB smaller than your GPU's total VRAM so the whole model fits on the GPU. For maximum quality, pick the largest variant that fits within your combined system RAM and GPU VRAM. Q4_K_M is the recommended default for most use cases, while Q6_K_L offers near-perfect quality for users with more memory available.
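As a sketch of that guidance, the helper below picks the largest quantization that fits the available memory budget. Only the two file sizes stated above are filled in; the remaining entries would come from the repository's file listing.

```python
# Sketch: choose the largest quantization that fits in memory.
# Only sizes stated on this page are included; add others from the repo listing.
QUANT_SIZES_GB = {
    "BF16": 47.15,
    "IQ2_XS": 7.21,
    # "Q4_K_M": ..., "Q6_K_L": ...  # fill in from the model page
}

def pick_quant(vram_gb: float, ram_gb: float = 0.0, headroom_gb: float = 2.0) -> str:
    """Return the largest quant fitting in VRAM (minus headroom), or failing
    that, in combined VRAM + system RAM."""
    budget_fast = vram_gb - headroom_gb          # whole model on the GPU
    budget_max = vram_gb + ram_gb - headroom_gb  # split across GPU and RAM
    ordered = sorted(QUANT_SIZES_GB.items(), key=lambda kv: kv[1], reverse=True)
    for budget in (budget_fast, budget_max):
        for name, size in ordered:
            if size <= budget:
                return name
    return min(QUANT_SIZES_GB, key=QUANT_SIZES_GB.get)

print(pick_quant(vram_gb=24, ram_gb=64))
```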