Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF

Maintained By
bartowski

Gryphe Pantheon-RP 1.8 24B Small

  • Base Model Size: 24B Parameters
  • Original Author: Gryphe
  • Quantization Author: bartowski
  • Model Hub: Hugging Face
  • Format: GGUF (llama.cpp compatible)

What is Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF?

This is a comprehensive collection of GGUF quantized versions of the Pantheon-RP model, designed to make a powerful 24B parameter language model accessible across different hardware configurations. The quantizations range from full BF16 precision (47.15GB) down to highly compressed IQ2_XS (7.21GB), offering various trade-offs between model size and performance.
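To put that range in perspective, the ratio between the largest and smallest files (figures from the paragraph above) works out to roughly 6.5x:

```python
# File sizes from the model card, in GB.
bf16_gb = 47.15    # full BF16 precision
iq2_xs_gb = 7.21   # smallest quantization (IQ2_XS)

ratio = bf16_gb / iq2_xs_gb
print(f"IQ2_XS is {ratio:.1f}x smaller than BF16")  # -> 6.5x smaller
```

The heavier compression comes at a quality cost, which is why the card recommends the mid-range quants below when memory allows.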

Implementation Details

The quantizations were produced with llama.cpp's imatrix (importance matrix) calibration, yielding multiple options optimized for different use cases. The model expects a specific prompt format with system and user markers, and certain variants keep the embed and output weights at higher precision than the rest of the model.

  • Multiple quantization levels (Q2 to Q8) with different size-quality trade-offs
  • Special IQ (i-quant) variants for improved quality at very low bit widths
  • Optimized versions for ARM and AVX architectures
  • Enhanced embed/output weight handling in specific variants

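As one way to fetch a single variant, the `huggingface_hub` client can download an individual GGUF file from the repo. The repo id follows the title above; the filename pattern is an assumption based on bartowski's usual naming convention, so verify it against the repo's file list:

```python
REPO_ID = "bartowski/Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF"

def gguf_filename(quant: str) -> str:
    """Build the expected file name for a quant variant.

    NOTE: the '<repo name minus -GGUF>-<quant>.gguf' pattern is an
    assumption based on bartowski's typical naming; check the hub page.
    """
    return f"Gryphe_Pantheon-RP-1.8-24b-Small-3.1-{quant}.gguf"

def download(quant: str = "Q4_K_M") -> str:
    """Download one variant to the local HF cache and return its path."""
    # Requires: pip install huggingface_hub
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))

if __name__ == "__main__":
    print(download("Q4_K_M"))  # multi-GB download
```

Downloading a single file this way avoids pulling the entire multi-quant repository, which would be well over 100 GB in total.
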
Core Capabilities

  • Flexible deployment options from 7GB to 47GB model sizes
  • Optimized performance on both CPU and GPU configurations
  • Support for online weight repacking on compatible hardware
  • Special quantizations for ARM and AVX systems
  • Compatible with LM Studio and other llama.cpp-based projects

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its comprehensive range of quantization options, allowing users to choose the perfect balance between model size and performance for their specific hardware constraints. The implementation of both K-quants and I-quants provides flexibility for different use cases and hardware configurations.

Q: What are the recommended use cases?

For maximum performance, choose a quantization size 1-2GB smaller than your GPU's VRAM. For optimal quality, select a variant that fits within your combined system RAM and GPU VRAM. Q4_K_M is recommended as the default choice for most use cases, while Q6_K_L offers near-perfect quality for users with more available memory.
