Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF

Maintained By
bartowski

Gryphe Pantheon-RP 1.8 24B Small

  • Base Model Size: 24B Parameters
  • Original Author: Gryphe
  • Quantization Author: bartowski
  • Model Hub: Hugging Face
  • Format: GGUF (llama.cpp compatible)

What is Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF?

This is a comprehensive collection of GGUF quantized versions of the Pantheon-RP model, designed to make a powerful 24B parameter language model accessible across different hardware configurations. The quantizations range from full BF16 precision (47.15GB) down to highly compressed IQ2_XS (7.21GB), offering various trade-offs between model size and performance.
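To put that range in perspective, the ratio between the largest and smallest files (figures from the paragraph above) works out to roughly 6.5x:

```python
# File sizes from the model card, in GB.
bf16_gb = 47.15    # full BF16 precision
iq2_xs_gb = 7.21   # smallest quantization (IQ2_XS)

ratio = bf16_gb / iq2_xs_gb
print(f"IQ2_XS is {ratio:.1f}x smaller than BF16")  # -> 6.5x smaller
```

The heavier compression comes at a quality cost, which is why the card recommends the mid-range quants below when memory allows.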

Implementation Details

The quantizations were produced with llama.cpp's imatrix (importance matrix) calibration, yielding multiple options optimized for different use cases. The model expects a specific prompt format with system and user markers, and certain variants keep the embed and output weights at higher precision than the rest of the model.

  • Multiple quantization levels (Q2 to Q8) with different size-quality trade-offs
  • Special IQ (i-quant) variants for improved quality at very low bit widths
  • Optimized versions for ARM and AVX architectures
  • Enhanced embed/output weight handling in specific variants

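As one way to fetch a single variant, the `huggingface_hub` client can download an individual GGUF file from the repo. The repo id follows the title above; the filename pattern is an assumption based on bartowski's usual naming convention, so verify it against the repo's file list:

```python
REPO_ID = "bartowski/Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF"

def gguf_filename(quant: str) -> str:
    """Build the expected file name for a quant variant.

    NOTE: the '<repo name minus -GGUF>-<quant>.gguf' pattern is an
    assumption based on bartowski's typical naming; check the hub page.
    """
    return f"Gryphe_Pantheon-RP-1.8-24b-Small-3.1-{quant}.gguf"

def download(quant: str = "Q4_K_M") -> str:
    """Download one variant to the local HF cache and return its path."""
    # Requires: pip install huggingface_hub
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))

if __name__ == "__main__":
    print(download("Q4_K_M"))  # multi-GB download
```

Downloading a single file this way avoids pulling the entire multi-quant repository, which would be well over 100 GB in total.
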
Core Capabilities

  • Flexible deployment options from 7GB to 47GB model sizes
  • Optimized performance on both CPU and GPU configurations
  • Support for online weight repacking on compatible hardware
  • Special quantizations for ARM and AVX systems
  • Compatible with LM Studio and other llama.cpp-based projects

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its comprehensive range of quantization options, allowing users to choose the perfect balance between model size and performance for their specific hardware constraints. The implementation of both K-quants and I-quants provides flexibility for different use cases and hardware configurations.

Q: What are the recommended use cases?

For maximum performance, choose a quantization size 1-2GB smaller than your GPU's VRAM. For optimal quality, select a variant that fits within your combined system RAM and GPU VRAM. Q4_K_M is recommended as the default choice for most use cases, while Q6_K_L offers near-perfect quality for users with more available memory.
