Nous-Hermes-13B-GGML

Maintained By
TheBloke

  • License: Other
  • Format: GGML
  • Author: TheBloke
  • Base Model: Nous-Hermes-13B

What is Nous-Hermes-13B-GGML?

Nous-Hermes-13B-GGML is a quantized version of the Nous-Hermes-13B model, specifically optimized for CPU and GPU inference using the llama.cpp framework. The model comes in multiple quantization variants ranging from 2-bit to 8-bit, offering different trade-offs between model size, performance, and accuracy.

Implementation Details

The model implements various quantization methods including both original llama.cpp methods (q4_0, q4_1, q5_0, q5_1, q8_0) and new k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 5.43GB to 13.83GB depending on the quantization level.

  • Supports Alpaca prompt format
  • Compatible with multiple UI frameworks including text-generation-webui and KoboldCpp
  • Offers GPU layer offloading capabilities
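Because the card notes the model expects the Alpaca prompt format, a small helper that assembles that template can be useful when scripting against any of the supported runtimes. This is a sketch of the commonly used Alpaca template; the exact preamble wording the checkpoint was tuned on should be confirmed against the original model card:

```python
def build_alpaca_prompt(instruction: str, user_input: str = "") -> str:
    """Format a request in the Alpaca style.

    The preamble and section headers follow the widely used Alpaca
    template; verify the exact wording against the base model card.
    """
    if user_input:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{user_input}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

# Example: build a prompt with no additional input context.
prompt = build_alpaca_prompt("Summarize the GGML quantization trade-offs.")
```

The resulting string can be passed as the prompt to llama.cpp, text-generation-webui, or KoboldCpp.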

Core Capabilities

  • Long-form response generation
  • Low hallucination rate
  • Strong performance across various benchmarks (top rankings in ARC-c, ARC-e, Hellaswag)
  • Flexible deployment options for both CPU and GPU inference

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its range of quantization options, which let users balance performance against resource usage. It offers broad compatibility with the llama.cpp ecosystem while retaining the strong performance of the base Nous-Hermes-13B model.

Q: What are the recommended use cases?

The model is ideal for local deployment scenarios where GPU resources might be limited. It's particularly well-suited for applications requiring natural language understanding, creative writing, and instruction following, while allowing users to choose the quantization level that best matches their hardware capabilities.
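As a rough sketch, a local run with the llama.cpp `main` binary might look like the following. The file name, layer count, thread count, and token limit are illustrative assumptions; adjust them for your hardware and the quantization variant you downloaded:

```shell
# Illustrative llama.cpp invocation; file name and numeric values are
# assumptions, not recommendations from the model card.
#   -ngl : layers to offload to the GPU (0 = CPU-only inference)
#   -t   : CPU threads   -c : context size   -n : tokens to generate
./main -m ./nous-hermes-13b.ggmlv3.q4_K_M.bin \
       -c 2048 -t 8 -ngl 40 -n 512 \
       -p "### Instruction: Write a haiku about llamas. ### Response:"
```

Raising `-ngl` offloads more layers to the GPU and speeds up generation at the cost of VRAM; setting it to 0 keeps inference entirely on the CPU.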
