Nous-Hermes-13B-GGML
| Property | Value |
|---|---|
| License | Other |
| Format | GGML |
| Author | TheBloke |
| Base Model | Nous-Hermes-13B |
What is Nous-Hermes-13B-GGML?
Nous-Hermes-13B-GGML is a quantized version of the Nous-Hermes-13B model, packaged in the GGML format for CPU inference (with optional GPU layer offloading) via the llama.cpp framework. The model comes in multiple quantization variants ranging from 2-bit to 8-bit, each offering a different trade-off between file size, inference speed, and accuracy.
Implementation Details
The model is available in the original llama.cpp quantization methods (q4_0, q4_1, q5_0, q5_1, q8_0) as well as the newer k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 5.43 GB to 13.83 GB depending on the quantization level.
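As a rough sanity check on those sizes, a nominal lower bound on a GGML file is simply parameter count times bits per weight. The sketch below is illustrative only: the 13-billion parameter count is taken from the model name, and the gap between the nominal bound and the actual files reflects per-block scale data and mixed-precision k-quant layouts.

```python
# Nominal lower bound on GGML file size: parameters x bits-per-weight / 8.
# Actual files run larger because each quantization block also stores scale
# (and sometimes min) values, and k-quant formats mix precisions across
# tensors -- hence the real range of 5.43 GB to 13.83 GB.
N_PARAMS = 13_000_000_000  # "13B" in the model name

for name, bits in [("q2_K", 2), ("q4_0", 4), ("q5_1", 5), ("q8_0", 8)]:
    approx_gb = N_PARAMS * bits / 8 / 1e9
    print(f"{name}: >= ~{approx_gb:.2f} GB nominal")
```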
- Supports the Alpaca prompt format
- Compatible with multiple UI frameworks, including text-generation-webui and KoboldCpp
- Offers GPU layer offloading (both the prompt format and offloading are demonstrated in the sketch below)
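Putting those pieces together, here is a minimal loading-and-generation sketch using the llama-cpp-python bindings. The file name, layer count, and sampling settings are illustrative assumptions; note also that current llama-cpp-python releases expect GGUF files, so a release from the GGML era is required for these .bin files.

```python
from llama_cpp import Llama

# Load a quantized GGML file; the exact file name is an assumption here.
# n_gpu_layers offloads that many transformer layers to the GPU (0 = CPU only).
llm = Llama(
    model_path="nous-hermes-13b.ggmlv3.q4_K_M.bin",
    n_ctx=2048,       # context window
    n_threads=8,      # CPU threads for the non-offloaded layers
    n_gpu_layers=32,  # tune to fit your VRAM; 0 for pure CPU inference
)

# Alpaca-style prompt, as the card recommends.
prompt = (
    "### Instruction:\n"
    "Explain what GGML quantization does in two sentences.\n\n"
    "### Response:\n"
)

out = llm(prompt, max_tokens=256, temperature=0.7, stop=["### Instruction:"])
print(out["choices"][0]["text"])
```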
Core Capabilities
- Long-form response generation
- Low hallucination rate
- Strong performance across various benchmarks (top rankings on ARC-c, ARC-e, and HellaSwag)
- Flexible deployment options for both CPU and GPU inference
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its variety of quantization options, which let users balance performance against resource usage. It offers broad compatibility with the llama.cpp ecosystem while preserving the strong performance of the base Nous-Hermes-13B model.
Q: What are the recommended use cases?
The model is ideal for local deployment scenarios where GPU resources are limited or unavailable. It is particularly well suited to applications requiring natural language understanding, creative writing, and instruction following, while letting users choose the quantization level that best matches their hardware (one hypothetical way to automate that choice is sketched below).
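To make that last point concrete, here is a hypothetical selection helper. The mid-range file size, the overhead figure, and the use of psutil are all assumptions for illustration; only the smallest and largest sizes come from this card.

```python
import psutil  # third-party dependency; used only to read available RAM

# Illustrative variant sizes in GB: the q2_K and q8_0 figures are from the
# card, the q4_K_M figure is an assumed mid-range value.
VARIANTS_GB = {
    "q2_K": 5.43,
    "q4_K_M": 8.0,
    "q8_0": 13.83,
}
OVERHEAD_GB = 3.0  # assumed headroom for KV cache and runtime buffers

def pick_variant(available_gb: float) -> str:
    """Return the largest variant whose file plus overhead fits in RAM."""
    fitting = [name for name, size in VARIANTS_GB.items()
               if size + OVERHEAD_GB <= available_gb]
    # Entries are ordered smallest to largest, so the last fit is the
    # highest-quality variant that still fits.
    return fitting[-1] if fitting else "none fit; consider a smaller model"

available = psutil.virtual_memory().available / 1e9
print(f"~{available:.1f} GB free -> {pick_variant(available)}")
```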