Nous-Hermes-Llama2-GPTQ

Maintained by TheBloke
Property          Value
Parameter Count   2.03B
License           MIT
Architecture      Llama 2 with GPTQ quantization
Author            TheBloke

What is Nous-Hermes-Llama2-GPTQ?

Nous-Hermes-Llama2-GPTQ is a quantized version of the Nous-Hermes language model, optimized for efficient inference while maintaining high performance. It uses GPTQ compression to reduce model size and memory requirements while preserving accuracy. The model is offered in multiple quantization variants, in both 4-bit and 8-bit precision and with several group-size options.
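To make the memory savings concrete, the following sketch estimates weight-storage requirements at different bit widths. The 7B parameter count and the per-group overhead formula are illustrative assumptions, not figures from the model card; real deployments also need activation memory and the KV cache, which are not counted here.

```python
def weight_memory_gib(n_params: float, bits: int, group_size: int) -> float:
    """Approximate weight storage in GiB for GPTQ-style quantization.

    Each group of `group_size` weights stores one fp16 scale and one
    zero point, which is the per-group overhead GPTQ adds (approximation).
    """
    weight_bits = n_params * bits
    overhead_bits = (n_params / group_size) * (16 + bits)
    return (weight_bits + overhead_bits) / 8 / 1024**3


# Hypothetical 7B-parameter model: fp16 baseline vs 4-bit GPTQ (128g)
fp16_gib = weight_memory_gib(7e9, 16, group_size=10**12)  # groups negligible
q4_gib = weight_memory_gib(7e9, 4, group_size=128)
print(f"fp16: {fp16_gib:.1f} GiB, 4-bit/128g: {q4_gib:.1f} GiB")
```

Smaller group sizes (e.g. 32g) improve accuracy at the cost of slightly more scale/zero-point overhead, which is the tradeoff behind the multiple variants this repo ships.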

Implementation Details

The model implements state-of-the-art quantization techniques with AutoGPTQ compatibility. It features multiple GPTQ parameter permutations, allowing users to choose between different precision levels and memory usage tradeoffs. All quantization variants use a sequence length of 4096 and the WikiText dataset for calibration.

  • Multiple quantization options (4-bit and 8-bit variants)
  • Group size options from 32g to 128g
  • Compatible with ExLlama for 4-bit variants
  • Supports both act-order and non-act-order configurations
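As a sketch of how these permutations are typically selected: TheBloke's GPTQ repositories usually publish each variant on its own git branch, with names like "gptq-4bit-32g-actorder_True". The helper below builds such a revision string; the exact branch names for this repo are an assumption and should be checked on the model page before use.

```python
def gptq_branch(bits: int, group_size: int, act_order: bool) -> str:
    """Build a branch/revision name for a GPTQ parameter permutation,
    following the naming convention commonly used in TheBloke's repos."""
    if bits not in (4, 8):
        raise ValueError("this model offers 4-bit and 8-bit variants only")
    return f"gptq-{bits}bit-{group_size}g-actorder_{act_order}"


# The revision string would then be passed to from_pretrained, e.g.:
# model = AutoModelForCausalLM.from_pretrained(
#     "TheBloke/Nous-Hermes-Llama2-GPTQ",
#     revision=gptq_branch(4, 32, act_order=True),
#     device_map="auto",
# )
print(gptq_branch(4, 32, act_order=True))
```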

Core Capabilities

  • Advanced instruction following, fine-tuned on a dataset of 300,000+ instructions
  • Long-form response generation
  • Lower hallucination rate compared to baseline models
  • High performance on various benchmarks including ARC and HellaSwag
  • Flexible deployment options for different hardware configurations
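Instruction-following quality depends on using the prompt template the model was fine-tuned with. Nous-Hermes models are documented as using an Alpaca-style instruction format; the helper below is a sketch of that template and should be verified against the upstream model card.

```python
def build_prompt(instruction: str, system: str = "") -> str:
    """Assemble an Alpaca-style prompt as used by Nous-Hermes models.

    An optional system message precedes the instruction block; the model
    generates its answer after the "### Response:" marker.
    """
    parts = []
    if system:
        parts.append(system)
    parts.append(f"### Instruction:\n{instruction}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)


print(build_prompt("Summarize GPTQ quantization in one sentence."))
```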

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its optimized balance between performance and efficiency, offering multiple quantization options to suit different hardware constraints while maintaining high accuracy. It's built on the strong foundation of Nous-Hermes, known for its comprehensive instruction-following capabilities.

Q: What are the recommended use cases?

The model excels in instruction-following tasks, creative text generation, and complex reasoning. It's particularly well-suited for applications requiring efficient deployment while maintaining high-quality outputs, making it ideal for both research and production environments with limited computational resources.