Nous-Hermes-13B-GPTQ

Maintained by TheBloke

Property         Value
Parameter Count  2.03B as stored (packed 4-bit weights; the base model has 13B parameters)
Model Type       GPTQ 4-bit quantized LLaMA
License          Other
Language         English

What is Nous-Hermes-13B-GPTQ?

Nous-Hermes-13B-GPTQ is a 4-bit GPTQ quantization of the Nous-Hermes-13B language model, packaged for efficient GPU inference. The underlying model was fine-tuned on over 300,000 instructions and is reported to perform on par with GPT-3.5-turbo, with characteristically longer responses and a lower hallucination rate than comparable models.

Implementation Details

The model is quantized with GPTQ at 4-bit precision using a group size of 128, and its quantization parameters are read automatically from the bundled quantize_config.json. It targets the AutoGPTQ framework and supports both Triton and CUDA execution paths, as in the loading sketch below.
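
A minimal loading sketch, assuming the auto-gptq and transformers Python packages are installed and a CUDA-capable GPU is available; the repository name matches the published Hugging Face Hub listing, and the flags shown are illustrative rather than required:

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    # Repository name as published on the Hugging Face Hub.
    model_name = "TheBloke/Nous-Hermes-13B-GPTQ"

    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

    # quantize_config.json ships with the repo, so the 4-bit / group-size-128
    # settings are picked up automatically (no explicit BaseQuantizeConfig needed).
    model = AutoGPTQForCausalLM.from_quantized(
        model_name,
        use_safetensors=True,
        device="cuda:0",
        use_triton=False,  # set True to use the Triton kernel path instead of CUDA
    )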

Key features:

  • 4-bit precision quantization for efficient GPU inference
  • Trained on synthetic GPT-4 outputs and specialized datasets
  • Compatible with the text-generation-inference serving stack
  • Uses the Alpaca prompt format for instruction following (see the template sketch after this list)
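
A sketch of the Alpaca-style template the model expects; the helper name is hypothetical, and an optional "### Input:" section can be added for tasks that include context:

    def build_prompt(instruction: str) -> str:
        # Alpaca-style single-turn template; the section headers must match exactly.
        return (
            "### Instruction:\n"
            f"{instruction}\n\n"
            "### Response:\n"
        )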

Core Capabilities

  • Long-form response generation
  • Low hallucination rate compared to similar models
  • Comprehensive instruction following
  • Code generation and analysis
  • Scientific and technical content generation

Frequently Asked Questions

Q: What makes this model unique?

The model combines efficient 4-bit quantization with high-quality instruction following capabilities, trained on a diverse dataset including GPTeacher, CodeAlpaca, and specialized scientific datasets. It's particularly notable for generating longer, more accurate responses without conventional censorship mechanisms.

Q: What are the recommended use cases?

The model excels in tasks requiring detailed responses, code generation, scientific content creation, and general instruction following. It is particularly suitable for applications that need efficient GPU inference while maintaining high-quality outputs; a usage sketch follows.
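
A hypothetical end-to-end usage sketch, reusing the model, tokenizer, and build_prompt helper from the earlier sketches; the sampling parameters are illustrative, not recommendations from the model card:

    # Assumes `model`, `tokenizer`, and `build_prompt` from the sketches above.
    prompt = build_prompt("Write a Python function that checks whether a number is prime.")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))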
