Nous-Hermes-13B-GPTQ
Property | Value |
---|---|
Parameter Count | 13B base model (2.03B as reported for the quantized checkpoint) |
Model Type | GPTQ 4-bit Quantized LLaMA |
License | Other |
Language | English |
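The reported 2.03B figure is far below the 13B parameters in the model's name. One plausible explanation is that the count is taken over stored tensor elements, and GPTQ checkpoints pack eight 4-bit weights into each 32-bit integer. The arithmetic below assumes standard LLaMA-13B shapes (hidden size 5120, MLP size 13824, 40 layers, 32,000-token vocabulary). It also assumes an AutoGPTQ-style layout, with `qweight`, `scales`, `qzeros` and `g_idx` tensors per quantized linear layer, and that embeddings, `lm_head` and norms are left unquantized. None of these details are stated in this card.

```python
# Assumed LLaMA-13B shapes (not stated in this card).
HIDDEN = 5120
INTERMEDIATE = 13824
LAYERS = 40
VOCAB = 32000
GROUP_SIZE = 128
PACK = 32 // 4  # eight 4-bit values fit in one 32-bit word

# (in_features, out_features) of the quantized linear layers in one decoder block:
# q, k, v, o projections, then gate, up and down MLP projections.
LINEARS = [(HIDDEN, HIDDEN)] * 4 + [(HIDDEN, INTERMEDIATE)] * 2 + [(INTERMEDIATE, HIDDEN)]


def dense_count():
    """Parameters of the unquantized model."""
    per_layer = sum(i * o for i, o in LINEARS) + 2 * HIDDEN  # weights + two RMSNorms
    # Add input embeddings, lm_head and the final norm.
    return LAYERS * per_layer + 2 * VOCAB * HIDDEN + HIDDEN


def stored_gptq_count():
    """Tensor elements stored in an assumed AutoGPTQ-style 4-bit checkpoint."""
    per_layer = 2 * HIDDEN  # norms stay unquantized
    for i, o in LINEARS:
        groups = i // GROUP_SIZE
        per_layer += i * o // PACK       # qweight: packed 4-bit codes
        per_layer += groups * o          # scales: one fp16 value per group and output
        per_layer += groups * o // PACK  # qzeros: packed 4-bit zero-points
        per_layer += i                   # g_idx: group index of each input feature
    return LAYERS * per_layer + 2 * VOCAB * HIDDEN + HIDDEN


print(f"dense parameters:         {dense_count() / 1e9:.2f}B")
print(f"stored checkpoint values: {stored_gptq_count() / 1e9:.2f}B")
```

Under these assumptions the dense model has about 13.02B parameters, while the stored element count comes out at about 2.03B. A 32,001-token vocabulary or small extra buffers would not change either rounded figure.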
What is Nous-Hermes-13B-GPTQ?
Nous-Hermes-13B-GPTQ is a 4-bit GPTQ-quantized version of the Nous-Hermes-13B language model, intended for efficient GPU inference. The base model was fine-tuned on over 300,000 instructions. Its authors report that it rivals GPT-3.5-turbo across a variety of tasks and stands out for longer responses and a lower hallucination rate.
Implementation Details
The model is quantized with GPTQ using a group size of 128. Its quantization parameters are stored in quantize_config.json, so compatible loaders can configure themselves automatically. It is implemented with the AutoGPTQ framework and supports both Triton and CUDA execution paths.
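To make "group size of 128" concrete, here is a minimal pure-Python sketch of group-wise 4-bit storage. Each run of 128 weights gets its own scale and integer zero-point, and the resulting 4-bit codes are packed eight to a 32-bit word. This sketch uses plain round-to-nearest and is not GPTQ itself. GPTQ chooses the codes with a Hessian-based error-compensation step, and AutoGPTQ's real tensor layout and signed `int32` storage differ from these illustrative lists. All function names here are hypothetical.

```python
import random

BITS = 4
GROUP_SIZE = 128      # group size stated for this checkpoint
MAXQ = 2**BITS - 1    # largest 4-bit code (15)


def quantize_group(values):
    """Asymmetric round-to-nearest quantization of one group to 4-bit codes."""
    # Widen the range to include 0 so the integer zero-point fits in [0, MAXQ].
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / MAXQ or 1.0  # fall back to 1.0 for an all-zero group
    zero = round(-lo / scale)
    codes = [min(MAXQ, max(0, round(v / scale) + zero)) for v in values]
    return codes, scale, zero


def pack_int32(codes):
    """Pack eight 4-bit codes into each 32-bit word, lowest nibble first."""
    words = []
    for i in range(0, len(codes), 8):
        word = 0
        for j, code in enumerate(codes[i:i + 8]):
            word |= code << (BITS * j)
        words.append(word)
    return words


def unpack_int32(words):
    """Inverse of pack_int32: recover the 4-bit codes in their original order."""
    return [(w >> (BITS * j)) & MAXQ for w in words for j in range(8)]


def quantize_row(weights):
    """Quantize a weight row group by group; return packed codes, scales and zeros."""
    packed, scales, zeros = [], [], []
    for start in range(0, len(weights), GROUP_SIZE):
        codes, scale, zero = quantize_group(weights[start:start + GROUP_SIZE])
        packed.extend(pack_int32(codes))
        scales.append(scale)
        zeros.append(zero)
    return packed, scales, zeros


def dequantize_row(packed, scales, zeros):
    """Map each code back to a float using its group's scale and zero-point."""
    codes = unpack_int32(packed)
    return [scales[i // GROUP_SIZE] * (c - zeros[i // GROUP_SIZE])
            for i, c in enumerate(codes)]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(4 * GROUP_SIZE)]
packed, scales, zeros = quantize_row(row)
restored = dequantize_row(packed, scales, zeros)
print(len(row), "floats ->", len(packed), "packed words +", len(scales), "scales")
print("max abs error:", max(abs(a - b) for a, b in zip(row, restored)))
```

Smaller groups track local weight ranges more closely, which lowers rounding error, but they need more scales and zero-points. A group size of 128 is a common compromise between accuracy and storage.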
- 4-bit weight quantization to reduce GPU memory use
- Trained on synthetic GPT-4 outputs and specialized datasets
- Supports text-generation-inference
- Uses the Alpaca prompt format for instruction following
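As an illustration of the Alpaca prompt format, the helper below builds a prompt from `### Instruction:`, optional `### Input:`, and `### Response:` sections. The exact headers and blank-line spacing are assumptions, and this sketch omits the preamble used by the original Alpaca template. Check the upstream model card before relying on a particular layout. `build_prompt` is a hypothetical name.

```python
def build_prompt(instruction, context=None):
    """Format a request with Alpaca-style section headers (assumed layout)."""
    parts = [f"### Instruction:\n{instruction.strip()}"]
    if context:
        # Optional extra material the instruction refers to.
        parts.append(f"### Input:\n{context.strip()}")
    # The model is expected to continue generating after this header.
    parts.append("### Response:\n")
    return "\n\n".join(parts)


print(build_prompt("Summarize the text in one sentence.", "GPTQ stores weights in 4 bits."))
```

The model's answer is the text it generates after the final `### Response:` header. Generation is usually stopped when the model emits a new `###` header.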
Core Capabilities
- Long-form response generation
- Low hallucination rate compared to similar models
- Comprehensive instruction following
- Code generation and analysis
- Scientific and technical content generation
Frequently Asked Questions
Q: What makes this model unique?
The model combines efficient 4-bit quantization with strong instruction following. It was trained on a diverse mix of datasets, including GPTeacher, CodeAlpaca, and specialized scientific datasets. It is particularly notable for long responses, a low hallucination rate, and the absence of OpenAI-style censorship mechanisms.
Q: What are the recommended use cases?
The model is well suited to tasks that call for detailed responses, code generation, scientific content creation, and general instruction following. Its 4-bit weights make it a good fit for applications that need efficient GPU inference while keeping output quality high.