# Llama-2-13B-GPTQ
| Property | Value |
|---|---|
| Base Model | Meta Llama-2-13B |
| Parameter Count | 13 billion |
| License | Llama 2 |
| Paper | Research Paper |
| Author | TheBloke |
## What is Llama-2-13B-GPTQ?
Llama-2-13B-GPTQ is a quantized version of Meta's Llama-2-13B language model, optimized for efficient deployment while maintaining performance. This implementation provides multiple quantization options, including 4-bit and 8-bit variants with different group sizes, allowing users to balance VRAM usage against model accuracy.
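As a rough illustration, assuming the model is hosted under the `TheBloke/Llama-2-13B-GPTQ` repository ID and that `transformers`, `optimum`, and `auto-gptq` are installed, a minimal loading sketch might look like this (not the card's official snippet):

```python
# Minimal loading sketch; assumes `transformers`, `optimum`, and `auto-gptq`
# are installed and that the repo ID below matches the hosted model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-13B-GPTQ"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" lets accelerate place the quantized weights on available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```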
## Implementation Details
The model offers various GPTQ configurations; the main branch provides 4-bit quantization with a group size of 128. Additional branches expose different quantization parameters, including act-order variants and group sizes of 32g, 64g, and 128g.
- Multiple quantization options (4-bit and 8-bit versions)
- Compatible with AutoGPTQ and ExLlama
- Supports different group sizes for VRAM optimization
- Uses the WikiText dataset as calibration data for quantization
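Since each quantization variant lives on its own branch, a specific configuration can be selected with the `revision` argument. The branch name below is hypothetical, following the `gptq-<bits>-<groupsize>-actorder_<bool>` naming pattern common to these repositories; verify the exact name against the repository's branch list:

```python
from transformers import AutoModelForCausalLM

# Hypothetical branch name; check the repo's branch list for the real ones.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-GPTQ",
    device_map="auto",
    revision="gptq-4bit-32g-actorder_True",
)
```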
## Core Capabilities
- Text generation with configurable parameters
- Supports both pipeline and direct generation approaches (see the sketch after this list)
- Integration with popular frameworks like text-generation-webui
- Flexible deployment options for different hardware configurations
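The two generation approaches can be sketched as follows, reusing the `model` and `tokenizer` objects from the loading example above; the prompt and sampling parameters are illustrative, not values prescribed by the model card:

```python
# Assumes `model` and `tokenizer` are loaded as in the earlier sketch.
from transformers import pipeline

prompt = "Tell me about AI"

# Pipeline approach: high-level, handles tokenization and decoding internally.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(pipe(prompt)[0]["generated_text"])

# Direct approach: explicit tokenize -> generate -> decode.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The pipeline is convenient for quick experiments; the direct approach gives finer control over tokenization and generation arguments, which matters when integrating into a larger serving stack.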
## Frequently Asked Questions
### Q: What makes this model unique?
This model stands out for its variety of quantization options, allowing users to choose the optimal balance between VRAM usage and model quality. The 4-bit variants are particularly noteworthy for their efficiency while maintaining good performance.
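As a back-of-the-envelope illustration of that trade-off, weight memory alone scales with bit width; the figures below ignore the KV cache, activations, and per-group quantization metadata, so they are rough assumptions rather than measured numbers:

```python
# Rough weight-memory estimate for a 13B-parameter model; real VRAM usage
# is higher due to KV cache, activations, and quantization metadata.
params = 13e9
for bits in (16, 8, 4):
    gib = params * bits / 8 / 2**30
    print(f"{bits}-bit weights: ~{gib:.1f} GiB")
# Prints roughly: 24.2 GiB (16-bit), 12.1 GiB (8-bit), 6.1 GiB (4-bit)
```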
### Q: What are the recommended use cases?
The model is best suited for text generation tasks in English, particularly when deployment efficiency is crucial. It's ideal for scenarios requiring a balance between model performance and hardware resources.