Llama-2-13B-GPTQ

Maintained by TheBloke

  • Base Model: Meta Llama-2-13B
  • Parameter Count: 13 billion
  • License: Llama 2
  • Paper: Research Paper
  • Author: TheBloke

What is Llama-2-13B-GPTQ?

Llama-2-13B-GPTQ is a quantized version of Meta's Llama-2-13B language model, optimized for efficient deployment while preserving performance. This implementation provides multiple quantization options, including 4-bit and 8-bit variants with different group sizes, letting users trade VRAM usage against model accuracy.

Implementation Details

The model offers several GPTQ configurations, with the main branch providing 4-bit quantization at a group size of 128. Additional branches cover other quantization parameters, including act-order variants and different group sizes (32g, 64g, 128g); each branch can be selected as a Hugging Face revision, as shown in the sketch after the list below.

  • Multiple quantization options (4-bit and 8-bit versions)
  • Compatible with AutoGPTQ and ExLlama
  • Supports different group sizes for VRAM optimization
  • Quantized using the WikiText dataset as calibration data
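
As a rough sketch of how a quantization branch can be loaded, the snippet below uses the transformers GPTQ integration (which requires the optimum and auto-gptq packages). The repository ID comes from this card; the non-main branch name in the comment follows TheBloke's usual naming pattern and should be verified against the repo's actual branch list.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-13B-GPTQ"

# "main" holds the 4-bit, group-size-128 quantization; other branches
# (e.g. "gptq-4bit-32g-actorder_True" -- an assumed name, check the repo)
# hold the alternative group-size / act-order variants.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="main",    # select the quantization branch here
    device_map="auto",  # place layers on available GPU(s); needs accelerate
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```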

Core Capabilities

  • Text generation with configurable parameters
  • Supports both pipeline and direct generation approaches (see the sketch after this list)
  • Integration with popular frameworks like text-generation-webui
  • Flexible deployment options for different hardware configurations
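
As a minimal sketch of the two generation paths, the example below assumes the model loads as shown earlier; the sampling parameters are illustrative defaults, not values prescribed by this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "TheBloke/Llama-2-13B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain GPTQ quantization in one sentence."

# Pipeline approach: a convenient high-level wrapper around generation.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15,
)
print(pipe(prompt)[0]["generated_text"])

# Direct approach: tokenize manually and call generate() for full control.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```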

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its variety of quantization options, which let users choose the best balance between VRAM usage and model quality. The 4-bit variants are particularly noteworthy, offering strong efficiency while maintaining good performance.

Q: What are the recommended use cases?

The model is best suited to English text generation tasks where deployment efficiency is crucial, and to scenarios that must balance model quality against limited hardware resources.
