Llama-2-13B-chat-GPTQ

Maintained By
TheBloke

Llama-2-13B-chat-GPTQ

PropertyValue
Base ModelMeta's Llama-2-13B-chat
Parameter Count13 Billion
Quantization4-bit GPTQ
LicenseLlama2
PaperResearch Paper

What is Llama-2-13B-chat-GPTQ?

Llama-2-13B-chat-GPTQ is a quantized version of Meta's Llama 2 chat model, optimized for efficient deployment while maintaining performance. This version uses GPTQ quantization to reduce the model size and memory footprint while preserving the core capabilities of the original model.

Implementation Details

The model implements various quantization options, with the main branch offering 4-bit precision with 128 group size. The quantization process utilizes the wikitext dataset with a sequence length of 4096 tokens. The model is compatible with multiple frameworks including AutoGPTQ, Transformers, and ExLlama.

  • Multiple quantization options available (4-bit and 8-bit variants)
  • Group size options from 32g to 128g for different performance/memory tradeoffs
  • Compatibility with major frameworks and inference engines
  • Optimized for dialogue use cases with implemented chat template

Core Capabilities

  • Chat-optimized responses with built-in safety parameters
  • Context window of 4096 tokens
  • Supports multiple inference frameworks
  • Various quantization options for different hardware configurations
  • Maintains the base model's performance while reducing resource requirements

Frequently Asked Questions

Q: What makes this model unique?

This model offers a carefully optimized balance between performance and resource usage through GPTQ quantization, making it practical for deployment on consumer hardware while maintaining the quality of the original Llama 2 model.

Q: What are the recommended use cases?

The model is best suited for dialogue applications, chatbots, and interactive AI assistants where efficient deployment is crucial. It's particularly valuable for scenarios requiring good performance on limited hardware resources.

The first platform built for prompt engineering