# GPT4-X-Alpaca-30B-4bit
| Property | Value |
|---|---|
| Base Architecture | LLaMA 30B |
| Quantization Types | GPTQ & GGML |
| Training Parameters | LoRA (r=16), 10 epochs, 512-token context |
| Author | MetaIX |
## What is GPT4-X-Alpaca-30B-4bit?
GPT4-X-Alpaca-30B-4bit is a quantized language model built on Chansung's GPT4-Alpaca LoRA applied to LLaMA 30B. It is distributed in both GPTQ and GGML formats, making it suitable for GPU and CPU deployment alike while maintaining strong perplexity scores.
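For the GPU path, a minimal loading sketch using the AutoGPTQ library might look like the following. The repository id, safetensors flag, and prompt template are assumptions for illustration, not details confirmed by the model card.

```python
# Hypothetical sketch: loading the GPTQ variant with AutoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "MetaIX/GPT4-X-Alpaca-30B-4bit"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=False)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",       # the act-order variant fits in 24GB VRAM
    use_safetensors=True,  # assumption: weights shipped as .safetensors
)

prompt = "### Instruction:\nExplain 4-bit quantization briefly.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```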
## Implementation Details
The model comes in multiple quantized versions: two GPTQ variants (true-sequential with act-order, and true-sequential with groupsize 128) and three GGML variants (q4_1, q5_0, and q5_1). The GPTQ builds target CUDA GPUs, while the GGML builds enable CPU inference, so the model can be deployed on hardware with varying capabilities.
- GPTQ version with act-order optimization fits in 24GB VRAM
- Training utilized LoRA with r=16 across the q_proj, k_proj, v_proj, and o_proj modules (see the configuration sketch after this list)
- Benchmark scores show strong performance (Wikitext2: 4.28-4.48, PTB: 8.34-8.54)
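As a concrete illustration of that LoRA setup, the configuration below uses Hugging Face PEFT. Only the rank and target modules come from the model card; the base checkpoint path, alpha, and dropout values are assumptions.

```python
# Sketch of the LoRA configuration described above, using Hugging Face PEFT.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumption: any LLaMA-30B checkpoint in Hugging Face format works here.
base = AutoModelForCausalLM.from_pretrained("path/to/llama-30b-hf")

lora_config = LoraConfig(
    r=16,                                                     # rank from the model card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections listed above
    lora_alpha=32,      # assumption: not stated in the card
    lora_dropout=0.05,  # assumption: not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # confirms only the LoRA weights are trainable
```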
## Core Capabilities
- Efficient text generation with 4-bit quantization
- Compatible with popular frameworks (Oobabooga, KoboldAI)
- Flexible deployment options for both GPU and CPU (a CPU loading sketch follows this list)
- Full context length support with optimized memory usage
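For the CPU path, a sketch with llama-cpp-python is shown below. Note that recent llama.cpp builds expect GGUF files, so the legacy GGML variants may require an older release or a format conversion; the local filename here is hypothetical.

```python
# Hypothetical sketch: CPU inference on a GGML variant via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt4-x-alpaca-30b.q5_1.bin",  # hypothetical local filename
    n_ctx=512,    # matches the 512-token training context noted above
    n_threads=8,  # tune to the available CPU cores
)

result = llm(
    "### Instruction:\nSummarize GPTQ in one sentence.\n\n### Response:\n",
    max_tokens=64,
)
print(result["choices"][0]["text"])
```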
## Frequently Asked Questions
Q: What makes this model unique?
Its dual quantization approach sets it apart: the same weights ship as both GPTQ and GGML builds, covering different hardware setups without sacrificing benchmark performance. The act-order GPTQ version is particularly memory-efficient, fitting in 24GB of VRAM.
Q: What are the recommended use cases?
The model is well-suited to text generation tasks where memory efficiency is crucial. It runs on consumer-grade GPUs (via the 24GB-VRAM GPTQ variant) and on CPUs through the GGML variants.