GPT4-X-Alpaca-30B-4bit

Maintained by MetaIX

  • Base Architecture: LLaMA 30B
  • Quantization Types: GPTQ & GGML
  • Training Parameters: LoRA (r=16), 10 epochs, 512 context
  • Author: MetaIX

What is GPT4-X-Alpaca-30B-4bit?

GPT4-X-Alpaca-30B-4bit is a quantized language model built on Chansung's GPT4-Alpaca LoRA applied to LLaMA 30B. It ships in both GPTQ and GGML formats, covering GPU deployment (GPTQ) and CPU deployment (GGML) while keeping benchmark perplexity low (see Implementation Details).

Implementation Details

The model comes in multiple quantized versions: two GPTQ variants (one with true-sequential/act-order optimization, one with true-sequential/groupsize-128) and three GGML variants (q4_1, q5_0, and q5_1). Each variant trades file size against output quality, allowing deployment on hardware with different memory budgets.

  • The GPTQ variant with act-order optimization fits in 24GB of VRAM
  • Training used LoRA with r=16 on the q_proj, k_proj, v_proj, and o_proj modules (see the configuration sketch after this list)
  • Benchmark perplexity is strong across variants (Wikitext2: 4.28-4.48, PTB: 8.34-8.54; lower is better)
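
A minimal sketch of what that LoRA configuration looks like with the Hugging Face peft library. The rank and target modules come from the card; lora_alpha and lora_dropout are assumptions, since the card does not state them.

```python
# Hypothetical LoRA setup matching the card's stated hyperparameters:
# r=16 on the q_proj/k_proj/v_proj/o_proj attention projections.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                                     # rank stated on the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # modules stated on the card
    lora_alpha=32,        # assumed value; not given on the card
    lora_dropout=0.05,    # assumed value; not given on the card
    task_type="CAUSAL_LM",
)
```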

Core Capabilities

  • Efficient text generation with 4-bit quantization
  • Compatible with popular frameworks (Oobabooga, KoboldAI)
  • Flexible deployment options for both GPU and CPU (see the loading sketch after this list)
  • Full context length support with optimized memory usage
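
As a rough illustration, here is one way the GPTQ variant could be loaded outside a UI such as Oobabooga, using the AutoGPTQ library. The local path and the Alpaca-style prompt format are assumptions, not taken from the card.

```python
# Hedged sketch: loading the 4-bit GPTQ variant with AutoGPTQ on a 24GB GPU.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_dir = "./gpt4-x-alpaca-30b-4bit"  # placeholder for a local download of the weights

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoGPTQForCausalLM.from_quantized(model_dir, device="cuda:0")

# Alpaca-style instruction prompt (assumed format for an Alpaca-derived model)
prompt = "### Instruction:\nExplain GPTQ quantization in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```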

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its dual quantization approach: GPTQ for GPU inference and GGML for CPU inference, covering different hardware setups while maintaining the strong perplexity scores listed above. The act-order optimized version is particularly memory-efficient, fitting in 24GB of VRAM.

Q: What are the recommended use cases?

The model is well-suited for text generation tasks where memory efficiency matters. The act-order GPTQ variant fits on consumer-grade GPUs with 24GB of VRAM, while the GGML variants enable CPU-only deployment; a CPU-side sketch follows.
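
For the CPU path, a minimal sketch using the llama-cpp-python bindings, assuming a release old enough to load GGML files (newer releases expect GGUF). The filename is a placeholder.

```python
# Hedged sketch: CPU inference on the q5_1 GGML variant via llama-cpp-python.
# Requires a llama-cpp-python release that still reads GGML (pre-GGUF) files.
from llama_cpp import Llama

llm = Llama(
    model_path="./gpt4-x-alpaca-30b.ggml.q5_1.bin",  # placeholder filename
    n_ctx=512,  # matches the 512-token training context from the card
)

out = llm(
    "### Instruction:\nSummarize the benefits of 4-bit quantization.\n\n### Response:\n",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```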
