# GPT4-X-Alpaca-30B-4bit
| Property | Value |
|---|---|
| Base Architecture | LLaMA 30B |
| Quantization Types | GPTQ & GGML |
| Training Parameters | LoRA (r=16), 10 epochs, 512-token context |
| Author | MetaIX |
## What is GPT4-X-Alpaca-30B-4bit?
GPT4-X-Alpaca-30B-4bit is a quantized language model built on Chansung's GPT4-Alpaca LoRA applied to LLaMA 30B. It is distributed in both GPTQ and GGML formats, making it suitable for GPU and CPU deployment alike while maintaining strong perplexity scores.
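For the GPU path, a minimal loading sketch using the AutoGPTQ library might look like the following. The repository id, safetensors flag, and prompt template are assumptions for illustration, not details confirmed by the model card.

```python
# Hypothetical sketch: loading the GPTQ variant with AutoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "MetaIX/GPT4-X-Alpaca-30B-4bit"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=False)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",       # the act-order variant fits in 24GB VRAM
    use_safetensors=True,  # assumption: weights shipped as .safetensors
)

prompt = "### Instruction:\nExplain 4-bit quantization briefly.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```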
## Implementation Details
The model comes in multiple quantized versions: two GPTQ variants (true-sequential with act-order, and true-sequential with groupsize 128) and three GGML variants (q4_1, q5_0, and q5_1). The GPTQ builds target CUDA GPUs, while the GGML builds enable CPU inference, so the model can be deployed on hardware with varying capabilities.
- GPTQ version with act-order optimization fits in 24GB VRAM
- Training utilized LoRA with r=16 across the q_proj, k_proj, v_proj, and o_proj modules (see the configuration sketch after this list)
- Benchmark scores show strong performance (Wikitext2: 4.28-4.48, PTB: 8.34-8.54)
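As a concrete illustration of that LoRA setup, the configuration below uses Hugging Face PEFT. Only the rank and target modules come from the model card; the base checkpoint path, alpha, and dropout values are assumptions.

```python
# Sketch of the LoRA configuration described above, using Hugging Face PEFT.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumption: any LLaMA-30B checkpoint in Hugging Face format works here.
base = AutoModelForCausalLM.from_pretrained("path/to/llama-30b-hf")

lora_config = LoraConfig(
    r=16,                                                     # rank from the model card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections listed above
    lora_alpha=32,      # assumption: not stated in the card
    lora_dropout=0.05,  # assumption: not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # confirms only the LoRA weights are trainable
```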
## Core Capabilities
- Efficient text generation with 4-bit quantization
- Compatible with popular frameworks (Oobabooga, KoboldAI)
- Flexible deployment options for both GPU and CPU (a CPU loading sketch follows this list)
- Full context length support with optimized memory usage
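For the CPU path, a sketch with llama-cpp-python is shown below. Note that recent llama.cpp builds expect GGUF files, so the legacy GGML variants may require an older release or a format conversion; the local filename here is hypothetical.

```python
# Hypothetical sketch: CPU inference on a GGML variant via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt4-x-alpaca-30b.q5_1.bin",  # hypothetical local filename
    n_ctx=512,    # matches the 512-token training context noted above
    n_threads=8,  # tune to the available CPU cores
)

result = llm(
    "### Instruction:\nSummarize GPTQ in one sentence.\n\n### Response:\n",
    max_tokens=64,
)
print(result["choices"][0]["text"])
```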
## Frequently Asked Questions
Q: What makes this model unique?
Its dual quantization approach sets it apart: the same weights ship as both GPTQ and GGML builds, covering different hardware setups without sacrificing benchmark performance. The act-order GPTQ version is particularly memory-efficient, fitting in 24GB of VRAM.
Q: What are the recommended use cases?
The model is well-suited to text generation tasks where memory efficiency is crucial. It runs on consumer-grade GPUs (via the 24GB-VRAM GPTQ variant) and on CPUs through the GGML variants.