Mixtral-8x7B-Instruct-v0.1-GPTQ
| Property | Value |
|---|---|
| Parameter Count | 6.09B |
| License | Apache 2.0 |
| Supported Languages | English, French, Italian, German, Spanish |
| Author | TheBloke |
What is Mixtral-8x7B-Instruct-v0.1-GPTQ?
This is a quantized version of Mistral AI's Mixtral-8x7B-Instruct model, produced with the GPTQ quantization method. It offers multiple quantization options ranging from 3-bit to 8-bit precision, letting users trade output quality against VRAM and compute requirements.
Implementation Details
The model implements a sparse Mixture-of-Experts architecture and uses GPTQ quantization to reduce model size while largely preserving output quality. Running it requires either Transformers 4.37.0.dev0, or AutoGPTQ 0.6 together with Transformers 4.36.0 or later; a minimal loading sketch follows the list below.
- Multiple quantization options (3-bit, 4-bit, 8-bit)
- Various group sizes (32g, 128g) for different VRAM optimization levels
- Act-Order implementation for improved accuracy
- Sequence length of 8192 tokens
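To make the setup concrete, here is a minimal loading sketch using the Transformers API. The repository ID comes from this card; the alternative branch name shown in the comment is an assumption based on TheBloke's usual naming scheme, so verify it against the repo's branch list before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"

# "main" holds the default 4-bit quantization; other branches (e.g.
# "gptq-4bit-32g-actorder_True" -- an assumed branch name, check the repo)
# trade VRAM for accuracy via different bit widths and group sizes.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # requires the accelerate package
    revision="main",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
```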
Core Capabilities
- Multi-language support across 5 major languages
- Efficient GPU inference with reduced VRAM usage
- Instruction-following via the [INST] prompt format (see the sketch after this list)
- Maintains base model performance while reducing resource requirements
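Continuing from the loading sketch above, this example shows the [INST] wrapper in use; the sampling settings are illustrative defaults, not values taken from the model card.

```python
# Mixtral-Instruct expects user turns wrapped as "[INST] ... [/INST]";
# the tokenizer adds the BOS token automatically.
prompt = "[INST] Summarize GPTQ quantization in one sentence. [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,  # illustrative; tune for your task
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```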
Frequently Asked Questions
Q: What makes this model unique?
This model packages multiple quantization options tuned for different hardware configurations, so the same repository can serve a wide range of deployment setups while retaining the core capabilities of the original Mixtral model.
Q: What are the recommended use cases?
The model is well suited to GPU-based inference in resource-constrained environments, particularly for multilingual and instruction-following workloads. Pick the quantization branch that matches your available VRAM: lower bit widths and larger group sizes reduce memory use at some cost in accuracy.
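As one deployment sketch, the snippet below wraps the model in a Transformers text-generation pipeline. It assumes the `model` and `tokenizer` objects from the loading example earlier; the prompt is only an illustration of the multilingual, instruction-following use case.

```python
from transformers import pipeline

# Reuses `model` and `tokenizer` from the loading sketch above.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = pipe(
    "[INST] Translate 'good morning' into Italian, German, and Spanish. [/INST]",
    max_new_tokens=64,
    return_full_text=False,  # return only the completion, not the prompt
)
print(result[0]["generated_text"])
```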