Dolphin 2.7 Mixtral 8x7b GPTQ
| Property | Value |
|---|---|
| Parameter Count | 6.09B |
| License | Apache 2.0 |
| Model Type | Mixtral-based GPTQ |
| Quantization Options | 3-bit, 4-bit, 8-bit |
What is dolphin-2.7-mixtral-8x7b-GPTQ?
Dolphin 2.7 Mixtral 8x7b GPTQ is a quantized version of the Dolphin 2.7 model, which is built on the Mixtral-8x7b architecture. The GPTQ release offers several quantization options for trading output quality against resource usage, making the model practical to deploy on far more modest hardware than the full-precision weights require. The underlying Dolphin 2.7 fine-tune took 3 days on 4x A100s using qLoRA and Axolotl, drawing on multiple high-quality datasets including ehartford/dolphin, airoboros, and Magicoder.
Implementation Details
The model employs GPTQ quantization with multiple options ranging from 3-bit to 8-bit precision, allowing users to choose based on their hardware constraints and quality requirements. It uses the ChatML prompt format (sketched below) and supports a context length of up to 8192 tokens in this quantized release, while the fine-tune itself used a 16k context window.
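For reference, here is a minimal sketch of the ChatML layout the model expects; the system message is the conventional Dolphin one and can be replaced with anything:

```python
# ChatML prompt layout expected by the model; the system message below
# is illustrative and can be swapped out freely.
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Explain what GPTQ quantization does.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```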
- Multiple GPTQ parameter permutations available (3-bit, 4-bit, 8-bit)
- Supports various group sizes for different VRAM requirements
- Compatible with text-generation-webui and Hugging Face's transformers library (a loading sketch follows this list)
- Implements Act Order for improved quantization accuracy
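As a rough loading sketch in Python, assuming the model is hosted under the TheBloke/dolphin-2.7-mixtral-8x7b-GPTQ repository with one quantization variant per branch (the branch name below is illustrative); loading GPTQ weights through transformers additionally requires the optimum and auto-gptq packages:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/dolphin-2.7-mixtral-8x7b-GPTQ"

# Each quantization variant lives on its own branch: "main" holds the
# default 4-bit files, while other branches (group size / Act Order
# permutations) trade VRAM for accuracy. Branch name is illustrative.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="gptq-8bit-128g-actorder_True",
    device_map="auto",  # spread layers across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```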
Core Capabilities
- Enhanced coding capabilities with specialized training data
- Highly compliant and responsive to user instructions
- Fine-tuned with a 16k context window
- Optimized for both general chat and specialized tasks
- Compatible with multiple inference frameworks, including TGI and text-generation-webui; an inference sketch follows this list
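To illustrate chat-style inference, here is a self-contained sketch using transformers' text-generation pipeline; the repo id is the same assumption as above, and the sampling settings are illustrative rather than recommended values:

```python
from transformers import pipeline

# Assumed repo id; requires optimum and auto-gptq for GPTQ weights.
generate = pipeline(
    "text-generation",
    model="TheBloke/dolphin-2.7-mixtral-8x7b-GPTQ",
    device_map="auto",
)

# ChatML-formatted request, as described above.
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a Python function that reverses a string.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

output = generate(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    return_full_text=False,  # return only the assistant's reply
)
print(output[0]["generated_text"])
```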
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balance between output quality and resource efficiency, offering multiple quantization options while maintaining strong results, especially on coding tasks. Its mix of high-quality training datasets, combined with Mixtral-specific fixes in recent versions of the transformers library, makes it particularly robust.
Q: What are the recommended use cases?
The model excels at coding tasks, general chat, and structured output generation. It is particularly suitable for deployments where resource efficiency is critical but output quality still matters, and the multiple quantization options make it adaptable to a wide range of hardware configurations.