Dolphin 2.7 Mixtral 8x7b GPTQ
| Property | Value |
|---|---|
| Parameter Count | 6.09B |
| License | Apache 2.0 |
| Model Type | Mixtral-based GPTQ |
| Quantization Options | 3-bit, 4-bit, 8-bit |
What is dolphin-2.7-mixtral-8x7b-GPTQ?
Dolphin 2.7 Mixtral 8x7b GPTQ is a quantized version of the Dolphin 2.7 model, which is built on the Mixtral-8x7b architecture. The GPTQ release offers several quantization options for trading output quality against resource usage, making the model practical to deploy on far more modest hardware than the full-precision weights require. The underlying Dolphin 2.7 fine-tune took 3 days on 4x A100s using qLoRA and Axolotl, drawing on multiple high-quality datasets including ehartford/dolphin, airoboros, and Magicoder.
Implementation Details
The model employs GPTQ quantization with multiple options ranging from 3-bit to 8-bit precision, allowing users to choose based on their hardware constraints and quality requirements. It uses the ChatML prompt format (sketched below) and supports a context length of up to 8192 tokens in this quantized release, while the fine-tune itself used a 16k context window.
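For reference, here is a minimal sketch of the ChatML layout the model expects; the system message is the conventional Dolphin one and can be replaced with anything:

```python
# ChatML prompt layout expected by the model; the system message below
# is illustrative and can be swapped out freely.
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Explain what GPTQ quantization does.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```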
- Multiple GPTQ parameter permutations available (3-bit, 4-bit, 8-bit)
- Supports various group sizes for different VRAM requirements
- Compatible with text-generation-webui and Hugging Face's transformers library (a loading sketch follows this list)
- Implements Act Order for improved quantization accuracy
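As a rough loading sketch in Python, assuming the model is hosted under the TheBloke/dolphin-2.7-mixtral-8x7b-GPTQ repository with one quantization variant per branch (the branch name below is illustrative); loading GPTQ weights through transformers additionally requires the optimum and auto-gptq packages:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/dolphin-2.7-mixtral-8x7b-GPTQ"

# Each quantization variant lives on its own branch: "main" holds the
# default 4-bit files, while other branches (group size / Act Order
# permutations) trade VRAM for accuracy. Branch name is illustrative.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="gptq-8bit-128g-actorder_True",
    device_map="auto",  # spread layers across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```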
Core Capabilities
- Enhanced coding capabilities with specialized training data
- Highly compliant and responsive to user instructions
- Fine-tuned with a 16k context window
- Optimized for both general chat and specialized tasks
- Compatible with multiple inference frameworks, including TGI and text-generation-webui; an inference sketch follows this list
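To illustrate chat-style inference, here is a self-contained sketch using transformers' text-generation pipeline; the repo id is the same assumption as above, and the sampling settings are illustrative rather than recommended values:

```python
from transformers import pipeline

# Assumed repo id; requires optimum and auto-gptq for GPTQ weights.
generate = pipeline(
    "text-generation",
    model="TheBloke/dolphin-2.7-mixtral-8x7b-GPTQ",
    device_map="auto",
)

# ChatML-formatted request, as described above.
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a Python function that reverses a string.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

output = generate(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    return_full_text=False,  # return only the assistant's reply
)
print(output[0]["generated_text"])
```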
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balance between output quality and resource efficiency, offering multiple quantization options while maintaining strong results, especially on coding tasks. Its mix of high-quality training datasets, combined with Mixtral-specific fixes in recent versions of the transformers library, makes it particularly robust.
Q: What are the recommended use cases?
The model excels at coding tasks, general chat, and structured output generation. It is particularly suitable for deployments where resource efficiency is critical but output quality still matters, and the multiple quantization options make it adaptable to a wide range of hardware configurations.