Mistral-7B-OpenOrca-GPTQ

Maintained By
TheBloke


Property          Value
Base Model        Mistral-7B-OpenOrca
Parameter Count   7B
License           Apache 2.0
Paper             Orca Paper

What is Mistral-7B-OpenOrca-GPTQ?

Mistral-7B-OpenOrca-GPTQ is a quantized version of the OpenOrca fine-tune of Mistral-7B, optimized for efficient GPU inference. It uses GPTQ quantization to shrink the model while preserving output quality, and is published in multiple 4-bit and 8-bit variants with different group sizes so users can pick the performance-efficiency trade-off that suits their hardware.

Implementation Details

The model utilizes the ChatML format and comes with multiple GPTQ parameter permutations, ranging from 4-bit to 8-bit quantization with various group sizes (32g, 64g, 128g). The quantization process used the WikiText dataset with a sequence length of 32,768 tokens and includes Act Order optimization for enhanced accuracy.

  • Multiple quantization options (4-bit and 8-bit variants)
  • Supports group sizes from 32 to 128 for optimization
  • Compatible with ExLlama, AutoGPTQ, and Text Generation Inference
  • Uses ChatML prompt format for structured interactions
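The ChatML format mentioned above wraps each turn in `<|im_start|>`/`<|im_end|>` tokens. A minimal sketch of building such a prompt (the helper name is illustrative, not part of the model's tooling):

```python
# Build a ChatML-formatted prompt as expected by Mistral-7B-OpenOrca.
# The function name is a hypothetical helper for illustration.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"  # generation continues from here
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Explain GPTQ quantization in one sentence.",
)
```

Leaving the prompt open after the assistant's `<|im_start|>` tag is what cues the model to produce the assistant turn.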

Core Capabilities

  • Efficient GPU inference with reduced memory footprint
  • Maintains high performance while reducing model size
  • Supports context length of 32,768 tokens
  • Optimized for both accuracy and memory efficiency
  • Compatible with major inference frameworks
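To make the memory-footprint claim concrete, here is a back-of-envelope estimate of weight storage under quantization. This is a rough sketch, not a figure from the model card: it counts packed weights plus one fp16 scale and zero-point per group, and ignores activations, the KV cache, and framework overhead.

```python
# Rough VRAM estimate for GPTQ-quantized weights (illustrative model only).
def weight_memory_gib(n_params: float, bits: int, group_size: int) -> float:
    packed = n_params * bits / 8                 # packed weight bytes
    scales = (n_params / group_size) * 2 * 2     # fp16 scale + zero per group
    return (packed + scales) / 2**30

# 7B parameters at 4-bit, group size 128 -> roughly 3.5 GiB of weights,
# versus ~13 GiB for the same weights in fp16.
four_bit_128g = weight_memory_gib(7e9, 4, 128)
four_bit_32g = weight_memory_gib(7e9, 4, 32)
```

Note how a smaller group size (32g) stores more per-group scales and therefore uses slightly more memory, which is the accuracy-for-VRAM trade-off behind the 32g/64g/128g variants.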

Frequently Asked Questions

Q: What makes this model unique?

This model combines the power of Mistral-7B with OpenOrca's improvements and GPTQ quantization, offering an efficient solution for GPU deployment while maintaining high performance. It notably provides multiple quantization options to suit different hardware configurations and use cases.

Q: What are the recommended use cases?

The model is ideal for deployment in resource-constrained environments where GPU memory is limited but high performance is required. It's particularly suitable for text generation, conversation, and general language understanding tasks where efficient inference is crucial.
