pixtral-12b-FP8-dynamic

Maintained By
neuralmagic


Property              Value
Parameter Count       12.7B
License               Apache 2.0
Supported Languages   English, German, French, Italian, Portuguese, Hindi, Spanish, Thai
Model Type            Multimodal (Text/Image)
Tensor Type           BF16/F8_E4M3

What is pixtral-12b-FP8-dynamic?

pixtral-12b-FP8-dynamic is an FP8-quantized checkpoint of Pixtral 12B in the Pixtral (Llava) architecture, released by Neural Magic. Both weights and activations are quantized to FP8, which halves the bits per quantized parameter (16 to 8) and reduces disk size and GPU memory requirements by approximately 50%, while performance stays comparable to the unquantized base model.
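The 50% figure follows from halving the bits per quantized parameter (16-bit BF16 to 8-bit FP8). The back-of-the-envelope sketch below assumes decimal gigabytes and counts weights only (no KV cache, activations, or per-channel scale tensors). `weight_memory_gb` and its `quantized_fraction` knob are hypothetical: the Tensor Type row lists both BF16 and F8_E4M3, but the exact share of parameters left in BF16 is not stated here, so the 90% split is purely illustrative.

```python
def weight_memory_gb(num_params: float, quantized_fraction: float,
                     quant_bits: int = 8, base_bits: int = 16) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes).

    quantized_fraction is the (hypothetical) share of parameters stored in FP8;
    the rest are assumed to stay in BF16. Ignores scales, KV cache, and activations.
    """
    avg_bits = quantized_fraction * quant_bits + (1.0 - quantized_fraction) * base_bits
    return num_params * avg_bits / 8 / 1e9


params = 12.7e9  # parameter count from the property table
bf16_gb = weight_memory_gb(params, quantized_fraction=0.0)   # everything in BF16
fp8_gb = weight_memory_gb(params, quantized_fraction=1.0)    # idealized: everything in FP8
mixed_gb = weight_memory_gb(params, quantized_fraction=0.9)  # hypothetical 90% FP8 split

print(f"BF16: {bf16_gb:.1f} GB, all-FP8: {fp8_gb:.1f} GB, 90% FP8: {mixed_gb:.2f} GB")
# BF16: 25.4 GB, all-FP8: 12.7 GB, 90% FP8: 13.97 GB
```

Any parameters kept in BF16 pull the saving slightly below 50%, which is why the reduction is described as approximate.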

Implementation Details

Quantization is applied only to the weights and activations of the linear operators within the transformer blocks. Weights use symmetric per-channel FP8 quantization (one scale per output channel), and activations use dynamic per-token quantization, with a scale computed at inference time for each token. The model can be deployed with the vLLM backend for efficient inference.

  • Weight and activation quantization using FP8
  • Approximately 50% reduction in disk size and GPU memory requirements
  • Symmetric per-channel quantization for linear operators
  • Dynamic per-token activation quantization
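To make the two scaling schemes concrete, here is a small pure-Python simulation. It assumes the E4M3 "fn" variant of FP8 (largest finite value 448) and symmetric scaling (scale = max|value| / 448, no zero point) for both weights and activations. It is an illustration, not Neural Magic's or vLLM's implementation: FP8 rounding is emulated with ordinary Python floats, real kernels run on FP8 hardware and accumulate in higher precision, and `round_to_e4m3`, `quantize_rows`, and `fp8_linear` are hypothetical helpers.

```python
import math

# Largest finite value of FP8 E4M3 in the "fn" variant (no infinities).
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Emulate rounding a float to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)
    # Unbiased exponent of a; values below 2**-6 are subnormal and share exponent -6.
    exp = max(math.frexp(a)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # spacing between neighbours with 3 mantissa bits
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def quantize_rows(m):
    """Symmetric quantization with one scale per row: scale = max|row| / 448."""
    q, scales = [], []
    for row in m:
        amax = max(abs(v) for v in row)
        s = amax / FP8_E4M3_MAX if amax > 0 else 1.0  # guard all-zero rows
        q.append([round_to_e4m3(v / s) for v in row])
        scales.append(s)
    return q, scales


def fp8_linear(x, w_q, w_scales):
    """Compute y = x @ W^T from FP8 operands.

    Weight scales (one per output channel) were computed ahead of time; activation
    scales (one per token) are computed here, on every call, from the live input.
    Both scales factor out of each dot product, so the inner sum only touches
    FP8-representable values.
    """
    x_q, x_scales = quantize_rows(x)  # dynamic per-token quantization
    return [
        [sx * sw * sum(a * b for a, b in zip(xq, wq)) for wq, sw in zip(w_q, w_scales)]
        for xq, sx in zip(x_q, x_scales)
    ]


# Toy weight matrix: rows are output channels, columns are input features.
w = [[0.5, -1.2, 0.03],
     [2.0, 0.1, 0.7]]
# Toy activations: rows are tokens.
x = [[1.0, 2.0, 3.0],
     [-0.2, 0.05, 0.4]]

w_q, w_scales = quantize_rows(w)  # static per-channel weight quantization, done once
y = fp8_linear(x, w_q, w_scales)
y_ref = [[sum(a * b for a, b in zip(xi, wj)) for wj in w] for xi in x]

for row, ref in zip(y, y_ref):
    print([f"{v:.4f}" for v in row], "vs full precision", [f"{v:.4f}" for v in ref])
```

The design point the sketch highlights is that weight scales are fixed once, while activation scales adapt to each token's actual range at inference time, so in this scheme no calibration data is needed to choose activation scales.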

Core Capabilities

  • Multimodal processing (text and image inputs)
  • Competitive performance on benchmarks (MMMU: 51.11%, MathVista: 59.4%)
  • Support for 8 different languages
  • Assistant-like chat functionality
  • Commercial and research use cases

Frequently Asked Questions

Q: What makes this model unique?

The model's key differentiator is that FP8 quantization roughly halves its memory footprint while keeping performance comparable to the original model. Combined with support for multiple languages and for both text and image inputs, this makes it particularly valuable for resource-constrained deployments.

Q: What are the recommended use cases?

The model is designed for commercial and research applications that require multimodal capabilities. It is suited to assistant-like chat, visual question answering, and multilingual applications. It should not be used in ways that violate applicable laws or regulations.
