pixtral-12b-FP8-dynamic

Property	Value
Parameter Count	12.7B
License	Apache 2.0
Supported Languages	English, German, French, Italian, Portuguese, Hindi, Spanish, Thai
Model Type	Multimodal (Text/Image)
Tensor Type	BF16/F8_E4M3

What is pixtral-12b-FP8-dynamic?

pixtral-12b-FP8-dynamic is an optimized version of the Pixtral (Llava) architecture, developed by Neural Magic. This model represents a significant advancement in efficient multimodal AI, featuring FP8 quantization for both weights and activations, reducing memory requirements by approximately 50% while maintaining performance comparable to its base model.

Implementation Details

The model employs sophisticated quantization techniques, specifically targeting linear operators within transformer blocks. It uses symmetric per-channel quantization with FP8 data type, implementing dynamic per-token activation quantization. The model can be deployed using vLLM backend, offering efficient inference capabilities.

Weight and activation quantization using FP8
50% reduction in disk size and GPU memory requirements
Symmetric per-channel quantization for linear operators
Dynamic per-token activation quantization

Core Capabilities

Multimodal processing (text and image inputs)
Competitive performance on benchmarks (MMMU: 51.11%, Mathvista: 59.4%)
Support for 8 different languages
Assistant-like chat functionality
Commercial and research use cases

Frequently Asked Questions

Q: What makes this model unique?

The model's key differentiator is its efficient FP8 quantization while maintaining performance comparable to the original model. It achieves this while supporting multiple languages and handling both text and image inputs, making it particularly valuable for resource-constrained deployments.

Q: What are the recommended use cases?

The model is designed for commercial and research applications requiring multimodal capabilities. It excels in assistant-like chat scenarios, visual question answering, and multilingual applications. However, it should not be used for applications that violate applicable laws or regulations.