pixtral-12b-FP8-dynamic
Property | Value |
---|---|
Parameter Count | 12.7B |
License | Apache 2.0 |
Supported Languages | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
Model Type | Multimodal (Text/Image) |
Tensor Type | BF16/F8_E4M3 |
What is pixtral-12b-FP8-dynamic?
pixtral-12b-FP8-dynamic is an optimized version of the Pixtral (Llava) architecture, developed by Neural Magic. This model represents a significant advancement in efficient multimodal AI, featuring FP8 quantization for both weights and activations, reducing memory requirements by approximately 50% while maintaining performance comparable to its base model.
Implementation Details
The model employs sophisticated quantization techniques, specifically targeting linear operators within transformer blocks. It uses symmetric per-channel quantization with FP8 data type, implementing dynamic per-token activation quantization. The model can be deployed using vLLM backend, offering efficient inference capabilities.
- Weight and activation quantization using FP8
- 50% reduction in disk size and GPU memory requirements
- Symmetric per-channel quantization for linear operators
- Dynamic per-token activation quantization
Core Capabilities
- Multimodal processing (text and image inputs)
- Competitive performance on benchmarks (MMMU: 51.11%, Mathvista: 59.4%)
- Support for 8 different languages
- Assistant-like chat functionality
- Commercial and research use cases
Frequently Asked Questions
Q: What makes this model unique?
The model's key differentiator is its efficient FP8 quantization while maintaining performance comparable to the original model. It achieves this while supporting multiple languages and handling both text and image inputs, making it particularly valuable for resource-constrained deployments.
Q: What are the recommended use cases?
The model is designed for commercial and research applications requiring multimodal capabilities. It excels in assistant-like chat scenarios, visual question answering, and multilingual applications. However, it should not be used for applications that violate applicable laws or regulations.