# Llama-3.2-90B-Vision-Instruct-FP8-dynamic
| Property | Value |
|---|---|
| Parameter Count | 88.6B |
| Model Type | Multimodal (Text + Vision) |
| Languages | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| License | llama3.2 |
| Release Date | September 25, 2024 |
## What is Llama-3.2-90B-Vision-Instruct-FP8-dynamic?
This model is an optimized version of Meta's Llama-3.2-90B-Vision-Instruct, specifically quantized to FP8 format to reduce memory requirements while maintaining performance. It's designed for multimodal tasks, capable of processing both text and image inputs to generate human-like responses across 8 different languages.
## Implementation Details
The model employs sophisticated quantization techniques, converting the original BF16 weights and activations to FP8 format. This optimization reduces the bits per parameter from 16 to 8, resulting in an approximately 50% reduction in disk size and GPU memory requirements. Quantization is applied only to the linear operators within transformer blocks, using symmetric per-channel quantization for the weights.
- Weight quantization: FP8 with per-channel scaling
- Activation quantization: dynamic FP8, with scales computed per token
- Implementation via LLM Compressor framework
- Compatible with vLLM backend for efficient deployment
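The weight and activation schemes above can be sketched in plain Python. This is a minimal illustration, not the LLM Compressor implementation: `FP8_MAX = 448.0` (the E4M3 finite maximum) and all helper names are assumptions, and real FP8 kernels also round values onto the FP8 grid rather than merely computing scales and clamping.

```python
# Sketch of symmetric quantization: static per-channel for weights,
# dynamic per-token for activations. Assumes FP8 E4M3 with a finite
# maximum of 448.0; rounding to the FP8 grid is omitted for clarity.

FP8_MAX = 448.0  # largest finite E4M3 value (assumed)

def symmetric_scale(values):
    """One scale so that max(|v|) maps exactly to FP8_MAX."""
    return max(abs(v) for v in values) / FP8_MAX

def quantize(values, scale):
    """Divide by the scale and clamp into the FP8-representable range."""
    return [max(-FP8_MAX, min(FP8_MAX, v / scale)) for v in values]

def dequantize(qvalues, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in qvalues]

# Weights: scales are computed once, per output channel (one per row).
weight = [[0.5, -2.0, 1.0],
          [0.1,  0.2, -0.05]]
w_scales = [symmetric_scale(row) for row in weight]
w_q = [quantize(row, s) for row, s in zip(weight, w_scales)]

# Activations: "dynamic" means the scales are recomputed at inference
# time for every incoming token vector, rather than calibrated offline.
tokens = [[3.0, -1.5, 0.25],
          [0.8,  0.9, -4.0]]
a_scales = [symmetric_scale(tok) for tok in tokens]
a_q = [quantize(tok, s) for tok, s in zip(tokens, a_scales)]
```

Because the sketch skips FP8 rounding, dequantizing `w_q` with `w_scales` reproduces the original weights exactly; in a real FP8 pipeline a small rounding error per element remains.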
## Core Capabilities
- Multimodal processing of text and image inputs
- Support for 8 different languages
- Efficient memory usage through FP8 quantization
- Assistant-like chat functionality
- Commercial and research applications
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its efficient FP8 quantization, which preserves the capabilities of the original 90B-parameter model. It achieves significant memory savings without compromising its multimodal and multilingual capabilities.
**Q: What are the recommended use cases?**
The model is particularly well-suited for commercial and research applications requiring multimodal understanding, such as image-based conversations, multilingual assistance, and general-purpose AI chat applications. However, it should not be used for applications that violate applicable laws or regulations.