# Llama-3.2-90B-Vision-Instruct-FP8-dynamic
| Property | Value |
|---|---|
| Parameter Count | 88.6B |
| Model Type | Multimodal (Text + Vision) |
| Languages | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| License | llama3.2 |
| Release Date | September 25, 2024 |
## What is Llama-3.2-90B-Vision-Instruct-FP8-dynamic?
This model is an optimized version of Meta's Llama-3.2-90B-Vision-Instruct, specifically quantized to FP8 format to reduce memory requirements while maintaining performance. It's designed for multimodal tasks, capable of processing both text and image inputs to generate human-like responses across 8 different languages.
## Implementation Details
The model employs sophisticated quantization techniques, converting the original BF16 weights and activations to FP8 format. This optimization reduces the bits per parameter from 16 to 8, resulting in an approximately 50% reduction in disk size and GPU memory requirements. Quantization is applied only to the linear operators within transformer blocks, using symmetric per-channel quantization for the weights.
- Weight quantization: FP8 with per-channel scaling
- Activation quantization: dynamic FP8, with scales computed per token
- Implementation via LLM Compressor framework
- Compatible with vLLM backend for efficient deployment
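The weight and activation schemes above can be sketched in plain Python. This is a minimal illustration, not the LLM Compressor implementation: `FP8_MAX = 448.0` (the E4M3 finite maximum) and all helper names are assumptions, and real FP8 kernels also round values onto the FP8 grid rather than merely computing scales and clamping.

```python
# Sketch of symmetric quantization: static per-channel for weights,
# dynamic per-token for activations. Assumes FP8 E4M3 with a finite
# maximum of 448.0; rounding to the FP8 grid is omitted for clarity.

FP8_MAX = 448.0  # largest finite E4M3 value (assumed)

def symmetric_scale(values):
    """One scale so that max(|v|) maps exactly to FP8_MAX."""
    return max(abs(v) for v in values) / FP8_MAX

def quantize(values, scale):
    """Divide by the scale and clamp into the FP8-representable range."""
    return [max(-FP8_MAX, min(FP8_MAX, v / scale)) for v in values]

def dequantize(qvalues, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in qvalues]

# Weights: scales are computed once, per output channel (one per row).
weight = [[0.5, -2.0, 1.0],
          [0.1,  0.2, -0.05]]
w_scales = [symmetric_scale(row) for row in weight]
w_q = [quantize(row, s) for row, s in zip(weight, w_scales)]

# Activations: "dynamic" means the scales are recomputed at inference
# time for every incoming token vector, rather than calibrated offline.
tokens = [[3.0, -1.5, 0.25],
          [0.8,  0.9, -4.0]]
a_scales = [symmetric_scale(tok) for tok in tokens]
a_q = [quantize(tok, s) for tok, s in zip(tokens, a_scales)]
```

Because the sketch skips FP8 rounding, dequantizing `w_q` with `w_scales` reproduces the original weights exactly; in a real FP8 pipeline a small rounding error per element remains.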
## Core Capabilities
- Multimodal processing of text and image inputs
- Support for 8 different languages
- Efficient memory usage through FP8 quantization
- Assistant-like chat functionality
- Commercial and research applications
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its efficient FP8 quantization, which preserves the capabilities of the original 90B-parameter model. It achieves significant memory savings without compromising its multimodal and multilingual capabilities.
**Q: What are the recommended use cases?**
The model is particularly well-suited for commercial and research applications requiring multimodal understanding, such as image-based conversations, multilingual assistance, and general-purpose AI chat applications. However, it should not be used for applications that violate applicable laws or regulations.