Meta-Llama-3.1-70B-Instruct-FP8

Maintained By
neuralmagic

Meta-Llama-3.1-70B-Instruct-FP8

PropertyValue
Parameter Count70.6B
Licensellama3.1
Supported Languages8 (en, de, fr, it, pt, hi, es, th)
QuantizationFP8 (8-bit)
Release Date7/23/2024

What is Meta-Llama-3.1-70B-Instruct-FP8?

Meta-Llama-3.1-70B-Instruct-FP8 is a highly optimized version of the original Meta-Llama-3.1-70B-Instruct model, featuring FP8 quantization for both weights and activations. This optimization reduces the model's disk size and GPU memory requirements by approximately 50% while maintaining an impressive 99.88% of the original model's performance.

Implementation Details

The model employs symmetric per-tensor quantization on the linear operators within transformer blocks, using LLM Compressor with calibration samples from UltraChat. It achieves an average score of 84.29 on the OpenLLM benchmark, compared to the original model's 84.40.

  • Weight and activation quantization using FP8 data type
  • Optimized for vLLM deployment
  • 50% reduction in memory footprint
  • Calibrated using 512 sequences from UltraChat

Core Capabilities

  • Multi-language support across 8 languages
  • Assistant-like chat functionality
  • High performance on key benchmarks (MMLU, ARC-Challenge, GSM-8K)
  • Efficient deployment with vLLM backend
  • Commercial and research use compatibility

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for achieving nearly identical performance to its full-precision counterpart while requiring only half the computational resources through FP8 quantization. It maintains over 99.8% accuracy across major benchmarks while being more deployment-friendly.

Q: What are the recommended use cases?

The model is specifically designed for commercial and research applications requiring assistant-like chat capabilities across multiple languages. It's particularly suitable for deployments where resource efficiency is crucial without compromising on performance.

The first platform built for prompt engineering