Meta-Llama-3-70B-Instruct-FP8

Maintained By
neuralmagic

Meta-Llama-3-70B-Instruct-FP8

PropertyValue
Parameter Count70.6B
Model TypeLanguage Model (Instruct)
LicenseLlama3
QuantizationFP8
OpenLLM Score79.16

What is Meta-Llama-3-70B-Instruct-FP8?

Meta-Llama-3-70B-Instruct-FP8 is an optimized version of Meta's Llama-3 70B model, specifically designed for efficient deployment while maintaining near-original performance. This model implements FP8 quantization for both weights and activations, effectively reducing the model's memory footprint by approximately 50% compared to the original 16-bit version.

Implementation Details

The model employs sophisticated quantization techniques using AutoFP8, focusing on the linear operators within transformer blocks. It achieves remarkable efficiency while maintaining 99.55% of the original model's performance on benchmark tasks.

  • Weight and activation quantization using FP8 data type
  • Symmetric per-tensor quantization implementation
  • Compatible with vLLM >= 0.5.0 for inference
  • Calibrated using 512 sequences from UltraChat

Core Capabilities

  • Benchmark Performance: 80.06% on MMLU (5-shot)
  • Strong reasoning capabilities with 91.12% on GSM-8K
  • Excellent performance on Hellaswag (85.41%) and Winogrande (83.03%)
  • Optimized for English language tasks
  • Suitable for commercial and research applications

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its optimal balance between performance and efficiency, using FP8 quantization to reduce resource requirements while maintaining 99.55% of the original model's accuracy. It's specifically optimized for deployment with vLLM, making it ideal for production environments.

Q: What are the recommended use cases?

The model is best suited for English language tasks, particularly in commercial and research applications requiring assistant-like chat capabilities. It's optimized for deployment scenarios where resource efficiency is crucial while maintaining high performance standards.

The first platform built for prompt engineering