Mistral-Nemo-Instruct-2407-FP8

Maintained By
neuralmagic

Mistral-Nemo-Instruct-2407-FP8

PropertyValue
Parameter Count12.2B
LicenseApache 2.0
Tensor TypeBF16/F8_E4M3
OpenLLM Score71.28

What is Mistral-Nemo-Instruct-2407-FP8?

Mistral-Nemo-Instruct-2407-FP8 is an optimized version of the original Mistral-Nemo-Instruct model, specifically designed for efficient deployment while maintaining high performance. Through FP8 quantization, it achieves approximately 50% reduction in disk size and GPU memory requirements compared to the original model, while preserving 99.53% of its performance.

Implementation Details

The model employs sophisticated optimization techniques, particularly in its quantization approach. It uses symmetric per-tensor quantization for both weights and activations of linear operators within transformer blocks, implementing the FP8 data type through the AutoFP8 framework with calibration on 512 sequences of UltraChat.

  • Weight and activation quantization to FP8
  • Compatible with vLLM >= 0.5.0
  • 4096 token context window
  • Optimized for commercial and research applications

Core Capabilities

  • Achieves 71.28 average score on OpenLLM benchmark
  • Excels in various tasks: MMLU (68.50%), GSM-8K (73.01%), Hellaswag (84.18%)
  • Supports efficient deployment through vLLM backend
  • Specialized for English language tasks and assistant-like chat applications

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its efficient FP8 quantization that reduces resource requirements by 50% while maintaining over 99% of the original model's performance, making it particularly suitable for production deployment.

Q: What are the recommended use cases?

The model is optimized for English language applications, particularly in commercial and research contexts requiring assistant-like chat functionality. It's specifically designed for deployment scenarios where resource efficiency is crucial without compromising performance.

The first platform built for prompt engineering