Meta-Llama-3.1-8B-Instruct-FP8
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Model Type | Instruction-tuned LLM |
| Supported Languages | 8 (en, de, fr, it, pt, hi, es, th) |
| License | llama3.1 |
| Quantization | FP8 (weights and activations) |
What is Meta-Llama-3.1-8B-Instruct-FP8?
Meta-Llama-3.1-8B-Instruct-FP8 is an optimized version of Meta's Llama 3.1 8B Instruct model, designed for efficient deployment while delivering nearly identical performance to its full-precision counterpart. Through FP8 quantization, it achieves roughly a 50% reduction in disk size and GPU memory requirements while retaining 99.52% of the original model's average benchmark score.
Implementation Details
The model applies symmetric per-tensor quantization to both the weights and activations of linear operators within the transformer blocks. It is optimized for deployment with vLLM and was calibrated using 512 sequences from the UltraChat dataset.
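As a rough illustration of the scheme described above, the sketch below applies symmetric per-tensor FP8 (E4M3) quantization to a single weight tensor in PyTorch. The scale computation and dtype choice are assumptions made for illustration only, not the exact recipe used to produce this checkpoint.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest representable magnitude in torch.float8_e4m3fn

def quantize_per_tensor_fp8(w: torch.Tensor):
    """Quantize a tensor to FP8 with a single symmetric scale (illustrative)."""
    scale = w.abs().max().clamp(min=1e-12) / FP8_E4M3_MAX
    q = (w / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    return q, scale

def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate full-precision tensor from the FP8 values."""
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_per_tensor_fp8(w)
print("max abs error:", (dequantize_fp8(q, scale) - w).abs().max().item())
```

For activations, the released model presumably derives its per-tensor scales from the 512 calibration sequences rather than recomputing them per input.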
- Achieves a 73.44 average score on the OpenLLM benchmark (vs. 73.79 for the original model)
- Optimized for commercial and research applications
- Compatible with vLLM for efficient inference (see the serving sketch after this list)
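The following is a minimal offline-inference sketch with vLLM. The Hugging Face model ID used here (`neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8`) and the sampling settings are assumptions; adjust them to wherever the checkpoint is actually hosted.

```python
# Hedged sketch: offline chat-style inference with vLLM.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id)

# Format a conversation with the Llama 3.1 chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize FP8 quantization in two sentences."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

In recent vLLM releases the quantization scheme is typically detected from the checkpoint's configuration, so no additional quantization flag should be required.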
Core Capabilities
- Multi-lingual support across 8 languages
- Assistant-style chat functionality (see the server example after this list)
- Strong performance on key benchmarks (MMLU: 67.97%, ARC Challenge: 81.66%, GSM-8K: 81.12%)
- 50% reduced resource requirements compared to original model
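For serving rather than offline batch inference, the model can also be exposed through vLLM's OpenAI-compatible server and queried with the standard OpenAI client. The sketch below assumes such a server is already running locally on the default port (for example, started with `vllm serve` and the model ID used above, which is itself an assumption).

```python
# Hedged sketch: querying a locally running vLLM OpenAI-compatible server.
# Assumes the server was started beforehand, e.g. with:
#   vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",  # assumed model ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give three tips for writing clear documentation."},
    ],
    temperature=0.6,
    max_tokens=256,
)
print(response.choices[0].message.content)
```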
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient FP8 quantization, which roughly halves resource requirements while retaining 99.52% of the original model's average performance on the evaluated benchmarks. It is particularly notable for maintaining this level of quality across multiple languages and complex reasoning tasks.
Q: What are the recommended use cases?
The model is ideal for commercial and research applications requiring efficient deployment of large language models, particularly in multi-lingual contexts. It's specifically designed for assistant-like chat applications where resource optimization is crucial but performance cannot be compromised.