DeepSeek-R1-FP4

Maintained by: nvidia

  • License: MIT
  • Architecture: Transformer-based DeepSeek R1
  • Quantization: FP4
  • Context Length: 128K tokens
  • Hardware Support: NVIDIA Blackwell
  • Model URL: https://huggingface.co/nvidia/DeepSeek-R1-FP4

What is DeepSeek-R1-FP4?

DeepSeek-R1-FP4 is NVIDIA's quantized version of the DeepSeek R1 auto-regressive language model, optimized for efficient inference at FP4 precision. Quantization cuts the bits per parameter from 8 to 4, reducing disk size and GPU memory requirements by roughly 1.6x while maintaining comparable performance.
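To see where a ~1.6x (rather than an ideal 2x) reduction can come from, note that not every tensor needs to drop to FP4. The sketch below is a back-of-the-envelope estimate only: the 671B total is DeepSeek R1's published parameter count, but the 75% FP4 fraction is a hypothetical split chosen purely to illustrate the arithmetic.

```python
# Illustrative checkpoint-size arithmetic; the 75/25 FP4/FP8 split is a
# hypothetical assumption, not a published breakdown of this checkpoint.
def size_gb(params_billion: float, bits: float) -> float:
    """Approximate storage in GB for the given parameter count and bit width."""
    return params_billion * 1e9 * bits / 8 / 1e9

N = 671  # DeepSeek R1's total parameter count, in billions

fp8_size = size_gb(N, 8)                  # all-FP8 baseline
frac_fp4 = 0.75                           # assumed fraction quantized to FP4
mixed_size = size_gb(N * frac_fp4, 4) + size_gb(N * (1 - frac_fp4), 8)

print(f"FP8 baseline:  {fp8_size:.0f} GB")    # ~671 GB
print(f"Mixed FP4/FP8: {mixed_size:.0f} GB")  # ~419 GB
print(f"Reduction:     {fp8_size / mixed_size:.2f}x")  # 1.60x
```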

Implementation Details

The model is deployed with TensorRT-LLM and requires 8x B200 GPUs for optimal performance; a minimal deployment sketch follows this paragraph. Quantization specifically targets the weights and activations of the linear operators within the transformer blocks, striking an efficient balance between performance and resource utilization.
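As a rough illustration, the snippet below uses TensorRT-LLM's high-level LLM API. The import path, constructor arguments, and `tensor_parallel_size` setting are assumptions to verify against your installed TensorRT-LLM version; this is a sketch, not NVIDIA's official launch recipe.

```python
# Minimal generation sketch, assuming TensorRT-LLM's high-level LLM API;
# argument names can differ between TensorRT-LLM releases.
from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = [
        "Who are you?",
        "Summarize FP4 quantization in one sentence.",
    ]
    # Keep generations short for a smoke test; raise max_tokens for real use.
    sampling_params = SamplingParams(max_tokens=64)

    # tensor_parallel_size=8 matches the recommended eight-GPU B200 setup.
    llm = LLM(model="nvidia/DeepSeek-R1-FP4", tensor_parallel_size=8)

    for output in llm.generate(prompts, sampling_params):
        print(f"Prompt:    {output.prompt!r}")
        print(f"Generated: {output.outputs[0].text!r}")

if __name__ == "__main__":
    main()
```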

  • Optimized using nvidia-modelopt v0.23.0
  • Supports up to 128K context length
  • Calibrated on the cnn_dailymail dataset (a calibration sketch follows this list)
  • Evaluated on the MMLU benchmark
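For a sense of how such a checkpoint is produced, the sketch below drives ModelOpt's documented `mtq.quantize()` entry point with a small cnn_dailymail calibration loop. The `NVFP4_DEFAULT_CFG` config name, the dataset slice, and the truncation settings are all assumptions to check against the ModelOpt release you install; this is not NVIDIA's published recipe.

```python
# Hedged post-training quantization sketch with NVIDIA ModelOpt.
# NVFP4_DEFAULT_CFG and the calibration details below are assumptions;
# verify them against your installed nvidia-modelopt version.
import torch
import modelopt.torch.quantization as mtq
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1"  # base checkpoint, per the card
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A handful of cnn_dailymail articles serve as calibration data.
calib_texts = load_dataset("cnn_dailymail", "3.0.0", split="train[:8]")["article"]

def forward_loop(m):
    # Run calibration samples through the model so ModelOpt can collect
    # the activation statistics used to pick FP4 scales.
    for text in calib_texts:
        inputs = tokenizer(text, return_tensors="pt",
                           truncation=True, max_length=512)
        with torch.no_grad():
            m(**inputs.to(m.device))

# Quantize the linear operators' weights and activations to FP4.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```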

Core Capabilities

  • Efficient text generation with a reduced memory footprint
  • High-performance inference via TensorRT-LLM
  • Long-context understanding up to 128K tokens
  • Suitable for both commercial and non-commercial use

Frequently Asked Questions

Q: What makes this model unique?

Its FP4 quantization cuts disk and GPU memory requirements by roughly 1.6x relative to FP8 while maintaining performance, making it well suited to production deployments on NVIDIA Blackwell hardware.

Q: What are the recommended use cases?

The model suits text generation tasks that demand efficient inference, particularly production environments where GPU memory is constrained but high throughput and output quality still matter.
