DeepSeek-R1-Distill-Llama-70B

Maintained By
deepseek-ai

  • Base Model: Llama-3.3-70B-Instruct
  • License: MIT License
  • Context Length: 32,768 tokens
  • Paper: arXiv:2501.12948

What is DeepSeek-R1-Distill-Llama-70B?

DeepSeek-R1-Distill-Llama-70B is a language model distilled from the larger DeepSeek-R1 model onto the Llama 3.3 architecture. It brings DeepSeek-R1's advanced reasoning capabilities to a more compact, easier-to-deploy model while retaining strong performance across a range of tasks.

Implementation Details

The model is derived from Llama-3.3-70B-Instruct and fine-tuned on carefully curated reasoning samples generated by DeepSeek-R1. It demonstrates strong capabilities in mathematical reasoning, coding, and general problem solving.

  • Recommended sampling settings: temperature 0.6 and top-p 0.95
  • Supports a maximum context length of 32,768 tokens
  • Compatible with vLLM and SGLang deployment options
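As a rough sketch of the deployment options listed above, the model can be served with vLLM or SGLang. The exact flags below (tensor-parallel degree, context length, eager mode) are illustrative assumptions; tune them for your own hardware:

```shell
# Serve with vLLM as an OpenAI-compatible endpoint.
# --tensor-parallel-size and --max-model-len are examples, not requirements.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --enforce-eager

# Or serve with SGLang (flags likewise illustrative):
python3 -m sglang.launch_server \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --trust-remote-code \
  --tp 2
```

Both servers expose an OpenAI-compatible API, so standard chat-completion clients can talk to the model once it is running.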

Core Capabilities

  • Strong performance in AIME 2024 (70.0% pass@1)
  • Excellent MATH-500 results (94.5% pass@1)
  • Competitive CodeForces rating of 1633
  • Advanced reasoning and problem-solving abilities
  • Robust performance in GPQA Diamond tasks (65.2% pass@1)

Frequently Asked Questions

Q: What makes this model unique?

This model uniquely combines the efficiency of distillation with the advanced reasoning capabilities of DeepSeek-R1, making it particularly effective for mathematical and coding tasks while being more accessible for deployment.

Q: What are the recommended use cases?

The model excels in mathematical problem-solving, coding tasks, and complex reasoning scenarios. It's particularly well-suited for applications requiring detailed step-by-step reasoning and technical problem-solving.
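To make those use cases concrete, here is a minimal Python sketch that assembles a chat-completion request with the recommended sampling settings (temperature 0.6, top-p 0.95). The `build_request` helper and the math-prompt suffix are illustrative assumptions, not part of an official SDK; the payload shape matches the OpenAI-compatible API that vLLM and SGLang expose.

```python
def build_request(question: str, is_math: bool = False) -> dict:
    """Assemble a chat-completion payload using the model card's
    recommended sampling settings. Helper name and payload layout
    are illustrative, not an official API."""
    prompt = question
    if is_math:
        # For math problems, asking for step-by-step reasoning with the
        # final answer in \boxed{} is a commonly recommended pattern.
        prompt += ("\nPlease reason step by step, and put your final "
                   "answer within \\boxed{}.")
    return {
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,  # recommended in the model card
        "top_p": 0.95,       # recommended in the model card
    }

# Example: a math-style request
request = build_request("Solve x^2 - 5x + 6 = 0.", is_math=True)
print(request["temperature"])  # 0.6
```

The resulting dictionary can be passed directly to any OpenAI-compatible client pointed at a locally served instance of the model.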
