DeepSeek-R1-Distill-Llama-70B
| Property | Value |
|---|---|
| Base Model | Llama-3.3-70B-Instruct |
| License | MIT License |
| Context Length | 32,768 tokens |
| Paper | arXiv:2501.12948 |
What is DeepSeek-R1-Distill-Llama-70B?
DeepSeek-R1-Distill-Llama-70B is a dense 70B-parameter model distilled from the larger DeepSeek-R1 onto the Llama 3.3 architecture. It transfers much of R1's advanced reasoning capability into a more compact, easier-to-deploy model while retaining strong performance across mathematical, coding, and general reasoning tasks.
Implementation Details
The model is based on Llama-3.3-70B-Instruct and fine-tuned on carefully curated reasoning samples generated by DeepSeek-R1. It demonstrates strong capabilities in mathematical reasoning, coding, and general problem-solving.
- Recommended sampling settings: temperature 0.6 and top-p 0.95, which help avoid repetitive or incoherent output
- Supports a maximum context length of 32,768 tokens
- Can be served with vLLM or SGLang
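The recommended sampling settings can be sketched as an OpenAI-compatible chat-completions request, the format that a local vLLM or SGLang server typically exposes. The endpoint and function name here are illustrative assumptions; only the model identifier and the temperature/top-p values come from this card.

```python
import json

# Hugging Face model identifier from this card.
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"

def build_chat_request(prompt: str, max_tokens: int = 2048) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload
    with the sampling settings recommended for this model."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,  # recommended setting from this card
        "top_p": 0.95,       # recommended setting from this card
        "max_tokens": max_tokens,
    }

# Serialize for POSTing to a locally served endpoint,
# e.g. http://localhost:8000/v1/chat/completions (assumed address).
payload = json.dumps(build_chat_request("Solve: 2x + 3 = 11"))
print(payload)
```

Any OpenAI-compatible client would accept the same payload, so the settings carry over unchanged between vLLM and SGLang deployments.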
Core Capabilities
- AIME 2024: 70.0% pass@1
- MATH-500: 94.5% pass@1
- GPQA Diamond: 65.2% pass@1
- CodeForces rating: 1633
- Advanced reasoning and step-by-step problem-solving abilities
Frequently Asked Questions
Q: What makes this model unique?
This model uniquely combines the efficiency of distillation with the advanced reasoning capabilities of DeepSeek-R1, making it particularly effective for mathematical and coding tasks while being more accessible for deployment.
Q: What are the recommended use cases?
The model excels in mathematical problem-solving, coding tasks, and complex reasoning scenarios. It's particularly well-suited for applications requiring detailed step-by-step reasoning and technical problem-solving.