DeepSeek-Prover-V1.5-RL

Property	Value
Parameter Count	6.91B
Model Type	Theorem Proving LLM
License	DeepSeek License
Paper	arXiv:2408.08152
Tensor Type	BF16

What is DeepSeek-Prover-V1.5-RL?

DeepSeek-Prover-V1.5-RL is an advanced language model specifically designed for theorem proving in Lean 4. It represents a significant evolution from its predecessor, incorporating reinforcement learning from proof assistant feedback (RLPAF) and a novel Monte-Carlo tree search variant called RMaxTS. The model achieves state-of-the-art performance on key benchmarks, demonstrating 63.5% accuracy on miniF2F and 25.3% on ProofNet.

Implementation Details

The model builds upon DeepSeekMath-Base and undergoes a sophisticated training process including supervised fine-tuning and reinforcement learning. It introduces RMaxTS, an innovative approach that uses intrinsic-reward-driven exploration to generate diverse proof paths, moving beyond simple single-pass proof generation.

Pre-trained on DeepSeekMath-Base with formal mathematical language specialization
Enhanced through supervised fine-tuning using an improved theorem proving dataset
Optimized via reinforcement learning with proof assistant feedback
Implements RMaxTS for advanced proof path exploration

Core Capabilities

State-of-the-art performance on high school level miniF2F benchmark (63.5%)
Superior results on undergraduate level ProofNet benchmark (25.3%)
Advanced theorem proving in Lean 4
Diverse proof path generation using Monte-Carlo tree search

Frequently Asked Questions

Q: What makes this model unique?

The model's unique combination of reinforcement learning from proof assistant feedback and RMaxTS-based proof exploration sets it apart, enabling significantly improved theorem-proving capabilities compared to previous approaches.

Q: What are the recommended use cases?

This model is specifically designed for formal theorem proving in Lean 4, making it ideal for mathematical proof verification, academic research, and educational applications in formal mathematics.