DeepSeek-Prover-V1.5-RL

Property	Value
Parameter Count	6.91B
Model Type	Theorem Proving LLM
License	DeepSeek License
Paper	arXiv:2408.08152
Tensor Type	BF16

What is DeepSeek-Prover-V1.5-RL?

DeepSeek-Prover-V1.5-RL is a language model designed for theorem proving in Lean 4. Built on DeepSeekMath-Base, it combines reinforcement learning with a variant of Monte-Carlo tree search and reports state-of-the-art results on the miniF2F and ProofNet benchmarks.
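
To make the task concrete, here is a minimal illustration of the kind of input and output involved: a Lean 4 theorem statement, followed by a proof that Lean can check. This toy example is not drawn from the model's benchmarks or training data; the theorem name is made up, and the proof relies only on `Nat.add_comm` from Lean's core library.

```lean
-- Illustrative only: the statement is what a prover is given,
-- the tactic block after `by` is what it has to produce.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```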

Implementation Details

The model is trained with reinforcement learning from proof assistant feedback (RLPAF) and can be paired at inference time with RMaxTS, a variant of Monte-Carlo tree search that uses intrinsic-reward-driven exploration to generate diverse proof paths. Together these yield clear improvements over its predecessor, DeepSeek-Prover-V1, particularly on complex mathematical proofs.

Specialized pre-training on mathematical formal languages
Enhanced supervised fine-tuning using an improved theorem proving dataset
Implementation of RMaxTS for diverse proof path generation
Integration of proof assistant feedback for reinforcement learning
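
The idea of using proof assistant feedback as a reinforcement learning signal can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' training code: `toy_verify` is a hypothetical stand-in for a real Lean 4 check, and normalizing rewards within a group of samples for the same theorem is one common way to turn binary rewards into a learning signal, assumed here for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, List


def proof_rewards(candidates: List[str], verify: Callable[[str], bool]) -> List[float]:
    """Binary reward: 1.0 if the verifier accepts the proof, otherwise 0.0."""
    return [1.0 if verify(c) else 0.0 for c in candidates]


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Center and scale rewards within one group of samples for the same theorem.

    Assumption for illustration: if every sample gets the same reward,
    the group carries no learning signal and all advantages are 0.0.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def toy_verify(proof: str) -> bool:
    """Hypothetical stand-in for a Lean 4 check: rejects proofs containing `sorry`."""
    return "sorry" not in proof


candidates = ["by simp", "by sorry", "by omega", "sorry"]
rewards = proof_rewards(candidates, toy_verify)
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

In a real setup, `toy_verify` would be replaced by a call that compiles the candidate proof with Lean 4 and reports whether it checks.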

Core Capabilities

Achieves 63.5% accuracy on the miniF2F test benchmark (high school level) when combined with RMaxTS
Achieves a 25.3% success rate on the ProofNet test benchmark (undergraduate level) when combined with RMaxTS
Generates diverse proof paths through intrinsic-reward-driven exploration
Supports complex mathematical reasoning and formal proof generation
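
Intrinsic-reward-driven exploration can be illustrated with a toy search tree. The sketch below is a simplification under stated assumptions, not the RMaxTS implementation: a sampled proof path earns a reward of 1.0 only if it adds at least one new node to the tree, and a plain UCB score is used for selection in place of whatever selection rule the paper uses. The `Node`, `insert_path`, and `ucb_score` names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    visits: int = 0
    value: float = 0.0  # sum of intrinsic rewards backed up through this node
    children: Dict[str, "Node"] = field(default_factory=dict)


def insert_path(root: Node, steps: List[str]) -> float:
    """Add a sampled proof path to the tree and back up its intrinsic reward.

    The reward is 1.0 if the path created at least one new node, else 0.0,
    so paths that only revisit known states earn nothing.
    """
    node, created = root, False
    visited = [root]
    for step in steps:
        if step not in node.children:
            node.children[step] = Node()
            created = True
        node = node.children[step]
        visited.append(node)
    reward = 1.0 if created else 0.0
    for n in visited:
        n.visits += 1
        n.value += reward
    return reward


def ucb_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """Plain UCB: mean reward plus an exploration bonus (a simplification)."""
    if child.visits == 0:
        return math.inf
    bonus = c * math.sqrt(math.log(parent.visits) / child.visits)
    return child.value / child.visits + bonus


root = Node()
r1 = insert_path(root, ["intro x", "simp"])   # new path: creates nodes
r2 = insert_path(root, ["intro x", "simp"])   # same path again: nothing new
r3 = insert_path(root, ["intro x", "omega"])  # new branch under a known node
print(r1, r2, r3)
```

Because the repeated path earns no reward, its branch becomes less attractive over time, which nudges the search toward proof paths it has not yet tried.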

Frequently Asked Questions

Q: What makes this model unique?

It combines reinforcement learning from proof assistant feedback with RMaxTS, a Monte-Carlo tree search variant whose intrinsic-reward-driven exploration produces diverse proof paths. Both components are optimized specifically for theorem proving in Lean 4.

Q: What are the recommended use cases?

The model is designed for formal proof generation in Lean 4, which makes it suited to automated theorem proving, mathematical research assistance, and educational applications in formal mathematics.
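
A rough sketch of how proof generation might be wired up with the Hugging Face `transformers` library is shown below. Several details are assumptions rather than facts taken from this page: the model identifier `deepseek-ai/DeepSeek-Prover-V1.5-RL`, the prompt wording in `build_prompt`, and the generation settings. The helper names are hypothetical, and any generated proof still has to be checked by Lean 4 itself.

```python
FENCE = "`" * 3  # built programmatically so the prompt can open a Lean code fence


def build_prompt(lean_prefix: str) -> str:
    """Wrap a Lean 4 theorem statement in a completion-style prompt (assumed format)."""
    return f"Complete the following Lean 4 code:\n\n{FENCE}lean4\n{lean_prefix}"


def generate_proof(
    lean_prefix: str,
    model_id: str = "deepseek-ai/DeepSeek-Prover-V1.5-RL",  # assumed identifier
    max_new_tokens: int = 512,
) -> str:
    """Generate a candidate completion; the result is unverified text."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    inputs = tokenizer(build_prompt(lean_prefix), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


statement = "theorem add_comm_example (a b : Nat) : a + b = b + a := by\n"
prompt = build_prompt(statement)
print(prompt)
```

Calling `generate_proof(statement)` downloads the model weights, so it is left out of the example run; only the prompt construction is exercised here.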