AReaL-boba-SFT-32B
| Property | Value |
|---|---|
| Parameter Count | 32 Billion |
| Model Type | Language Model (Mathematical Reasoning) |
| Architecture | Transformer-based SFT Model |
| Model Link | Hugging Face |
What is AReaL-boba-SFT-32B?
AReaL-boba-SFT-32B is a 32B-parameter language model specialized for mathematical reasoning. Fine-tuned on only 200 carefully curated training samples, it reaches performance comparable to QwQ-32B, demonstrating that data quality can matter more than data quantity for efficient large language model training.
Implementation Details
The model uses SGLang v0.4.0 as its generation backend, which delivers significant speedups through RadixAttention, SGLang's KV-cache reuse mechanism. The training pipeline adds optimizations for variable-length sequences and large batches, with high-performance data transfer that scales to 1,000 GPUs:
- Sequence packing of variable-length samples into 1D tensors for efficient GPU memory utilization (see the sketch after this list)
- NCCL with GPU-Direct RDMA for efficient communication
- Specialized evaluation settings for long-context generation
- Temperature setting of 0.6 and top_p of 0.95 for inference
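The packing code itself is not reproduced here, but the core idea is to concatenate variable-length token sequences into a single 1D tensor with no padding and record the boundaries as cumulative sequence lengths, the layout expected by variable-length attention kernels. Below is a minimal illustrative sketch in PyTorch; the function name and shapes are chosen for this example only.

```python
import torch

def pack_sequences(sequences: list[torch.Tensor]) -> tuple[torch.Tensor, torch.Tensor]:
    """Pack variable-length 1D token tensors into one 1D tensor (no padding).

    Returns the packed tensor and cumulative sequence lengths (cu_seqlens),
    the boundary format used by variable-length attention kernels.
    """
    packed = torch.cat(sequences)                       # all tokens, zero padding
    lengths = torch.tensor([len(s) for s in sequences])
    cu_seqlens = torch.zeros(len(sequences) + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.cumsum(lengths, dim=0)       # e.g. [0, 5, 12, 15]
    return packed, cu_seqlens

# Three prompts of different lengths share one 1D buffer, so no GPU memory
# is spent on padding tokens.
seqs = [torch.randint(0, 32000, (n,)) for n in (5, 7, 3)]
packed, cu_seqlens = pack_sequences(seqs)
print(packed.shape, cu_seqlens.tolist())  # torch.Size([15]) [0, 5, 12, 15]
```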
Core Capabilities
- Achieves 78.8% accuracy on AIME 2024 problems
- 62.1% accuracy on AIME 2025 problems
- 60.1% accuracy on GPQA-Diamond
- Specialized in step-by-step mathematical reasoning
- Efficient handling of long-context problems
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to achieve SOTA performance using only 200 training samples sets it apart, demonstrating exceptional data efficiency. It matches the performance of QwQ-32B while using significantly fewer resources for training.
Q: What are the recommended use cases?
The model excels in mathematical reasoning tasks, particularly in solving complex problems requiring step-by-step solutions. It's specifically optimized for AIME-level mathematics problems and similar advanced mathematical reasoning tasks.
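For experimentation, the model can be loaded like any other causal language model from Hugging Face. The sketch below uses the transformers library with the sampling settings recommended above (temperature 0.6, top_p 0.95); the repository id is an assumption to be replaced with the actual Hugging Face path, and the original setup serves the model with SGLang rather than transformers for higher throughput.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with the actual Hugging Face path of the model.
MODEL_ID = "inclusionAI/AReaL-boba-SFT-32B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # 32B parameters: multi-GPU or offloading is needed in practice
    device_map="auto",
)

# An illustrative AIME-style prompt asking for a step-by-step solution.
prompt = (
    "Solve step by step: how many positive integers less than 1000 "
    "are divisible by neither 2 nor 5?"
)
messages = [{"role": "user", "content": prompt}]
# Assumes the checkpoint ships a chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended sampling settings from the implementation details above.
outputs = model.generate(
    inputs,
    max_new_tokens=4096,  # long reasoning traces need generous generation budgets
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```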