Llama-3.3-Nemotron-70B-Select

Maintained By
nvidia

Parameter Count: 70 Billion
Architecture: Transformer (Llama 3.3)
License: NVIDIA Open Model License
Release Date: March 18, 2025
Max Context Length: 128k tokens

What is Llama-3.3-Nemotron-70B-Select?

Llama-3.3-Nemotron-70B-Select is a language model built on Meta's Llama-3.3-70B-Instruct. Trained with scaled Bradley-Terry modeling, it scores candidate responses and selects the most helpful AI-generated answer to a user query. As the selection component of the Feedback-Edit Inference-Time Scaling (ITS) system, it contributes to strong performance on the Arena Hard leaderboard.
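Concretely, a Bradley-Terry model turns two scalar quality scores into a pairwise preference probability. The sketch below shows the standard sigmoid form of that relation; it is an illustration of the general technique, not code from NVIDIA's release:

```python
import math

def bt_preference(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B,
    given a scalar quality score for each response."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

# Equal scores give a 50/50 preference; a large score gap approaches certainty.
```

Because the probability depends only on the score difference, ranking candidates by raw score and picking the argmax is equivalent to picking the most-preferred response under this model.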

Implementation Details

The model uses a 70-billion-parameter transformer architecture and requires at least 2x 80GB GPUs (NVIDIA Ampere or newer) for deployment. It accepts text input within a 128k-token context window and outputs a float value representing response quality.

  • Trained on the HelpSteer3 dataset comprising 38,459 prompts with paired responses
  • Supports major NVIDIA architectures including Ampere, Hopper, and Turing
  • Implemented via the HuggingFace Transformers library
  • Utilizes bfloat16 precision for optimal performance
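The points above can be sketched as a scoring helper. This is a hypothetical usage example, not NVIDIA's documented API: the repo id, the chat-template call, and the single-float classification head are assumptions based on the card's description of a float-score output.

```python
def score_response(model, tokenizer, prompt: str, response: str) -> float:
    """Return the model's scalar helpfulness score for one candidate.

    `model` and `tokenizer` are assumed to follow the HuggingFace
    Transformers interfaces (sequence classification with a 1-float head).
    """
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
    output = model(input_ids=input_ids.to(model.device))
    return output.logits.item()  # single float: higher = more helpful

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = "nvidia/Llama-3.3-Nemotron-70B-Select"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bfloat16 precision, per the card
        device_map="auto",           # shards across the 2x 80GB GPUs
    )
    print(score_response(model, tokenizer, "What is DNA?", "DNA is ..."))
```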

Core Capabilities

  • Response Quality Assessment through float score outputs
  • High-performance response selection using Bradley-Terry modeling
  • Integration with Feedback-Edit ITS system
  • Commercial-ready deployment capabilities
  • Extensive context handling up to 128k tokens
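For inference-time scaling, the selector's float scores can drive a simple best-of-N loop. Below is a minimal sketch with an injected `score_fn` standing in for a call to the model (the helper names are hypothetical):

```python
def select_best(prompt, candidates, score_fn):
    """Score every candidate response and return (best_response, best_score).

    `score_fn(prompt, response) -> float` stands in for a call to the
    selector model; higher scores indicate more helpful responses.
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    scored = [(score_fn(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

In the Feedback-Edit ITS pipeline, a selection step like this sits after generation and editing, picking the single response to return to the user.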

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to evaluate and select the most helpful responses using scaled Bradley-Terry modeling. Combined with the Feedback-Edit ITS approach, the complete system reaches 93.4 on the Arena Hard benchmark.

Q: What are the recommended use cases?

The model is designed for users who want to improve performance through inference-time scaling on general-domain, open-ended tasks. It excels at high-quality response selection and can be integrated into commercial applications.
