Llama-3.3-Nemotron-70B-Select

Maintained By
nvidia

Parameter Count: 70 Billion
Architecture: Transformer (Llama 3.3)
License: NVIDIA Open Model License
Release Date: March 18, 2025
Max Context Length: 128k tokens

What is Llama-3.3-Nemotron-70B-Select?

Llama-3.3-Nemotron-70B-Select is a language model built on Meta's Llama-3.3-70B-Instruct. Trained with scaled Bradley-Terry modeling, it scores candidate responses and selects the most helpful AI-generated answer to a user query. As the selection component of the Feedback-Edit Inference-Time Scaling (ITS) system, it contributes to strong performance on the Arena Hard leaderboard.
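Concretely, a Bradley-Terry model turns two scalar quality scores into a pairwise preference probability. The sketch below shows the standard sigmoid form of that relation; it is an illustration of the general technique, not code from NVIDIA's release:

```python
import math

def bt_preference(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B,
    given a scalar quality score for each response."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

# Equal scores give a 50/50 preference; a large score gap approaches certainty.
```

Because the probability depends only on the score difference, ranking candidates by raw score and picking the argmax is equivalent to picking the most-preferred response under this model.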

Implementation Details

The model uses a 70-billion-parameter transformer architecture and requires at least 2x 80GB GPUs (NVIDIA Ampere or newer) for deployment. It accepts text input within a 128k-token context window and outputs a float value representing response quality.

  • Trained on the HelpSteer3 dataset comprising 38,459 prompts with paired responses
  • Supports major NVIDIA architectures including Ampere, Hopper, and Turing
  • Implemented via the HuggingFace Transformers library
  • Utilizes bfloat16 precision for optimal performance
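The points above can be sketched as a scoring helper. This is a hypothetical usage example, not NVIDIA's documented API: the repo id, the chat-template call, and the single-float classification head are assumptions based on the card's description of a float-score output.

```python
def score_response(model, tokenizer, prompt: str, response: str) -> float:
    """Return the model's scalar helpfulness score for one candidate.

    `model` and `tokenizer` are assumed to follow the HuggingFace
    Transformers interfaces (sequence classification with a 1-float head).
    """
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
    output = model(input_ids=input_ids.to(model.device))
    return output.logits.item()  # single float: higher = more helpful

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = "nvidia/Llama-3.3-Nemotron-70B-Select"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bfloat16 precision, per the card
        device_map="auto",           # shards across the 2x 80GB GPUs
    )
    print(score_response(model, tokenizer, "What is DNA?", "DNA is ..."))
```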

Core Capabilities

  • Response Quality Assessment through float score outputs
  • High-performance response selection using Bradley-Terry modeling
  • Integration with Feedback-Edit ITS system
  • Commercial-ready deployment capabilities
  • Extensive context handling up to 128k tokens
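For inference-time scaling, the selector's float scores can drive a simple best-of-N loop. Below is a minimal sketch with an injected `score_fn` standing in for a call to the model (the helper names are hypothetical):

```python
def select_best(prompt, candidates, score_fn):
    """Score every candidate response and return (best_response, best_score).

    `score_fn(prompt, response) -> float` stands in for a call to the
    selector model; higher scores indicate more helpful responses.
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    scored = [(score_fn(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

In the Feedback-Edit ITS pipeline, a selection step like this sits after generation and editing, picking the single response to return to the user.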

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to evaluate and select the most helpful responses using scaled Bradley-Terry modeling. Combined with the Feedback-Edit ITS approach, the complete system reaches 93.4 on the Arena Hard benchmark.

Q: What are the recommended use cases?

The model is designed for users who want to improve performance through inference-time scaling on general-domain, open-ended tasks. It excels at high-quality response selection and can be integrated into commercial applications.
