Skywork-Reward-Gemma-2-27B-v0.2
| Property | Value |
|---|---|
| Parameter Count | 27.2B |
| Model Type | Sequence Classification |
| Base Architecture | google/gemma-2-27b-it |
| Paper | Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs |
| License | Skywork Community License |
What is Skywork-Reward-Gemma-2-27B-v0.2?
Skywork-Reward-Gemma-2-27B-v0.2 is an advanced reward model that ranked first on the RewardBench leaderboard at the time of its release. Built on the Gemma-2-27b-it architecture, it was trained on a carefully curated dataset of 80K high-quality preference pairs drawn from public sources. The model demonstrates exceptional capability in handling complex preference scenarios across domains including mathematics, coding, and safety.
Implementation Details
The model should be loaded in BF16 precision with the attention implementation set to flash_attention_2 (or eager as a fallback). It was trained on the Skywork Reward Data Collection, which combines data from multiple sources including HelpSteer2, OffsetBias, WildGuard, and the Magpie DPO series.
- Achieves an overall score of 94.3 on RewardBench
- Implements sophisticated data curation techniques
- Leverages flash attention for enhanced performance
- Supports both commercial and community use under the Skywork Community License
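A minimal loading-and-scoring sketch using the Hugging Face transformers API is shown below. The helper-function structure is illustrative rather than official usage; the model ID follows the card's naming, and `"eager"` can be passed in place of `"flash_attention_2"` where flash attention is unavailable.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "Skywork/Skywork-Reward-Gemma-2-27B-v0.2"


def load_reward_model(attn: str = "flash_attention_2"):
    """Load the reward model in BF16 with the given attention implementation."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        attn_implementation=attn,  # "flash_attention_2" or "eager"
        device_map="auto",
        num_labels=1,  # single scalar reward head
    )
    return tokenizer, model


def score_conversation(tokenizer, model, conversation):
    """Return a scalar reward for a [{"role": ..., "content": ...}] conversation."""
    input_ids = tokenizer.apply_chat_template(
        conversation, tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        return model(input_ids).logits[0][0].item()
```

A higher returned score indicates that the model judges the assistant turn to be a better response to the prompt.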
Core Capabilities
- Superior performance in chat evaluation (96.1 score)
- Excellent reasoning capabilities (98.1 score)
- Strong safety alignment (93.0 score)
- Robust handling of complex preference pairs
- Efficient inference when a supported attention implementation (flash_attention_2 or eager) is used
Frequently Asked Questions
Q: What makes this model unique?
A: The model achieves state-of-the-art performance on RewardBench using only 80K carefully curated training pairs, demonstrating that high-quality data curation can be more important than dataset size. It's particularly notable for maintaining balanced performance across different domains.
Q: What are the recommended use cases?
A: The model is ideal for evaluating and ranking AI-generated responses, particularly in scenarios involving mathematical reasoning, coding tasks, and safety-critical applications. It's specifically designed to help determine which of two possible responses better addresses a given prompt.
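To make the pairwise use case concrete, the sketch below selects the higher-scoring of two candidate responses. The `score_fn` argument stands in for the actual reward-model scoring call; the length-based stub used here is purely a hypothetical placeholder so the comparison logic can be shown end to end.

```python
def pick_better(prompt, response_a, response_b, score_fn):
    """Return whichever response the scoring function rates higher for the prompt."""
    conv_a = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response_a},
    ]
    conv_b = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response_b},
    ]
    return response_a if score_fn(conv_a) >= score_fn(conv_b) else response_b


# Stub scorer for illustration only: rates the longer assistant reply higher.
# In practice, score_fn would wrap a call to the reward model.
def stub_score(conversation):
    return len(conversation[-1]["content"])


best = pick_better("What is 2+2?", "4", "2 + 2 equals 4.", stub_score)
# best == "2 + 2 equals 4." under the stub scorer
```

In a real pipeline, `score_fn` would tokenize each conversation with the model's chat template and return the scalar logit from the sequence-classification head.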