# Skywork-Reward-Llama-3.1-8B-v0.2
| Property | Value |
|---|---|
| Parameter Count | 7.5B |
| Model Type | Text Classification |
| Architecture | Llama-3.1-based Reward Model |
| Paper | Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs |
| License | Skywork Community License |
## What is Skywork-Reward-Llama-3.1-8B-v0.2?
Skywork-Reward-Llama-3.1-8B-v0.2 is a reward model built on Llama-3.1-8B-Instruct. It assigns a scalar score to a prompt-response pair and was trained on a carefully curated dataset of 80K high-quality preference pairs. The model currently ranks first among 8B models on the RewardBench leaderboard, with strong performance across chat, safety, and reasoning tasks.
## Implementation Details
The model uses BF16 tensors and a sequence-classification head that outputs a single scalar reward. It is trained on the Skywork Reward Data Collection, which combines data from multiple high-quality sources, including HelpSteer2, OffsetBias, WildGuard, and the Magpie DPO series. A loading-and-scoring sketch follows the list below.
- Achieves a 93.1 overall score on RewardBench
- Optimized for the FlashAttention-2 implementation
- Supports both inference and training workflows
- Implements the preference-learning techniques described in the Skywork-Reward paper
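
As a concrete illustration, the snippet below loads the model and scores a single conversation with the standard `transformers` sequence-classification API. This is a minimal sketch, not the authors' reference code: the repo ID `Skywork/Skywork-Reward-Llama-3.1-8B-v0.2` is assumed to match the model name on the Hugging Face Hub, and `attn_implementation="flash_attention_2"` requires the `flash-attn` package to be installed.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed Hugging Face repo ID, matching the model name above.
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # BF16, as noted above
    attn_implementation="flash_attention_2",  # requires flash-attn; drop if unavailable
    device_map="auto",
    num_labels=1,                             # single scalar reward head
)

# A (prompt, response) pair formatted with the model's chat template.
conversation = [
    {"role": "user", "content": "Explain why the sky is blue."},
    {"role": "assistant", "content": "Sunlight scatters off air molecules, and "
     "shorter blue wavelengths scatter the most (Rayleigh scattering)."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    reward = model(input_ids).logits[0][0].item()  # higher = more preferred
print(f"reward: {reward:.3f}")
```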
## Core Capabilities
- Chat evaluation (94.7% accuracy)
- Safety assessment (92.7% accuracy)
- Complex reasoning tasks (96.7% accuracy)
- Mathematical and coding response evaluation
- Handling challenging preference pairs (see the ranking sketch below)
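
To show how a scalar reward handles a preference pair, the sketch below scores two candidate answers to the same prompt and checks that the correct one ranks higher. The `score()` helper is a hypothetical wrapper around the `tokenizer` and `model` objects loaded in the previous snippet.

```python
def score(conversation) -> float:
    """Return the scalar reward for one (prompt, response) conversation.

    Reuses `tokenizer` and `model` from the previous snippet.
    """
    input_ids = tokenizer.apply_chat_template(
        conversation, tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        return model(input_ids).logits[0][0].item()

prompt = {"role": "user", "content": "What is 2 + 2?"}
chosen = [prompt, {"role": "assistant", "content": "2 + 2 = 4."}]
rejected = [prompt, {"role": "assistant", "content": "2 + 2 = 5."}]

# A well-trained reward model should rank the correct answer higher.
s_chosen, s_rejected = score(chosen), score(rejected)
print(f"chosen={s_chosen:.2f} rejected={s_rejected:.2f} "
      f"agrees={s_chosen > s_rejected}")
```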
## Frequently Asked Questions
**Q: What makes this model unique?**
The model achieves state-of-the-art performance using only 80K carefully curated training samples, demonstrating that careful data curation can outperform sheer dataset size. It is specifically designed to handle complex preference scenarios across multiple domains.
**Q: What are the recommended use cases?**
The model is ideal for evaluating text-generation quality, scoring chat responses, assessing safety compliance, and rating mathematical and coding solutions. It is particularly effective for developing and fine-tuning language models through reinforcement learning; a best-of-n selection sketch follows below.
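
One common way to plug a reward model into a fine-tuning or serving loop is best-of-n selection: sample several candidate responses from a policy model, score each, and keep the highest-scoring one. The sketch below is illustrative only; it reuses the hypothetical `score()` helper from the previous snippet, and the candidate strings stand in for samples from a policy model, which is not shown.

```python
def best_of_n(prompt: str, candidates: list[str]) -> str:
    """Return the candidate response the reward model scores highest."""
    user_turn = {"role": "user", "content": prompt}
    best_score, best = float("-inf"), candidates[0]
    for candidate in candidates:
        s = score([user_turn, {"role": "assistant", "content": candidate}])
        if s > best_score:
            best_score, best = s, candidate
    return best

# Candidates would normally be sampled from a policy model (not shown here).
print(best_of_n(
    "Name a prime number greater than 10.",
    ["11 is prime and greater than 10.", "9 is a prime above 10."],
))
```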