# Skywork-Reward-Llama-3.1-8B-v0.2
| Property | Value |
|---|---|
| Parameter Count | 7.5B |
| Model Type | Text Classification |
| Architecture | Llama-3.1-based Reward Model |
| Paper | Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs |
| License | Skywork Community License |
## What is Skywork-Reward-Llama-3.1-8B-v0.2?
Skywork-Reward-Llama-3.1-8B-v0.2 is a reward model built on Llama-3.1-8B-Instruct. It assigns a scalar score to a prompt-response pair and was trained on a carefully curated dataset of 80K high-quality preference pairs. The model currently ranks first among 8B models on the RewardBench leaderboard, with strong performance across chat, safety, and reasoning tasks.
## Implementation Details
The model uses BF16 tensors and a sequence-classification head that outputs a single scalar reward. It is trained on the Skywork Reward Data Collection, which combines data from multiple high-quality sources, including HelpSteer2, OffsetBias, WildGuard, and the Magpie DPO series. A loading-and-scoring sketch follows the list below.
- Achieves a 93.1 overall score on RewardBench
- Optimized for the FlashAttention-2 implementation
- Supports both inference and training workflows
- Implements the preference-learning techniques described in the Skywork-Reward paper
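
As a concrete illustration, the snippet below loads the model and scores a single conversation with the standard `transformers` sequence-classification API. This is a minimal sketch, not the authors' reference code: the repo ID `Skywork/Skywork-Reward-Llama-3.1-8B-v0.2` is assumed to match the model name on the Hugging Face Hub, and `attn_implementation="flash_attention_2"` requires the `flash-attn` package to be installed.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed Hugging Face repo ID, matching the model name above.
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # BF16, as noted above
    attn_implementation="flash_attention_2",  # requires flash-attn; drop if unavailable
    device_map="auto",
    num_labels=1,                             # single scalar reward head
)

# A (prompt, response) pair formatted with the model's chat template.
conversation = [
    {"role": "user", "content": "Explain why the sky is blue."},
    {"role": "assistant", "content": "Sunlight scatters off air molecules, and "
     "shorter blue wavelengths scatter the most (Rayleigh scattering)."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    reward = model(input_ids).logits[0][0].item()  # higher = more preferred
print(f"reward: {reward:.3f}")
```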
## Core Capabilities
- Chat evaluation (94.7% accuracy)
- Safety assessment (92.7% accuracy)
- Complex reasoning tasks (96.7% accuracy)
- Mathematical and coding response evaluation
- Handling challenging preference pairs (see the ranking sketch below)
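
To show how a scalar reward handles a preference pair, the sketch below scores two candidate answers to the same prompt and checks that the correct one ranks higher. The `score()` helper is a hypothetical wrapper around the `tokenizer` and `model` objects loaded in the previous snippet.

```python
def score(conversation) -> float:
    """Return the scalar reward for one (prompt, response) conversation.

    Reuses `tokenizer` and `model` from the previous snippet.
    """
    input_ids = tokenizer.apply_chat_template(
        conversation, tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        return model(input_ids).logits[0][0].item()

prompt = {"role": "user", "content": "What is 2 + 2?"}
chosen = [prompt, {"role": "assistant", "content": "2 + 2 = 4."}]
rejected = [prompt, {"role": "assistant", "content": "2 + 2 = 5."}]

# A well-trained reward model should rank the correct answer higher.
s_chosen, s_rejected = score(chosen), score(rejected)
print(f"chosen={s_chosen:.2f} rejected={s_rejected:.2f} "
      f"agrees={s_chosen > s_rejected}")
```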
## Frequently Asked Questions
**Q: What makes this model unique?**
The model achieves state-of-the-art performance using only 80K carefully curated training samples, demonstrating that careful data curation can outperform sheer dataset size. It is specifically designed to handle complex preference scenarios across multiple domains.
**Q: What are the recommended use cases?**
The model is ideal for evaluating text-generation quality, scoring chat responses, assessing safety compliance, and rating mathematical and coding solutions. It is particularly effective for developing and fine-tuning language models through reinforcement learning; a best-of-n selection sketch follows below.
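
One common way to plug a reward model into a fine-tuning or serving loop is best-of-n selection: sample several candidate responses from a policy model, score each, and keep the highest-scoring one. The sketch below is illustrative only; it reuses the hypothetical `score()` helper from the previous snippet, and the candidate strings stand in for samples from a policy model, which is not shown.

```python
def best_of_n(prompt: str, candidates: list[str]) -> str:
    """Return the candidate response the reward model scores highest."""
    user_turn = {"role": "user", "content": prompt}
    best_score, best = float("-inf"), candidates[0]
    for candidate in candidates:
        s = score([user_turn, {"role": "assistant", "content": candidate}])
        if s > best_score:
            best_score, best = s, candidate
    return best

# Candidates would normally be sampled from a policy model (not shown here).
print(best_of_n(
    "Name a prime number greater than 10.",
    ["11 is prime and greater than 10.", "9 is a prime above 10."],
))
```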