Skywork-Reward-Llama-3.1-8B-v0.2

Maintained by: Skywork

Parameter Count: 7.5B
Model Type: Text Classification
Architecture: Llama-3.1-based Reward Model
Paper: Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
License: Skywork Community License

What is Skywork-Reward-Llama-3.1-8B-v0.2?

Skywork-Reward-Llama-3.1-8B-v0.2 is an advanced reward model built on the Llama-3.1-8B-Instruct architecture. It is designed to evaluate and score text responses and was trained on a carefully curated dataset of 80K high-quality preference pairs. It currently ranks first among 8B-scale models on the RewardBench leaderboard, with notably strong results in the chat, safety, and reasoning categories.

Implementation Details

The model is stored in BF16 precision and uses a sequence-classification head to produce a scalar reward score. It was trained on the Skywork Reward Data Collection, which combines data from several high-quality sources, including HelpSteer2, OffsetBias, WildGuard, and the Magpie DPO series. A loading-and-scoring sketch follows the list below.

  • Achieves 93.1 overall score on RewardBench
  • Optimized for the FlashAttention-2 attention implementation
  • Supports both inference and training workflows
  • Implements sophisticated preference learning techniques
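
A minimal loading-and-scoring sketch, assuming the standard Hugging Face transformers API and the public model ID on the Hub; the example conversation is illustrative only:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"

# Load as a sequence classifier with a single scalar reward head.
# attn_implementation="flash_attention_2" needs the flash-attn package
# installed; remove the argument to fall back to standard attention.
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
    num_labels=1,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The reward model scores a whole conversation: prompt plus response.
conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    # One logit per sequence; a higher value means a more preferred response.
    reward = model(input_ids).logits[0][0].item()
print(f"Reward score: {reward:.3f}")
```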

Core Capabilities

  • Chat evaluation (94.7 RewardBench Chat score)
  • Safety assessment (92.7 RewardBench Safety score)
  • Complex reasoning tasks (96.7 RewardBench Reasoning score)
  • Mathematical and coding response evaluation
  • Handling challenging preference pairs (see the pairwise sketch below)
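
To make the preference-pair case concrete, here is a hedged sketch that reuses the model and tokenizer from the snippet above and simply keeps the higher-scoring response; the example texts are illustrative:

```python
def reward_of(conversation):
    """Score one (prompt, response) conversation with the reward model."""
    input_ids = tokenizer.apply_chat_template(
        conversation, tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        return model(input_ids).logits[0][0].item()

prompt = {"role": "user", "content": "Why is the sky blue?"}
response_a = {"role": "assistant",
              "content": "Rayleigh scattering: shorter blue wavelengths "
                         "scatter more strongly in the atmosphere."}
response_b = {"role": "assistant",
              "content": "Because it reflects the ocean."}

# The response with the higher scalar reward is the preferred one.
preferred = max(response_a, response_b, key=lambda r: reward_of([prompt, r]))
```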

Frequently Asked Questions

Q: What makes this model unique?

The model achieves state-of-the-art performance using only 80K carefully curated training samples, demonstrating that rigorous data curation can outperform training on much larger datasets. It is specifically designed to handle complex preference scenarios across multiple domains.

Q: What are the recommended use cases?

The model is well suited to evaluating text-generation quality, scoring chat responses, assessing safety compliance, and rating mathematical and coding solutions. It is particularly effective for developing and fine-tuning language models through reinforcement learning, for example by reranking candidate generations, as sketched below.
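
As one hedged illustration of the reinforcement-learning use case, the reward model can rerank candidate generations (best-of-N sampling). This reuses reward_of() from the sketch above; generate_candidates() is a hypothetical placeholder for your own generation pipeline:

```python
# Best-of-N reranking: sample N responses from a policy model, then keep
# the one the reward model scores highest.
prompt = {"role": "user", "content": "Summarize the theory of relativity."}
candidates = generate_candidates(prompt, n=8)  # hypothetical helper

best_response = max(
    candidates, key=lambda r: reward_of([prompt, r])  # reward_of from above
)
```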
