LLaMA-3-8B-SFR-Iterative-DPO-R

Maintained By: Salesforce


Property           Value
Parameter Count    8.03B
Model Type         Text Generation / Conversational
Architecture       LLaMA-3 with Iterative DPO
License            LLaMA 3
Research Paper     RLHF Workflow Paper

What is LLaMA-3-8B-SFR-Iterative-DPO-R?

LLaMA-3-8B-SFR-Iterative-DPO-R is a state-of-the-art instruction-following language model developed by Salesforce that achieves remarkable performance for its relatively modest size. The model is trained with an online RLHF (Reinforcement Learning from Human Feedback) recipe built around iterative DPO (Direct Preference Optimization), which lets it outperform many larger models, including Mixtral-8x7B-Instruct and some GPT-3.5 variants, on open-ended chat benchmarks such as Alpaca-Eval-V2 and MT-Bench.

Implementation Details

The model implements a novel training recipe that combines the efficiency of DPO with online preference collection, mitigating the distribution shift that arises as the policy changes during optimization. Compared with traditional PPO-based RLHF, this approach is cheaper and simpler to implement while delivering comparable or better performance.
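For context, below is a minimal sketch of the pairwise DPO objective that each training round optimizes. The function and variable names are illustrative and the beta value is a common default, not necessarily the one used for this model; in the iterative (online) variant, the (chosen, rejected) pairs are regenerated from the current policy and re-ranked each round, so the training data tracks the shifting policy distribution.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Pairwise DPO loss over a batch of (chosen, rejected) responses.

    Each argument is a 1-D tensor of summed token log-probabilities for a
    full response under the trainable policy or the frozen reference model.
    beta controls the implicit KL penalty; 0.1 is a common default, not
    necessarily the value used to train this model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the implicit reward of the preferred response above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy check with random log-probabilities for a batch of 4 pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())
```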

  • Uses BF16 (bfloat16) precision for efficient computation (see the loading sketch after this list)
  • Achieves 31.3 on Alpaca-Eval-V2, surpassing many larger models
  • Scores 8.46 on MT-Bench, competing with models 5-10x its size
  • Demonstrates strong performance on academic benchmarks like GSM-8K (80.7%) and MMLU (65.3%)
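A minimal loading sketch with Hugging Face transformers is shown below, assuming the Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R repository id and a CUDA-capable GPU; adjust device_map and dtype for your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the weights are distributed in BF16
    device_map="auto",           # place layers on available GPUs automatically
)
```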

Core Capabilities

  • Advanced instruction following and chat capabilities
  • Strong performance on mathematical reasoning (GSM-8K)
  • Robust general knowledge (MMLU)
  • Competitive coding abilities (HumanEval: 64.6%)
  • Enhanced truthfulness (TruthfulQA: 60.4%)

Frequently Asked Questions

Q: What makes this model unique?

This model's main distinction is its ability to achieve performance comparable to or better than much larger models while using only 8B parameters, thanks to its innovative iterative DPO training approach. It effectively demonstrates that careful training methodology can be more important than raw model size.

Q: What are the recommended use cases?

The model is particularly well-suited for instruction following, chatbot applications, mathematical reasoning, and general-knowledge queries. However, it has not undergone dedicated safety alignment, so users should be aware of potential safety and ethical limitations, particularly under adversarial prompting.
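As a sketch of chatbot-style usage, the snippet below reuses the model and tokenizer loaded in the earlier snippet and formats the conversation with the tokenizer's built-in chat template; the prompt and generation settings are illustrative.

```python
messages = [
    {"role": "user", "content": "Explain the difference between DPO and PPO in two sentences."},
]

# Build the Llama-3 chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # avoids a warning when no pad token is set
)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```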
