Smaug-72B-v0.1

Maintained by: abacusai

  • Parameter Count: 72.3B
  • Model Type: Large Language Model
  • Base Model: MoMo-72B-lora-1.8.7-DPO
  • License: Tongyi Qianwen License Agreement
  • Paper: arXiv:2402.13228

What is Smaug-72B-v0.1?

Smaug-72B-v0.1 is a large language model developed by Abacus AI that reached first place on HuggingFace's Open LLM Leaderboard, becoming the first open-source model to surpass an average score of 80%. It introduces a novel fine-tuning technique called DPO-Positive (DPOP), designed specifically to address a known failure mode of standard preference optimization.

Implementation Details

The model is built on MoMo-72B-lora-1.8.7-DPO and is ultimately based on Qwen-72B. Its weights are stored in BF16, and fine-tuning uses new pairwise preference versions of the ARC, HellaSwag, and MetaMath datasets. The DPOP technique addresses the failure mode in which the standard DPO loss can reduce the model's likelihood of the preferred completions; a minimal sketch of the loss appears after the list below.

  • Achieves 80.48% average score across major benchmarks
  • MT-Bench scores: 8.18 (First Turn), 7.34 (Second Turn), 7.76 (Average)
  • Applies a contamination-detection methodology using Llama7B as the reference model
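
The DPOP idea can be summarized briefly: keep the usual DPO preference term, but add a penalty whenever the policy assigns lower probability to the preferred completion than the frozen reference model does. Below is a minimal PyTorch sketch of such a loss; the function name, tensor layout, and hyperparameter values (beta, lam) are illustrative assumptions, not taken from the Smaug training code.

```python
import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.3, lam=50.0):
    """Sketch of a DPO-Positive (DPOP) style loss.

    Each *_logps tensor holds the summed log-probability of a completion
    (chosen = preferred, rejected = dispreferred) under either the policy
    or the frozen reference model, shape (batch,).
    """
    # Standard DPO log-ratio terms.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # DPOP penalty: active only when the policy assigns *lower* probability
    # to the preferred completion than the reference model does.
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)

    logits = chosen_logratios - rejected_logratios - lam * penalty
    return -F.logsigmoid(beta * logits).mean()
```

The penalty term keeps the likelihood of preferred completions from collapsing, which is the scenario the standard DPO loss does not guard against.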

Core Capabilities

  • Outstanding performance on ARC (76.02%), HellaSwag (89.27%), and MMLU (77.15%)
  • Strong truthfulness metrics with TruthfulQA score of 76.67%
  • Excellent reasoning capabilities with GSM8K score of 78.70%
  • Advanced text generation and complex task handling

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its DPOP training technique, which addresses the limitations of standard DPO training, particularly when the edit distance between a pair of completions is low. This innovation has led to state-of-the-art performance across a range of benchmarks.

Q: What are the recommended use cases?

Smaug-72B-v0.1 excels in various applications including complex reasoning, truthful QA, mathematical problem-solving, and general text generation tasks. It's particularly well-suited for applications requiring high accuracy and reliable output.
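
For running the model, a minimal loading-and-generation sketch using the standard Hugging Face transformers API is shown below. The prompt and generation settings are assumptions for illustration; note that 72.3B parameters in BF16 occupy roughly 145 GB, so multi-GPU sharding (device_map="auto") or quantization is typically required.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Smaug-72B-v0.1"  # Hugging Face Hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",           # shard the 72B model across available GPUs
)

prompt = "A train travels 120 km in 1.5 hours. What is its average speed?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```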
