Smaug-72B-v0.1
| Property | Value |
|---|---|
| Parameter Count | 72.3B |
| Model Type | Large Language Model |
| Base Model | MoMo-72B-lora-1.8.7-DPO |
| License | Tongyi Qianwen License Agreement |
| Paper | arXiv:2402.13228 |
What is Smaug-72B-v0.1?
Smaug-72B-v0.1 is a large language model developed by Abacus AI. It reached first place on the Hugging Face Open LLM Leaderboard and was the first open-source model to surpass an average score of 80%. The model was fine-tuned with a new technique called DPO-Positive (DPOP), which is designed to fix a failure mode of standard Direct Preference Optimization (DPO).
Implementation Details
The model is fine-tuned from MoMo-72B-lora-1.8.7-DPO, which is itself based on Qwen-72B. Its weights use the BF16 tensor type. Fine-tuning relied on new pairwise preference versions of the ARC, HellaSwag, and MetaMath datasets. DPOP addresses a specific weakness of the standard DPO loss: in certain scenarios, the loss can keep decreasing even while the model's likelihood of the preferred examples goes down.
- Average score of 80.48% across the Open LLM Leaderboard benchmarks
- MT-Bench scores: 8.18 (First Turn), 7.34 (Second Turn), 7.76 (Average)
- Benchmark contamination checked with a detection methodology that uses Llama 7B as the reference model
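The DPOP idea described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' training code. It assumes the loss is the standard DPO objective plus a penalty term that activates only when the policy assigns the preferred completion a lower log-probability than the reference model does. The function names, the penalty weight `lam`, and the values chosen for `beta` and `lam` are all illustrative assumptions, as are the log-probabilities in the example.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    # Standard DPO: only the *difference* of log-ratios matters.
    # pol_* / ref_* are sequence log-probabilities under the policy and
    # the reference model; w = preferred, l = dispreferred completion.
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)

def dpop_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1, lam=5.0):
    # DPOP sketch (assumed form): subtract a penalty whenever the policy's
    # log-probability of the preferred completion drops below the reference.
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    penalty = max(0.0, ref_w - pol_w)
    return -log_sigmoid(beta * (margin - lam * penalty))

# Made-up log-probabilities: the policy lowers BOTH completions,
# but lowers the dispreferred one more.
ref_w, ref_l = -10.0, -12.0
pol_w, pol_l = -11.0, -16.0
start = dpo_loss(ref_w, ref_l, ref_w, ref_l)  # loss when policy == reference

print(dpo_loss(pol_w, pol_l, ref_w, ref_l) < start)   # True: DPO rewards the update
print(dpop_loss(pol_w, pol_l, ref_w, ref_l) > start)  # True: DPOP penalizes it
```

With these numbers the preferred completion becomes less likely, yet the DPO loss still falls below its starting value. The penalty term is what makes the DPOP loss rise instead. When the policy keeps the preferred completion at least as likely as the reference does, the penalty is zero and the two losses coincide.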
Core Capabilities
- Knowledge and commonsense benchmarks: ARC (76.02%), HellaSwag (89.27%), and MMLU (77.15%)
- Truthfulness: TruthfulQA score of 76.67%
- Mathematical reasoning: GSM8K score of 78.70%
- General text generation and handling of complex, multi-step tasks
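The five benchmark scores listed here can be checked against the reported 80.48% average with a little arithmetic. The sketch below assumes the leaderboard average is an unweighted mean over six benchmarks, with Winogrande as the sixth, which this page does not list. Both the benchmark count and the name of the missing benchmark are assumptions, and because every input is rounded to two decimals, the implied score is only accurate to within a few hundredths.

```python
# Scores reported on this page, in percent.
reported_average = 80.48
listed_scores = {
    "ARC": 76.02,
    "HellaSwag": 89.27,
    "MMLU": 77.15,
    "TruthfulQA": 76.67,
    "GSM8K": 78.70,
}

# Assumption: the average is an unweighted mean over six benchmarks.
n_benchmarks = 6

# Solve mean = (sum(listed) + x) / n for the unlisted score x.
implied_sixth = n_benchmarks * reported_average - sum(listed_scores.values())
print(f"Implied score on the unlisted benchmark: {implied_sixth:.2f}%")
```

An implied value in the mid-80s is consistent with the listed scores and the stated average, so the figures on this page agree with one another under that assumption.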
Frequently Asked Questions
Q: What makes this model unique?
What sets the model apart is its DPOP training technique. DPOP addresses a weakness of standard DPO training that is most pronounced when the edit distance between the preferred and dispreferred completions in a pair is low, that is, when the two completions differ in only a few tokens. This technique underlies the benchmark results reported above.
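To make "low edit distance" concrete, the sketch below computes a token-level Levenshtein distance between two completions. The example pair is invented for illustration, and splitting on whitespace and normalizing by the longer length are simplifying assumptions rather than the procedure used for the model.

```python
def edit_distance(a, b):
    # Classic Levenshtein distance over two token sequences,
    # keeping only the previous row of the DP table.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if tokens match)
            ))
        prev = curr
    return prev[-1]

def normalized_edit_distance(a, b):
    # Scale to [0, 1] by the length of the longer sequence.
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

# Hypothetical preference pair for a math problem: the two completions
# are identical except for the final number.
chosen = "12 + 30 = 42 so the answer is 42".split()
rejected = "12 + 30 = 41 so the answer is 41".split()

print(edit_distance(chosen, rejected))             # 2
print(normalized_edit_distance(chosen, rejected))  # 0.2
```

Pairs like this are common in math-style preference data, where a correct and an incorrect solution share almost all of their tokens.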
Q: What are the recommended use cases?
Smaug-72B-v0.1 is suited to complex reasoning, truthful question answering, mathematical problem solving, and general text generation. Its benchmark results make it a reasonable choice for applications where accuracy and reliable output matter.
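When planning a deployment for these use cases, the parameter count and BF16 tensor type stated on this page give a rough lower bound on memory. The sketch below covers the weights only. It ignores activations, the KV cache, and framework overhead, so real requirements will be higher.

```python
# Figures taken from this page.
PARAMS = 72.3e9            # parameter count
BYTES_PER_PARAM_BF16 = 2   # BF16 stores each parameter in 16 bits

weight_bytes = PARAMS * BYTES_PER_PARAM_BF16
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"Weights only: about {weight_gb:.1f} GB ({weight_gib:.1f} GiB)")
```

At roughly 145 GB for the weights alone, the model will not fit on a single accelerator with 80 GB of memory in BF16 without sharding or quantization.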