Smaug-34B-v0.1
| Property | Value |
|---|---|
| Parameter Count | 34.4B |
| Base Model | bagel-34b-v0.2 |
| License | Apache 2.0 |
| Paper | arXiv:2402.13228 |
| Tensor Type | BF16 |
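As a rough sizing aid, the parameter count and tensor type above imply a lower bound on the memory needed just to hold the weights. The sketch below assumes 2 bytes per BF16 value and ignores activations, KV cache, and framework overhead, so real requirements will be higher.

```python
# Back-of-the-envelope weight footprint from the table above.
# Assumption: every parameter is stored as BF16 (2 bytes each).
PARAMS = 34.4e9          # parameter count from the table
BYTES_PER_BF16 = 2       # bfloat16 is a 16-bit format

weight_bytes = PARAMS * BYTES_PER_BF16
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"{weight_gb:.1f} GB ({weight_gib:.1f} GiB) for weights alone")
# prints "68.8 GB (64.1 GiB) for weights alone"
```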
What is Smaug-34B-v0.1?
Smaug-34B-v0.1 is a 34.4B-parameter language model fine-tuned from bagel-34b-v0.2 with DPO-Positive (DPOP), a variant of Direct Preference Optimization (DPO) described in arXiv:2402.13228. It reports an average score of 77.29% across six benchmarks: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K.
Implementation Details
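The model is trained with DPOP, a modified preference-optimization loss. According to the paper, standard DPO can lower the likelihood the model assigns to the preferred completion as long as the margin between the preferred and dispreferred completions keeps growing. This failure mode is most visible when the two completions in a pair differ by only a small edit distance. DPOP adds a penalty that discourages the preferred completion's likelihood from falling below its value under the reference model.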
- Trained on new pairwise preference versions of the ARC, HellaSwag, and MetaMath datasets
- Weights are stored in BF16
- Reported benchmark scores: 74.23% on ARC, 86.76% on HellaSwag, 76.66% on MMLU
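To make the difference between DPO and DPOP concrete, here is a minimal sketch of both losses for a single preference pair, written from a reading of arXiv:2402.13228 rather than from the authors' code. The function names `dpo_loss` and `dpop_loss` are hypothetical, the inputs are sequence log-probabilities, and the values of `beta` and `lam` are illustrative placeholders, not the settings used for Smaug. The exact placement of the penalty relative to `beta` should be checked against the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one pair.

    pol_* / ref_* are log-probabilities of the preferred (w) and
    dispreferred (l) completions under the policy and reference models.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def dpop_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1, lam=5.0):
    """DPOP sketch: DPO margin minus a penalty that is active only when
    the policy assigns the preferred completion a lower log-probability
    than the reference model does. beta and lam are assumed values."""
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    penalty = max(0.0, ref_w - pol_w)
    return -log_sigmoid(beta * (margin - lam * penalty))


# Illustrative case: both completions become less likely than under the
# reference model, but the dispreferred one drops further.
ref_w, ref_l = -10.0, -10.0
pol_w, pol_l = -12.0, -16.0

print("DPO :", dpo_loss(pol_w, pol_l, ref_w, ref_l))
print("DPOP:", dpop_loss(pol_w, pol_l, ref_w, ref_l))
```

With these made-up numbers the DPO loss falls below log 2 (about 0.693, its value when policy and reference agree), so DPO treats the update as an improvement even though the preferred completion became less likely. The DPOP loss rises above log 2, so the same update is penalized. When the policy's log-probability for the preferred completion is at or above the reference value, the penalty is zero and the two losses coincide.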
Core Capabilities
- Mathematical reasoning: 72.18% on GSM8K
- Truthfulness: 70.22% on TruthfulQA
- Common-sense reasoning: 83.66% on Winogrande
- Minimal reported contamination across benchmark datasets
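The 77.29% average quoted earlier is consistent with a plain unweighted mean of the six scores listed in this card. The check below assumes that is how the average was computed; the card itself does not state the formula.

```python
# Scores as listed in this card (percent).
scores = {
    "ARC": 74.23,
    "HellaSwag": 86.76,
    "MMLU": 76.66,
    "TruthfulQA": 70.22,
    "Winogrande": 83.66,
    "GSM8K": 72.18,
}

# Unweighted mean across the six benchmarks (assumed averaging method).
average = sum(scores.values()) / len(scores)
print(f"mean of {len(scores)} benchmarks: {average:.3f}")
```

The mean works out to roughly 77.285, which matches the reported 77.29% after rounding to two decimals.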
Frequently Asked Questions
Q: What makes this model unique?
Its distinguishing feature is the DPO-Positive (DPOP) training loss. Standard DPO only rewards a larger gap between preferred and dispreferred completions, so it can reduce the likelihood of the preferred completion itself, a problem that shows up most on pairs with a low edit distance. DPOP penalizes that reduction, so the model keeps assigning high likelihood to preferred examples while still learning the preference.
Q: What are the recommended use cases?
Based on its reported benchmark scores, Smaug-34B-v0.1 is best suited to mathematical reasoning (GSM8K), truthfulness-sensitive question answering (TruthfulQA), and common-sense reasoning tasks (Winogrande, HellaSwag). It is a reasonable choice for applications that need multi-step reasoning and factually careful generation.