Dolphin-2.6-Mistral-7B-DPO
Property | Value |
---|---|
Parameter Count | 7.24B |
License | Apache-2.0 |
Context Length | 16k tokens |
Training Data | 8 specialized datasets |
Average Benchmark Score | 67.20% |
What is dolphin-2.6-mistral-7b-dpo?
Dolphin-2.6-Mistral-7B-DPO is an advanced language model built on the Mistral-7B architecture and enhanced through Direct Preference Optimization (DPO). This model represents a significant advancement in instruction-following and coding capabilities, trained on a diverse set of high-quality datasets including Magicoder, OpenHermes, and specialized coding instructions.
Implementation Details
The model was trained over 3 epochs using 4 A100 GPUs, implementing full weights fine-tuning via the Axolotl framework. It utilizes the ChatML prompt format and supports 16k context length, making it suitable for extended conversations and complex coding tasks.
- Advanced DPO tuning using the ultrafeedback-binarized-preferences-cleaned dataset
- Benchmark performance: 85.48% on HellaSwag, 63.24% on MMLU, 48.75% on GSM8k
- Specialized training for enhanced coding capabilities
- Stored as BF16 tensors for memory-efficient inference (see the loading sketch below)
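As a rough illustration of how the BF16 weights and ChatML prompt format come together in practice, the sketch below loads the model with Hugging Face transformers and builds a ChatML prompt by hand. The repo id and the system/user messages are assumptions for illustration, not part of the model card; substitute your own paths and prompts.

```python
# Minimal loading sketch, assuming the model is published on the Hugging Face Hub
# under the repo id below (an assumption; substitute your own path if it differs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.6-mistral-7b-dpo"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the model's BF16 tensor type
    device_map="auto",
)

# ChatML prompt format: <|im_start|>{role}\n{content}<|im_end|>
prompt = (
    "<|im_start|>system\nYou are Dolphin, a helpful coding assistant.<|im_end|>\n"
    "<|im_start|>user\nWrite a Python function that reverses a singly linked list.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```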
Core Capabilities
- Superior coding assistance and generation
- High compliance with user instructions
- Extended context handling (16k tokens)
- Strong performance in reasoning tasks (65.61% on AI2 Reasoning Challenge)
- Enhanced truthfulness (61.47% on TruthfulQA)
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its combination of strong coding ability, reliable instruction following, and DPO-based preference tuning, while maintaining solid scores across standard benchmarks and supporting an extended 16k context window.
Q: What are the recommended use cases?
The model excels in coding tasks, general instruction-following, and complex reasoning scenarios. It's particularly well-suited for software development assistance, technical writing, and detailed analytical tasks.
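For the coding-assistant use case, the model can also be driven through transformers' chat-template helper instead of hand-written ChatML tags. The sketch below assumes, as above, the repo id and that the tokenizer ships a ChatML chat template; if it does not, fall back to the manual prompt construction shown earlier.

```python
# Sketch of a coding-assistant call via the tokenizer's chat template,
# assuming the repo ships a ChatML chat template (an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.6-mistral-7b-dpo"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Dolphin, an expert software engineering assistant."},
    {"role": "user", "content": "Review this function for bugs:\n\ndef mean(xs):\n    return sum(xs) / len(xs)"},
]

# apply_chat_template renders the messages into the model's prompt format;
# add_generation_prompt appends the assistant header so the model answers next.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=400, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```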