Dolphin-2.6-Mistral-7B-DPO
Property | Value |
---|---|
Parameter Count | 7.24B |
License | Apache-2.0 |
Context Length | 16k tokens |
Training Data | 8 specialized datasets |
Average Benchmark Score | 67.20% |
What is dolphin-2.6-mistral-7b-dpo?
Dolphin-2.6-Mistral-7B-DPO is an advanced language model built on the Mistral-7B architecture and enhanced through Direct Preference Optimization (DPO). This model represents a significant advancement in instruction-following and coding capabilities, trained on a diverse set of high-quality datasets including Magicoder, OpenHermes, and specialized coding instructions.
Implementation Details
The model was trained over 3 epochs using 4 A100 GPUs, implementing full weights fine-tuning via the Axolotl framework. It utilizes the ChatML prompt format and supports 16k context length, making it suitable for extended conversations and complex coding tasks.
- Advanced DPO tuning using the ultrafeedback-binarized-preferences-cleaned dataset
- Benchmark performance: 85.48% on HellaSwag, 63.24% on MMLU, 48.75% on GSM8k
- Specialized training for enhanced coding capabilities
- Stored as BF16 tensors for memory-efficient inference (see the loading sketch below)
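As a rough illustration of how the BF16 weights and ChatML prompt format come together in practice, the sketch below loads the model with Hugging Face transformers and builds a ChatML prompt by hand. The repo id and the system/user messages are assumptions for illustration, not part of the model card; substitute your own paths and prompts.

```python
# Minimal loading sketch, assuming the model is published on the Hugging Face Hub
# under the repo id below (an assumption; substitute your own path if it differs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.6-mistral-7b-dpo"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the model's BF16 tensor type
    device_map="auto",
)

# ChatML prompt format: <|im_start|>{role}\n{content}<|im_end|>
prompt = (
    "<|im_start|>system\nYou are Dolphin, a helpful coding assistant.<|im_end|>\n"
    "<|im_start|>user\nWrite a Python function that reverses a singly linked list.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```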
Core Capabilities
- Superior coding assistance and generation
- High compliance with user instructions
- Extended context handling (16k tokens)
- Strong performance in reasoning tasks (65.61% on AI2 Reasoning Challenge)
- Enhanced truthfulness (61.47% on TruthfulQA)
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its combination of strong coding ability, reliable instruction following, and DPO-based preference tuning, while maintaining solid scores across standard benchmarks and supporting an extended 16k context window.
Q: What are the recommended use cases?
The model excels in coding tasks, general instruction-following, and complex reasoning scenarios. It's particularly well-suited for software development assistance, technical writing, and detailed analytical tasks.
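For the coding-assistant use case, the model can also be driven through transformers' chat-template helper instead of hand-written ChatML tags. The sketch below assumes, as above, the repo id and that the tokenizer ships a ChatML chat template; if it does not, fall back to the manual prompt construction shown earlier.

```python
# Sketch of a coding-assistant call via the tokenizer's chat template,
# assuming the repo ships a ChatML chat template (an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.6-mistral-7b-dpo"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Dolphin, an expert software engineering assistant."},
    {"role": "user", "content": "Review this function for bugs:\n\ndef mean(xs):\n    return sum(xs) / len(xs)"},
]

# apply_chat_template renders the messages into the model's prompt format;
# add_generation_prompt appends the assistant header so the model answers next.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=400, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```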