Dolphin-2.6-Mistral-7B-DPO

Maintained By: cognitivecomputations

  • Parameter Count: 7.24B
  • License: Apache-2.0
  • Context Length: 16k tokens
  • Training Data: 8 specialized datasets
  • Average Benchmark Score: 67.20%

What is dolphin-2.6-mistral-7b-dpo?

Dolphin-2.6-Mistral-7B-DPO is a language model built on the Mistral-7B architecture and further tuned with Direct Preference Optimization (DPO). It focuses on instruction following and coding, and was trained on a diverse set of high-quality datasets including Magicoder, OpenHermes, and specialized coding instructions.
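
The minimal loading sketch below is illustrative only. It assumes the checkpoint is published on Hugging Face as cognitivecomputations/dolphin-2.6-mistral-7b-dpo and that a GPU with bfloat16 support is available; adjust the repo id, dtype, and device placement for your setup.

```python
# Sketch: loading the model with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.6-mistral-7b-dpo"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 tensors
    device_map="auto",           # requires the accelerate package
)
```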

Implementation Details

The model was trained for 3 epochs on 4 A100 GPUs using full-weight fine-tuning with the Axolotl framework. It uses the ChatML prompt format (illustrated after the list below) and supports a 16k-token context length, making it suitable for extended conversations and complex coding tasks.

  • Advanced DPO tuning using the ultrafeedback-binarized-preferences-cleaned dataset
  • Benchmark performance: 85.48% on HellaSwag, 63.24% on MMLU, 48.75% on GSM8k
  • Specialized training for enhanced coding capabilities
  • Weights stored in BF16 (bfloat16) tensor type
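
The ChatML layout mentioned above is sketched below as a plain Python string; the system prompt and user message are only illustrative.

```python
# Sketch of the ChatML prompt layout the model expects.
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a Python function that reverses a string.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```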

Core Capabilities

  • Superior coding assistance and generation
  • High compliance with user instructions
  • Extended context handling (16k tokens)
  • Strong performance in reasoning tasks (65.61% on AI2 Reasoning Challenge)
  • Enhanced truthfulness (61.47% on TruthfulQA)
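
Continuing the loading sketch above (reusing its model and tokenizer), the example below shows one way to request code from the model. It assumes the tokenizer ships a ChatML chat template; if it does not, build the prompt string manually as shown earlier. Message content and generation settings are illustrative.

```python
# Sketch: asking for a coding answer via the tokenizer's chat template.
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```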

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its combination of strong coding capabilities, instruction-following behavior, and DPO optimization. It achieves this while maintaining high performance across various benchmarks and supporting an extended 16k context window.

Q: What are the recommended use cases?

The model excels in coding tasks, general instruction-following, and complex reasoning scenarios. It's particularly well-suited for software development assistance, technical writing, and detailed analytical tasks.
