dolphin-2.8-mistral-7b-v02

Maintained By
cognitivecomputations

| Property | Value |
|---|---|
| Parameter Count | 7.24B |
| Base Model | Mistral-7B-v0.2 |
| Context Length | 32k tokens |
| License | Apache 2.0 |
| Training Data | 7 specialized datasets |

What is dolphin-2.8-mistral-7b-v02?

Dolphin-2.8 is an advanced language model developed by Eric Hartford and Cognitive Computations, built upon the Mistral-7B-v0.2 architecture. This model represents a significant evolution in the Dolphin series, featuring enhanced instruction-following capabilities and strong coding performance, achieving a 46.9% pass@1 rate on HumanEval.

Implementation Details

The model underwent a 3-day training run on 10x NVIDIA L40S GPUs provided by Crusoe Cloud. It supports a 32k context window and was fine-tuned with 16k sequence lengths, using gradient checkpointing and flash attention to keep memory usage manageable during training.

  • BF16 tensor format for efficient computation
  • Trained using the Axolotl framework (v0.4.0)
  • Implements the ChatML chat template
  • Utilizes adamw_bnb_8bit optimizer with cosine learning rate scheduling
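Since the model uses the ChatML chat template, prompts need to be wrapped in `<|im_start|>`/`<|im_end|>` markers. A minimal sketch of that format (the system and user text here is illustrative, not from the model card):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Wrap a single system/user exchange in the ChatML format,
    ending with an open assistant turn to cue the model's reply."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are Dolphin, a helpful AI assistant.",
    "Write a haiku about the sea.",
)
print(prompt)
```

In practice a tokenizer's built-in chat-template support can produce the same string, but building it by hand makes the expected token layout explicit.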

Core Capabilities

  • Strong performance on multiple benchmarks (MMLU: 61.2%, ARC: 59.2%)
  • Advanced coding and instruction-following abilities
  • Uncensored responses with high compliance
  • Extended context handling (32k tokens)
  • Multi-turn conversation support
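Multi-turn conversations follow the same ChatML pattern: each prior turn is a closed `<|im_start|>role ... <|im_end|>` block, and the prompt ends with an open assistant turn. A minimal builder sketch (the message contents are illustrative):

```python
def build_chatml(messages: list[dict]) -> str:
    """Serialize a conversation history into a ChatML prompt.

    messages: list of {"role": ..., "content": ...} dicts, in order.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant")  # open turn: model replies here
    return "\n".join(parts)

history = [
    {"role": "system", "content": "You are Dolphin, a helpful assistant."},
    {"role": "user", "content": "What is flash attention?"},
    {"role": "assistant", "content": "A memory-efficient attention algorithm."},
    {"role": "user", "content": "Why does it help with long contexts?"},
]
print(build_chatml(history))
```

With a 32k-token context window, fairly long histories fit, but the serialized prompt should still be checked against the tokenizer's length limit before inference.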

Frequently Asked Questions

Q: What makes this model unique?

This model combines the advanced architecture of Mistral-7B-v0.2 with comprehensive training on seven specialized datasets, resulting in strong performance across various tasks while maintaining an uncensored approach to responses.

Q: What are the recommended use cases?

The model excels in coding tasks, instruction-following scenarios, and general conversation. However, due to its uncensored nature, implementing appropriate safety layers is recommended before deployment in production environments.
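One hypothetical shape such a safety layer could take is a pre-filter wrapped around the inference call. Everything in this sketch is an assumption: `generate` stands in for whatever inference function you use, and the blocklist pattern is a placeholder, not a recommended policy:

```python
import re
from typing import Callable

# Placeholder patterns; a real deployment would use a proper
# moderation model or policy engine, not a keyword list.
BLOCKED_PATTERNS = [r"\bforbidden_topic\b"]

def guarded_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Run the prompt through a blocklist before calling the model."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            return "Request declined by safety filter."
    return generate(prompt)

# Usage with a stub in place of real inference:
reply = guarded_generate("Summarize this article.", lambda p: "...")
```

A symmetric post-filter on the model's output is usually added as well, since an uncensored model can produce disallowed content even from a benign-looking prompt.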
