Dolphin-2.8-Mistral-7B-v02
| Property | Value |
|---|---|
| Parameter Count | 7.24B |
| Base Model | Mistral-7B-v0.2 |
| Context Length | 32k tokens |
| License | Apache 2.0 |
| Training Data | 7 specialized datasets |
What is dolphin-2.8-mistral-7b-v02?
Dolphin-2.8 is an advanced language model developed by Eric Hartford and Cognitive Computations, built upon the Mistral-7B-v0.2 architecture. This model represents a significant evolution in the Dolphin series, featuring enhanced instruction-following capabilities and strong coding performance, achieving a 46.9% pass@1 rate on HumanEval.
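For orientation, here is a minimal inference sketch with Hugging Face transformers; the repository id, prompt, and generation settings are illustrative assumptions rather than settings prescribed by the authors.

```python
# Minimal inference sketch. The repository id, prompt, and generation
# settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.8-mistral-7b-v02"  # assumed HF repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are distributed in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a haiku about dolphins."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```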
Implementation Details
The model was trained over three days on 10x L40S GPUs provided by Crusoe Cloud. While the Mistral-7B-v0.2 base supports a 32k context window, fine-tuning was performed at a 16k sequence length, with gradient checkpointing and flash attention enabled to reduce memory use and speed up training.
- BF16 tensor format for efficient computation
- Trained using the Axolotl framework (v0.4.0)
- Implements the ChatML chat template (see the sketch after this list)
- Utilizes adamw_bnb_8bit optimizer with cosine learning rate scheduling
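To make the ChatML format concrete, the sketch below uses the tokenizer's built-in chat template to render a conversation as a raw ChatML string; the system prompt is a placeholder, not an official recommendation.

```python
# Sketch: render a conversation with the ChatML template via the tokenizer.
# The system prompt is a placeholder, not an official recommendation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "cognitivecomputations/dolphin-2.8-mistral-7b-v02"  # assumed HF repo id
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Explain gradient checkpointing in one sentence."},
]

# tokenize=False returns the raw ChatML string, making the format visible:
# <|im_start|>system ... <|im_end|>, <|im_start|>user ... <|im_end|>, etc.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```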
Core Capabilities
- Strong performance on multiple benchmarks (MMLU: 61.2%, ARC: 59.2%)
- Advanced coding and instruction-following abilities
- Uncensored responses with high compliance
- Extended context handling (32k tokens)
- Multi-turn conversation support (sketched below)
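Multi-turn use follows the same template: append each assistant reply to the message list before the next turn. A sketch, assuming `model` and `tokenizer` are loaded as in the quick-start above and using illustrative dialogue:

```python
# Sketch: multi-turn conversation loop. Assumes `model` and `tokenizer`
# are loaded as in the quick-start example; the dialogue is illustrative.
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Give me a one-line Python palindrome check."},
]
follow_ups = ["Now explain how it works."]

for follow_up in [None] + follow_ups:
    if follow_up is not None:
        messages.append({"role": "user", "content": follow_up})
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, then keep them in the history.
    reply = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```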
Frequently Asked Questions
Q: What makes this model unique?
This model combines the advanced architecture of Mistral-7B-v0.2 with comprehensive training on seven specialized datasets, resulting in strong performance across various tasks while maintaining an uncensored approach to responses.
Q: What are the recommended use cases?
The model excels in coding tasks, instruction-following scenarios, and general conversation. However, because it is uncensored, an appropriate safety layer is recommended before deployment in production environments (a minimal sketch follows).
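As one illustration of such a layer, a deployment might screen user input before it reaches the model. The wrapper below is a purely hypothetical sketch: the blocklist, helper name, and refusal message are placeholders, and real deployments would typically rely on a dedicated moderation model or service rather than keyword matching.

```python
# Hypothetical sketch of a pre-generation safety layer. The blocklist and
# refusal policy are placeholders; production systems should use a dedicated
# moderation model or service rather than keyword matching.
BLOCKED_TOPICS = {"malware", "weapons"}  # illustrative only

def moderated_generate(model, tokenizer, messages, **gen_kwargs):
    """Refuse clearly disallowed requests before invoking the model."""
    user_text = messages[-1]["content"].lower()
    if any(topic in user_text for topic in BLOCKED_TOPICS):
        return "This request was declined by the deployment's safety policy."
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, **gen_kwargs)
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```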