Dolphin-2.8-Mistral-7B-v02
| Property | Value |
|---|---|
| Parameter Count | 7.24B |
| Base Model | Mistral-7B-v0.2 |
| Context Length | 32k tokens |
| License | Apache 2.0 |
| Training Data | 7 specialized datasets |
What is dolphin-2.8-mistral-7b-v02?
Dolphin-2.8 is an advanced language model developed by Eric Hartford and Cognitive Computations, built upon the Mistral-7B-v0.2 architecture. This model represents a significant evolution in the Dolphin series, featuring enhanced instruction-following capabilities and strong coding performance, achieving a 46.9% pass@1 rate on HumanEval.
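For orientation, here is a minimal inference sketch with Hugging Face transformers; the repository id, prompt, and generation settings are illustrative assumptions rather than settings prescribed by the authors.

```python
# Minimal inference sketch. The repository id, prompt, and generation
# settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.8-mistral-7b-v02"  # assumed HF repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are distributed in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a haiku about dolphins."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```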
Implementation Details
The model was trained over three days on 10x L40S GPUs provided by Crusoe Cloud. While the Mistral-7B-v0.2 base supports a 32k context window, fine-tuning was performed at a 16k sequence length, with gradient checkpointing and flash attention enabled to reduce memory use and speed up training.
- BF16 tensor format for efficient computation
- Trained using the Axolotl framework (v0.4.0)
- Implements the ChatML chat template (see the sketch after this list)
- Utilizes adamw_bnb_8bit optimizer with cosine learning rate scheduling
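To make the ChatML format concrete, the sketch below uses the tokenizer's built-in chat template to render a conversation as a raw ChatML string; the system prompt is a placeholder, not an official recommendation.

```python
# Sketch: render a conversation with the ChatML template via the tokenizer.
# The system prompt is a placeholder, not an official recommendation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "cognitivecomputations/dolphin-2.8-mistral-7b-v02"  # assumed HF repo id
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Explain gradient checkpointing in one sentence."},
]

# tokenize=False returns the raw ChatML string, making the format visible:
# <|im_start|>system ... <|im_end|>, <|im_start|>user ... <|im_end|>, etc.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```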
Core Capabilities
- Strong performance on multiple benchmarks (MMLU: 61.2%, ARC: 59.2%)
- Advanced coding and instruction-following abilities
- Uncensored responses with high compliance
- Extended context handling (32k tokens)
- Multi-turn conversation support (sketched below)
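Multi-turn use follows the same template: append each assistant reply to the message list before the next turn. A sketch, assuming `model` and `tokenizer` are loaded as in the quick-start above and using illustrative dialogue:

```python
# Sketch: multi-turn conversation loop. Assumes `model` and `tokenizer`
# are loaded as in the quick-start example; the dialogue is illustrative.
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Give me a one-line Python palindrome check."},
]
follow_ups = ["Now explain how it works."]

for follow_up in [None] + follow_ups:
    if follow_up is not None:
        messages.append({"role": "user", "content": follow_up})
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, then keep them in the history.
    reply = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```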
Frequently Asked Questions
Q: What makes this model unique?
This model combines the advanced architecture of Mistral-7B-v0.2 with comprehensive training on seven specialized datasets, resulting in strong performance across various tasks while maintaining an uncensored approach to responses.
Q: What are the recommended use cases?
The model excels in coding tasks, instruction-following scenarios, and general conversation. However, because it is uncensored, an appropriate safety layer is recommended before deployment in production environments (a minimal sketch follows).
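As one illustration of such a layer, a deployment might screen user input before it reaches the model. The wrapper below is a purely hypothetical sketch: the blocklist, helper name, and refusal message are placeholders, and real deployments would typically rely on a dedicated moderation model or service rather than keyword matching.

```python
# Hypothetical sketch of a pre-generation safety layer. The blocklist and
# refusal policy are placeholders; production systems should use a dedicated
# moderation model or service rather than keyword matching.
BLOCKED_TOPICS = {"malware", "weapons"}  # illustrative only

def moderated_generate(model, tokenizer, messages, **gen_kwargs):
    """Refuse clearly disallowed requests before invoking the model."""
    user_text = messages[-1]["content"].lower()
    if any(topic in user_text for topic in BLOCKED_TOPICS):
        return "This request was declined by the deployment's safety policy."
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, **gen_kwargs)
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```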