Dolphin 2.9.3 Mistral Nemo 12B

Maintained by cognitivecomputations

Property          Value
Parameter Count   12.2B
Context Length    128K tokens
License           Apache 2.0
Base Model        Mistral-Nemo-Base-2407

What is dolphin-2.9.3-mistral-nemo-12b?

Dolphin 2.9.3 is a language model built on Mistral-Nemo-Base-2407 and developed by Cognitive Computations. It combines strong instruction following with coding ability and function calling support, was trained using the ChatML format, and offers a 128K-token context window.
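
As a reference point, here is a minimal sketch of a ChatML prompt; the system and user messages are placeholders, not a prescribed format:

```python
# Minimal ChatML prompt sketch: each turn is wrapped in
# <|im_start|>{role} ... <|im_end|> markers, and generation
# continues from the open assistant turn.
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Summarize the plot of Moby-Dick in two sentences.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```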

Implementation Details

The model was trained in BF16 precision on 8 diverse datasets covering instruction following, coding, and mathematical reasoning, using gradient checkpointing and flash attention to reduce memory use. The training process employed a cosine learning rate scheduler with staged parameter unfreezing to preserve model quality. A loading sketch that applies the same optimizations follows the list below.

  • Trained with the Axolotl framework, version 0.4.1
  • Uses the ChatML prompt template format
  • Applies flash attention and gradient checkpointing during training
  • Fine-tuned at a sequence length of 8192 tokens
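
As referenced above, here is a minimal loading sketch that applies the same optimizations at inference or fine-tuning time. It assumes the Hugging Face repo id cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b and that the flash-attn package is installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,               # BF16, matching the training precision
    attn_implementation="flash_attention_2",  # flash attention; requires flash-attn
    device_map="auto",
)
model.gradient_checkpointing_enable()         # only useful if you fine-tune further
```

BF16 halves memory relative to FP32 while keeping FP32's exponent range, which is why it is the usual choice for both training and serving models of this size.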

Core Capabilities

  • Advanced instruction following and conversational abilities
  • Robust coding and software development support
  • Function calling capabilities (see the sketch after this list)
  • Mathematical reasoning and problem-solving
  • Multilingual support
  • Extensive context understanding (128K tokens)
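
The model card does not specify a wire format for function calls, so the sketch below shows one common pattern: declare a tool in the system message and parse a JSON reply. The get_weather function and its schema are hypothetical:

```python
import json

# Illustrative tool declaration for the ChatML system message; the function
# name and schema below are hypothetical, not part of the model card.
SYSTEM = (
    "You are Dolphin. When a tool is needed, reply with a single JSON object:\n"
    '{"name": "get_weather", "arguments": {"city": "<city name>"}}'
)

def dispatch(reply: str) -> str:
    """Route the model's reply: execute a JSON tool call or pass text through."""
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain-text answer, no tool call
    if call.get("name") == "get_weather":
        city = call.get("arguments", {}).get("city", "")
        return f"(tool result) Weather for {city} would be fetched here."
    return reply
```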

Frequently Asked Questions

Q: What makes this model unique?

The model is uncensored while maintaining high performance across a range of tasks. It scores 0.5549 acc_norm on the BBH benchmark and supports a 128K-token context, making it well suited to complex, long-form interactions.

Q: What are the recommended use cases?

The model excels at coding tasks, instruction following, mathematical problem-solving, and general conversation. Note, however, that it is an uncensored model: before deploying it in production, you should implement your own alignment layer.
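
What counts as an alignment layer is left to the deployer; a minimal sketch is a guardrail applied to requests before they reach the model. The blocklist below is purely illustrative:

```python
# Purely illustrative guardrail: a real alignment layer would be far more
# sophisticated (policy classifiers, output filtering, audit logging, etc.).
BLOCKED_TOPICS = ("malware", "credential theft")  # placeholder policy

def guard(user_message: str) -> str | None:
    """Return a refusal message, or None if the request may proceed."""
    lowered = user_message.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "Sorry, I can't help with that request."
    return None
```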
