Dolphin 2.9.3 Mistral Nemo 12B

Maintained by cognitivecomputations

Property          Value
Parameter Count   12.2B
Context Length    128K tokens
License           Apache 2.0
Base Model        Mistral-Nemo-Base-2407

What is dolphin-2.9.3-mistral-nemo-12b?

Dolphin 2.9.3 is a language model built on Mistral-Nemo-Base-2407 and developed by Cognitive Computations. It combines strong instruction following with coding ability and function calling support, was trained using the ChatML format, and offers a 128K-token context window.
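
As a reference point, here is a minimal sketch of a ChatML prompt; the system and user messages are placeholders, not a prescribed format:

```python
# Minimal ChatML prompt sketch: each turn is wrapped in
# <|im_start|>{role} ... <|im_end|> markers, and generation
# continues from the open assistant turn.
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Summarize the plot of Moby-Dick in two sentences.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```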

Implementation Details

The model was trained in BF16 precision on 8 diverse datasets covering instruction following, coding, and mathematical reasoning, using gradient checkpointing and flash attention to reduce memory use. The training process employed a cosine learning rate scheduler with staged parameter unfreezing to preserve model quality. A loading sketch that applies the same optimizations follows the list below.

  • Trained with the Axolotl framework, version 0.4.1
  • Uses the ChatML prompt template format
  • Applies flash attention and gradient checkpointing during training
  • Fine-tuned at a sequence length of 8192 tokens
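
As referenced above, here is a minimal loading sketch that applies the same optimizations at inference or fine-tuning time. It assumes the Hugging Face repo id cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b and that the flash-attn package is installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,               # BF16, matching the training precision
    attn_implementation="flash_attention_2",  # flash attention; requires flash-attn
    device_map="auto",
)
model.gradient_checkpointing_enable()         # only useful if you fine-tune further
```

BF16 halves memory relative to FP32 while keeping FP32's exponent range, which is why it is the usual choice for both training and serving models of this size.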

Core Capabilities

  • Advanced instruction following and conversational abilities
  • Robust coding and software development support
  • Function calling capabilities (see the sketch after this list)
  • Mathematical reasoning and problem-solving
  • Multilingual support
  • Extensive context understanding (128K tokens)
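
The model card does not specify a wire format for function calls, so the sketch below shows one common pattern: declare a tool in the system message and parse a JSON reply. The get_weather function and its schema are hypothetical:

```python
import json

# Illustrative tool declaration for the ChatML system message; the function
# name and schema below are hypothetical, not part of the model card.
SYSTEM = (
    "You are Dolphin. When a tool is needed, reply with a single JSON object:\n"
    '{"name": "get_weather", "arguments": {"city": "<city name>"}}'
)

def dispatch(reply: str) -> str:
    """Route the model's reply: execute a JSON tool call or pass text through."""
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain-text answer, no tool call
    if call.get("name") == "get_weather":
        city = call.get("arguments", {}).get("city", "")
        return f"(tool result) Weather for {city} would be fetched here."
    return reply
```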

Frequently Asked Questions

Q: What makes this model unique?

The model is uncensored while maintaining high performance across a range of tasks. It scores 0.5549 acc_norm on the BBH benchmark and supports a 128K-token context, making it well suited to complex, long-form interactions.

Q: What are the recommended use cases?

The model excels at coding tasks, instruction following, mathematical problem-solving, and general conversation. Note, however, that it is an uncensored model: before deploying it in production, you should implement your own alignment layer.
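
What counts as an alignment layer is left to the deployer; a minimal sketch is a guardrail applied to requests before they reach the model. The blocklist below is purely illustrative:

```python
# Purely illustrative guardrail: a real alignment layer would be far more
# sophisticated (policy classifiers, output filtering, audit logging, etc.).
BLOCKED_TOPICS = ("malware", "credential theft")  # placeholder policy

def guard(user_message: str) -> str | None:
    """Return a refusal message, or None if the request may proceed."""
    lowered = user_message.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "Sorry, I can't help with that request."
    return None
```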
