Dolphin 2.9.2 Qwen2 72B (dolphin-2.9.2-qwen2-72b)

Maintained by: cognitivecomputations

Property                  Value
Parameter Count           72.7B
Context Length            128,000 tokens
Training Sequence Length  8,192 tokens
License                   tongyi-qianwen
Model Type                Decoder-only Transformer

What is dolphin-2.9.2-qwen2-72b?

Dolphin 2.9.2 Qwen2 72B is a large language model developed by Cognitive Computations, based on the Qwen2-72B architecture. It represents a significant advancement in conversational AI, featuring full-weight fine-tuning and utilizing the ChatML prompt format. The model was trained using carefully selected parameters identified by the Laser Scanner tool, resulting in enhanced performance across various tasks.
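For reference, a ChatML prompt frames each turn with <|im_start|> and <|im_end|> markers. The system message below is illustrative, not an official default:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
Write a haiku about the ocean.<|im_end|>
<|im_start|>assistant
```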

Implementation Details

The model retains the Qwen2-72B decoder-only architecture with 72.7B parameters and was fine-tuned in BF16 precision. It supports a 128k context window, although fine-tuning was conducted with 8k sequence lengths. Interaction uses the ChatML template format for consistent prompting; a minimal loading sketch follows the list below.

  • Utilizes advanced parameter selection via Laser Scanner
  • Implements full-weight fine-tuning methodology
  • Features gradient checkpointing and flash attention
  • Trained on 8 diverse datasets including OpenHermes-2.5 and Dolphin Coder
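As an illustration, loading the model with Hugging Face Transformers in BF16 and prompting it through its chat template might look like the sketch below. The repository id is assumed to match the model name, the tokenizer is assumed to ship the ChatML chat template described above, and device handling is simplified (a 72B model requires multiple large GPUs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.2-qwen2-72b"

# Load tokenizer and model in BF16, the precision used for fine-tuning.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across available GPUs
)

# apply_chat_template renders the <|im_start|>...<|im_end|> ChatML framing.
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Summarize the tradeoffs of BF16 training."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```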

Core Capabilities

  • Strong performance in instruction following and conversation
  • Advanced coding capabilities
  • Function calling support (a prompt-driven sketch follows this list)
  • Initial agentic abilities
  • Benchmark scores: 40.38% on IFEval, 47.7% on BBH, 49.52% on MMLU-Pro
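Function calling with Dolphin-style models is typically prompt-driven rather than a dedicated API. The sketch below shows one common pattern; the tool schema and the expectation of a JSON reply are assumptions for illustration, not a documented interface of this model:

```python
import json

# Hypothetical tool description injected into the system prompt.
TOOLS = [{
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "parameters": {"city": "string"},
}]

system = (
    "You are Dolphin. You may call one of these tools by replying with "
    'JSON of the form {"tool": name, "arguments": {...}}:\n'
    + json.dumps(TOOLS, indent=2)
)

def parse_tool_call(reply: str):
    """Try to interpret the model's reply as a tool call; return None otherwise."""
    try:
        call = json.loads(reply)
        if isinstance(call, dict) and call.get("tool"):
            return call["tool"], call.get("arguments", {})
    except json.JSONDecodeError:
        pass
    return None

# e.g. parse_tool_call('{"tool": "get_weather", "arguments": {"city": "Paris"}}')
# -> ("get_weather", {"city": "Paris"})
```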

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its uncensored nature and high compliance, combined with strong performance across various benchmarks. Its 128k-token context window makes it well suited to processing lengthy documents and long conversations.

Q: What are the recommended use cases?

The model excels in conversational AI applications, coding tasks, and function-calling scenarios, and is particularly well suited to applications requiring long-context understanding and complex instruction following. Because it is uncensored, users should implement their own alignment layer before deploying it as a service; a minimal sketch of one such layer follows.
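The model card does not prescribe a particular alignment layer. As a minimal sketch, a service might wrap generation with simple input and output checks along these lines; the blocklist and refusal text are placeholders, not a complete safety solution:

```python
BLOCKED_TOPICS = ("example_disallowed_topic",)  # placeholder policy list
REFUSAL = "I can't help with that request."

def moderated_generate(generate_fn, user_prompt: str) -> str:
    """Wrap a raw generate function with simple input/output checks."""
    lowered = user_prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return REFUSAL  # refuse before the model ever sees the prompt
    reply = generate_fn(user_prompt)
    if any(topic in reply.lower() for topic in BLOCKED_TOPICS):
        return REFUSAL  # catch policy violations in the output as well
    return reply
```

In practice this wrapper would sit in front of the generate call from the loading sketch above, and most production deployments would replace the keyword check with a dedicated moderation model.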
