# Dolphin 2.9.2 Qwen2 72B
| Property | Value |
|---|---|
| Parameter Count | 72.7B |
| Context Length | 128,000 tokens |
| Training Sequence Length | 8,192 tokens |
| License | tongyi-qianwen |
| Model Type | Decoder-only Transformer |
## What is dolphin-2.9.2-qwen2-72b?
Dolphin 2.9.2 Qwen2 72B is a large language model developed by Cognitive Computations and built on Qwen2-72B. It is a full-weight fine-tune that uses the ChatML prompt format, with the parameters targeted for training carefully selected by the Laser Scanner tool to improve performance across a range of tasks.
## Implementation Details
The model has 72.7B parameters and was fine-tuned in BF16 precision. It supports a 128k-token context window, although fine-tuning itself was run at 8k sequence length. All interaction uses the ChatML template (an example follows the list below).
- Utilizes advanced parameter selection via Laser Scanner
- Implements full-weight fine-tuning methodology
- Features gradient checkpointing and flash attention
- Trained on 8 diverse datasets including OpenHermes-2.5 and Dolphin Coder
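For reference, a single-turn prompt in the ChatML format the model was trained on looks like the following; the system message shown is only an illustrative placeholder, not a prescribed prompt:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
How do I reverse a list in Python?<|im_end|>
<|im_start|>assistant
```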
## Core Capabilities
- Strong performance in instruction following and conversation
- Advanced coding capabilities
- Function calling support
- Initial agentic abilities
- Benchmark scores: 40.38% on IFEval, 47.7% on BBH, 49.52% on MMLU-PRO
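A minimal inference sketch with Hugging Face transformers is shown below. It assumes the `cognitivecomputations/dolphin-2.9.2-qwen2-72b` repository ID and enough GPU memory for the 72B weights (or a quantized variant), so treat it as a starting point rather than a reference implementation:

```python
# Minimal sketch: chat with Dolphin 2.9.2 Qwen2 72B via transformers.
# Assumes the repo ID below and sufficient GPU memory; adjust device_map,
# dtype, or swap in a quantized variant as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.2-qwen2-72b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in BF16
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a function that deduplicates a list."},
]

# apply_chat_template renders the ChatML prompt the model expects
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```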
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its uncensored nature and high compliance, combined with strong performance across benchmarks. Its 128k-token context window makes it well suited to lengthy documents and long multi-turn conversations.
Q: What are the recommended use cases?
The model excels in conversational AI applications, coding tasks, and function-calling scenarios. It is particularly well suited to applications that require long-context understanding and complex instruction following. Because the model is uncensored, users should implement their own alignment layer before exposing it as a service; one lightweight approach is sketched below.
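The card does not prescribe what that alignment layer should look like. One common lightweight pattern, shown here purely as an assumption rather than an official recommendation, is to enforce a fixed guardrail system message on every request; a production service would typically layer input and output moderation on top of this:

```python
# Hypothetical guardrail wrapper: strips any caller-supplied system prompt
# and enforces the service's own before the request reaches the model.
# This only illustrates the idea; a real deployment would also add
# input/output moderation.
GUARDRAIL = (
    "You are a helpful assistant. Refuse requests that are illegal, "
    "harmful, or violate this service's usage policy."
)

def with_alignment_layer(user_messages: list[dict]) -> list[dict]:
    """Return the chat with the service's guardrail system message enforced."""
    chat = [m for m in user_messages if m.get("role") != "system"]
    return [{"role": "system", "content": GUARDRAIL}] + chat
```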