DeepSeek-V2

Maintained by: deepseek-ai

  • Total Parameters: 236B
  • Active Parameters: 21B per token
  • Context Length: 128K tokens
  • License: DeepSeek Model License
  • Paper: arXiv:2405.04434

What is DeepSeek-V2?

DeepSeek-V2 is a Mixture-of-Experts (MoE) language model designed for economical training and efficient inference. Pretrained on 8.1 trillion tokens, it introduces two architectural innovations, Multi-head Latent Attention (MLA) and DeepSeekMoE, and delivers strong performance while cutting training costs by 42.5% and the KV cache by 93.3% relative to DeepSeek 67B.

Implementation Details

The model employs a sophisticated architecture featuring MLA for attention mechanisms and DeepSeekMoE for Feed-Forward Networks. This design enables efficient inference while maintaining high performance across various tasks.

  • BF16 precision format
  • Requires 8×80 GB GPUs for BF16 inference
  • Supports both completion and chat interfaces
  • Compatible with Hugging Face Transformers and vLLM (see the sketch after this list)
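
A minimal loading and completion sketch with Hugging Face Transformers is shown below. It assumes the Hub repository id deepseek-ai/DeepSeek-V2, that enabling trust_remote_code is acceptable in your environment, and that enough GPU memory is available; adjust device_map and dtype for your hardware.

```python
# Minimal sketch: completion with DeepSeek-V2 via Hugging Face Transformers.
# Assumes the Hub repo id "deepseek-ai/DeepSeek-V2" and a multi-GPU host
# (the list above suggests 8x80 GB for BF16 inference).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,      # the repo ships custom MLA/MoE modeling code
    torch_dtype=torch.bfloat16,  # BF16 precision, as noted above
    device_map="auto",           # shard weights across available GPUs
)

# Plain completion interface
prompt = "Briefly explain Mixture-of-Experts language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For high-throughput serving, the same checkpoint can instead be loaded through vLLM, which the list above notes as supported.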

Core Capabilities

  • Strong performance on MMLU (78.5%) and BBH (78.9%)
  • Exceptional Chinese language understanding (C-Eval: 81.7%, CMMLU: 84.0%)
  • Robust coding capabilities (HumanEval: 48.8%, MBPP: 66.6%)
  • Advanced mathematical reasoning (GSM8K: 79.2%, MATH: 43.6%)

Frequently Asked Questions

Q: What makes this model unique?

DeepSeek-V2's sparse MoE architecture activates only 21B of its 236B total parameters for each token, delivering the capability of a much larger dense model at a fraction of the per-token compute and memory cost.
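
As a rough illustration of that sparse-activation idea, the toy sketch below shows generic top-k expert routing: each token's hidden state is sent to only k of the available experts, so only those experts' parameters do work for that token. This is not the actual DeepSeekMoE implementation, and the layer sizes and expert counts here are made up for illustration.

```python
# Illustrative top-k expert routing (toy example, NOT the DeepSeekMoE code).
# Demonstrates why only a fraction of parameters is active per token:
# each token is processed by just k of the n_experts feed-forward experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(5, 64)
print(layer(tokens).shape)  # torch.Size([5, 64]); only 2 of 8 experts ran per token
```

The real DeepSeekMoE design additionally uses fine-grained expert segmentation and shared experts, which this toy omits.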

Q: What are the recommended use cases?

The model excels in various applications including general text generation, code development, mathematical problem-solving, and multilingual tasks, with particular strength in Chinese language processing.
