DeepSeek-V2-Lite
| Property | Value |
|---|---|
| Total Parameters | 15.7B |
| Active Parameters | 2.4B |
| Context Length | 32k tokens |
| License | DeepSeek Model License |
| Paper | arXiv:2405.04434 |
What is DeepSeek-V2-Lite?
DeepSeek-V2-Lite is a Mixture-of-Experts (MoE) language model trained from scratch on 5.7T tokens. It combines Multi-head Latent Attention (MLA), which compresses the key-value cache, with the DeepSeekMoE architecture for sparse expert computation. The result is strong performance at a small active-parameter cost, and the model can be deployed on a single 40GB GPU.
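As a rough sketch of what single-GPU deployment looks like in practice, the snippet below loads the model in BF16 with Hugging Face transformers. The repository id, device mapping, and generation settings are assumptions for illustration; consult the official model card for the recommended invocation.

```python
# Minimal loading sketch (assumed repo id and settings, not the official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-V2-Lite"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # weights are released in BF16
    trust_remote_code=True,       # custom MLA/MoE modules ship with the repo
    device_map="auto",            # requires accelerate; targets a single 40GB GPU
)

inputs = tokenizer("An attention mechanism is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```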
Implementation Details
The model has 27 layers, a hidden dimension of 2048, and 16 attention heads. MLA compresses keys and values into a 512-dimensional latent, and each MoE layer combines 2 shared experts with 64 routed experts, of which 6 are activated per token.
- Efficient inference through MLA architecture
- DeepSeekMoE implementation for optimal resource utilization
- BF16 format for balanced precision and performance
- 32k context length support
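To make the 2-shared plus 64-routed expert layout concrete, here is a minimal PyTorch sketch of that split. Everything in it (module names, the expert hidden size, the softmax-then-top-k gating loop) is a simplified assumption for illustration; the released implementation uses fine-grained experts, load balancing, and optimized routing kernels.

```python
# Toy sketch of the shared + routed expert split (illustrative, not the real layer).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDeepSeekMoELayer(nn.Module):
    """2 always-active shared experts + top-6 of 64 routed experts per token."""

    def __init__(self, hidden=2048, expert_dim=1408, n_routed=64, n_shared=2, top_k=6):
        super().__init__()

        def make_expert():
            return nn.Sequential(
                nn.Linear(hidden, expert_dim), nn.SiLU(), nn.Linear(expert_dim, hidden)
            )

        self.shared_experts = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.routed_experts = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.router = nn.Linear(hidden, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):  # x: [num_tokens, hidden]
        # Shared experts process every token.
        shared_out = sum(expert(x) for expert in self.shared_experts)

        # Router selects 6 of the 64 routed experts for each token.
        scores = F.softmax(self.router(x), dim=-1)            # [num_tokens, 64]
        topk_scores, topk_idx = scores.topk(self.top_k, -1)   # [num_tokens, 6]

        routed_out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for expert_id in topk_idx[:, slot].unique().tolist():
                mask = topk_idx[:, slot] == expert_id
                routed_out[mask] += (
                    topk_scores[mask, slot].unsqueeze(-1)
                    * self.routed_experts[expert_id](x[mask])
                )
        return shared_out + routed_out

# Example: 4 tokens through the toy layer.
layer = ToyDeepSeekMoELayer()
print(layer(torch.randn(4, 2048)).shape)  # torch.Size([4, 2048])
```

Only the 6 selected experts (plus the 2 shared ones) run for each token, which is why the active parameter count stays at 2.4B even though the full model holds 15.7B parameters.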
Core Capabilities
- Strong performance on English benchmarks (MMLU: 58.3, BBH: 44.1)
- Strong results on Chinese benchmarks (C-Eval: 60.3, CMMLU: 64.3)
- Solid coding capabilities (HumanEval: 29.9, MBPP: 43.2)
- Mathematical reasoning (GSM8K: 41.1, MATH: 17.1)
Frequently Asked Questions
Q: What makes this model unique?
DeepSeek-V2-Lite stands out for its efficient MoE architecture: with only 2.4B active parameters it fits on a single 40GB GPU while outperforming many larger dense models on standard benchmarks.
Q: What are the recommended use cases?
The model handles multilingual text processing, coding tasks, and mathematical reasoning, and it is particularly well suited to deployments where GPU memory and inference cost are constrained but performance still matters.