DeepSeek-V2-Lite
| Property | Value |
|---|---|
| Total Parameters | 15.7B |
| Active Parameters | 2.4B |
| Context Length | 32k tokens |
| License | DeepSeek Model License |
| Paper | arXiv:2405.04434 |
What is DeepSeek-V2-Lite?
DeepSeek-V2-Lite is a Mixture-of-Experts (MoE) language model trained from scratch on 5.7T tokens. It combines Multi-head Latent Attention (MLA), which compresses the key-value cache, with the DeepSeekMoE architecture for sparse expert computation. The result is strong performance at a small active-parameter cost, and the model can be deployed on a single 40GB GPU.
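As a rough sketch of what single-GPU deployment looks like in practice, the snippet below loads the model in BF16 with Hugging Face transformers. The repository id, device mapping, and generation settings are assumptions for illustration; consult the official model card for the recommended invocation.

```python
# Minimal loading sketch (assumed repo id and settings, not the official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-V2-Lite"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # weights are released in BF16
    trust_remote_code=True,       # custom MLA/MoE modules ship with the repo
    device_map="auto",            # requires accelerate; targets a single 40GB GPU
)

inputs = tokenizer("An attention mechanism is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```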
Implementation Details
The model has 27 layers, a hidden dimension of 2048, and 16 attention heads. MLA compresses keys and values into a 512-dimensional latent, and each MoE layer combines 2 shared experts with 64 routed experts, of which 6 are activated per token.
- Efficient inference through MLA architecture
- DeepSeekMoE implementation for optimal resource utilization
- BF16 format for balanced precision and performance
- 32k context length support
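To make the 2-shared plus 64-routed expert layout concrete, here is a minimal PyTorch sketch of that split. Everything in it (module names, the expert hidden size, the softmax-then-top-k gating loop) is a simplified assumption for illustration; the released implementation uses fine-grained experts, load balancing, and optimized routing kernels.

```python
# Toy sketch of the shared + routed expert split (illustrative, not the real layer).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDeepSeekMoELayer(nn.Module):
    """2 always-active shared experts + top-6 of 64 routed experts per token."""

    def __init__(self, hidden=2048, expert_dim=1408, n_routed=64, n_shared=2, top_k=6):
        super().__init__()

        def make_expert():
            return nn.Sequential(
                nn.Linear(hidden, expert_dim), nn.SiLU(), nn.Linear(expert_dim, hidden)
            )

        self.shared_experts = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.routed_experts = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.router = nn.Linear(hidden, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):  # x: [num_tokens, hidden]
        # Shared experts process every token.
        shared_out = sum(expert(x) for expert in self.shared_experts)

        # Router selects 6 of the 64 routed experts for each token.
        scores = F.softmax(self.router(x), dim=-1)            # [num_tokens, 64]
        topk_scores, topk_idx = scores.topk(self.top_k, -1)   # [num_tokens, 6]

        routed_out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for expert_id in topk_idx[:, slot].unique().tolist():
                mask = topk_idx[:, slot] == expert_id
                routed_out[mask] += (
                    topk_scores[mask, slot].unsqueeze(-1)
                    * self.routed_experts[expert_id](x[mask])
                )
        return shared_out + routed_out

# Example: 4 tokens through the toy layer.
layer = ToyDeepSeekMoELayer()
print(layer(torch.randn(4, 2048)).shape)  # torch.Size([4, 2048])
```

Only the 6 selected experts (plus the 2 shared ones) run for each token, which is why the active parameter count stays at 2.4B even though the full model holds 15.7B parameters.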
Core Capabilities
- Strong performance on English benchmarks (MMLU: 58.3, BBH: 44.1)
- Strong results on Chinese benchmarks (C-Eval: 60.3, CMMLU: 64.3)
- Solid coding capabilities (HumanEval: 29.9, MBPP: 43.2)
- Mathematical reasoning (GSM8K: 41.1, MATH: 17.1)
Frequently Asked Questions
Q: What makes this model unique?
DeepSeek-V2-Lite stands out for its efficient MoE architecture: with only 2.4B active parameters it fits on a single 40GB GPU while outperforming many larger dense models on standard benchmarks.
Q: What are the recommended use cases?
The model handles multilingual text processing, coding tasks, and mathematical reasoning, and it is particularly well suited to deployments where GPU memory and inference cost are constrained but performance still matters.