Qwen2-0.5B
| Property | Value |
|---|---|
| Parameter Count | 494M |
| License | Apache 2.0 |
| Format | BF16 |
| Architecture | Transformer with SwiGLU |
What is Qwen2-0.5B?
Qwen2-0.5B is the compact, 494M-parameter base model in the Qwen2 series of large language models. It incorporates architecture improvements including SwiGLU activation, attention QKV bias, and group query attention, making it a strong foundation for a range of natural language processing tasks.
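As a quick illustration, most of these architecture choices can be read directly off the released configuration. The sketch below assumes the Qwen/Qwen2-0.5B checkpoint on the Hugging Face Hub and the field names exposed by Qwen2Config:

```python
# Sketch: inspect the architecture flags mentioned above via the model config.
# Assumes the "Qwen/Qwen2-0.5B" checkpoint on the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2-0.5B")
print(config.hidden_act)           # "silu": the gated, SwiGLU-style MLP activation
print(config.num_attention_heads)  # number of query heads
print(config.num_key_value_heads)  # fewer key/value heads -> group query attention
```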
Implementation Details
The model is built on the Transformer architecture with several key technical enhancements. It requires transformers>=4.37.0 to load correctly and ships an improved tokenizer adapted to multiple natural languages and to code; a minimal loading sketch follows the feature list below.
- Advanced Transformer architecture with SwiGLU activation
- Group query attention mechanism
- Attention QKV bias implementation
- Multi-language and code-adaptive tokenizer
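A minimal loading and sanity-check sketch, assuming the Qwen/Qwen2-0.5B checkpoint name on the Hugging Face Hub and transformers>=4.37.0:

```python
# Minimal sketch: load the tokenizer and model, then run a short completion.
# Requires transformers>=4.37.0; earlier versions do not recognize the
# "qwen2" architecture type.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Base model: treat it as a text-completion model, not a chat assistant.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```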
Core Capabilities
- Strong performance on MMLU (45.4% accuracy)
- Competitive results on mathematical reasoning (GSM8K: 36.5%)
- Robust Chinese language understanding (C-Eval: 58.2%)
- Code generation capabilities (HumanEval: 22.0%)
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its efficient architecture and strong performance relative to its small size, particularly on multilingual tasks and fundamental reasoning. It delivers results competitive with considerably larger models.
Q: What are the recommended use cases?
The model is primarily designed as a base language model for further fine-tuning. It is not recommended for direct text generation; instead, it serves as a foundation for post-training such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining.
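To make the post-training recommendation concrete, here is a minimal supervised fine-tuning sketch using the plain transformers Trainer; the dataset file, sequence length, and hyperparameters are illustrative placeholders, not recommendations from the model authors.

```python
# Minimal SFT sketch on top of Qwen2-0.5B using the standard causal-LM objective.
# The training file "train.txt" and all hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical plain-text corpus; replace with your own SFT data.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2-0.5b-sft",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        bf16=True,  # the checkpoint ships in BF16
    ),
    train_dataset=tokenized,
    # mlm=False gives the next-token (causal LM) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```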