Qwen2-0.5B
| Property | Value |
|---|---|
| Parameter Count | 494M |
| License | Apache 2.0 |
| Format | BF16 |
| Architecture | Transformer with SwiGLU |
What is Qwen2-0.5B?
Qwen2-0.5B is the compact, 494M-parameter base model in the Qwen2 series of large language models. It incorporates architecture improvements including SwiGLU activation, attention QKV bias, and group query attention, making it a strong foundation for a range of natural language processing tasks.
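As a quick illustration, most of these architecture choices can be read directly off the released configuration. The sketch below assumes the Qwen/Qwen2-0.5B checkpoint on the Hugging Face Hub and the field names exposed by Qwen2Config:

```python
# Sketch: inspect the architecture flags mentioned above via the model config.
# Assumes the "Qwen/Qwen2-0.5B" checkpoint on the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2-0.5B")
print(config.hidden_act)           # "silu": the gated, SwiGLU-style MLP activation
print(config.num_attention_heads)  # number of query heads
print(config.num_key_value_heads)  # fewer key/value heads -> group query attention
```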
Implementation Details
The model is built on the Transformer architecture with several key technical enhancements. It requires transformers>=4.37.0 to load correctly and ships an improved tokenizer adapted to multiple natural languages and to code; a minimal loading sketch follows the feature list below.
- Advanced Transformer architecture with SwiGLU activation
- Group query attention mechanism
- Attention QKV bias implementation
- Multi-language and code-adaptive tokenizer
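A minimal loading and sanity-check sketch, assuming the Qwen/Qwen2-0.5B checkpoint name on the Hugging Face Hub and transformers>=4.37.0:

```python
# Minimal sketch: load the tokenizer and model, then run a short completion.
# Requires transformers>=4.37.0; earlier versions do not recognize the
# "qwen2" architecture type.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Base model: treat it as a text-completion model, not a chat assistant.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```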
Core Capabilities
- Strong performance on MMLU (45.4% accuracy)
- Competitive results on mathematical reasoning (GSM8K: 36.5%)
- Robust Chinese language understanding (C-Eval: 58.2%)
- Code generation capabilities (HumanEval: 22.0%)
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its efficient architecture and strong performance relative to its small size, particularly on multilingual tasks and fundamental reasoning. It delivers results competitive with considerably larger models.
Q: What are the recommended use cases?
The model is primarily designed as a base language model for further fine-tuning. It is not recommended for direct text generation; instead, it serves as a foundation for post-training such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining.
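To make the post-training recommendation concrete, here is a minimal supervised fine-tuning sketch using the plain transformers Trainer; the dataset file, sequence length, and hyperparameters are illustrative placeholders, not recommendations from the model authors.

```python
# Minimal SFT sketch on top of Qwen2-0.5B using the standard causal-LM objective.
# The training file "train.txt" and all hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical plain-text corpus; replace with your own SFT data.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2-0.5b-sft",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        bf16=True,  # the checkpoint ships in BF16
    ),
    train_dataset=tokenized,
    # mlm=False gives the next-token (causal LM) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```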