Qwen2-0.5B

Maintained by: Qwen

  • Parameter Count: 494M
  • License: Apache 2.0
  • Format: BF16
  • Architecture: Transformer with SwiGLU

What is Qwen2-0.5B?

Qwen2-0.5B is the compact 494M-parameter base model in the Qwen2 series of large language models. It incorporates architectural improvements including SwiGLU activation, attention QKV bias, and group query attention (GQA), making it a capable foundation for a wide range of natural language processing tasks despite its small size.

Implementation Details

The model is built on the Transformer architecture with several key technical enhancements. It requires transformers>=4.37.0 to load correctly and features an improved tokenizer that adapts to multiple natural languages as well as code.

  • Advanced Transformer architecture with SwiGLU activation
  • Group query attention mechanism
  • Attention QKV bias implementation
  • Multi-language and code-adaptive tokenizer
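
As a quick sanity check, the snippet below loads the checkpoint with the Hugging Face transformers library and runs a short completion. It is a minimal sketch: the Hub id Qwen/Qwen2-0.5B, the BF16 dtype, and the prompt are assumptions for illustration, not an official recipe.

```python
# Minimal loading sketch (assumes transformers>=4.37.0 and the Hub id "Qwen/Qwen2-0.5B").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Base models continue text rather than follow instructions.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```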

Core Capabilities

  • Strong performance on MMLU (45.4% accuracy)
  • Competitive results on mathematical reasoning (GSM8K: 36.5%)
  • Robust Chinese language understanding (C-Eval: 58.2%)
  • Code generation capabilities (HumanEval: 22.0%)

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its efficient architecture and strong benchmark performance despite its small size, particularly on multilingual and fundamental reasoning tasks, where it is competitive with other models of comparable scale.

Q: What are the recommended use cases?

The model is primarily designed as a base language model for further fine-tuning. It's not recommended for direct text generation but rather serves as a foundation for post-training applications such as SFT, RLHF, or continued pretraining.
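
Since the intended use is post-training, the sketch below shows a minimal supervised fine-tuning (SFT) loop with the Hugging Face Trainer. The Hub id, dataset file, sequence length, and hyperparameters are illustrative assumptions rather than recommendations from the model authors.

```python
# Illustrative SFT sketch; dataset path and hyperparameters are placeholders.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "Qwen/Qwen2-0.5B"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Plain-text corpus; swap in your own instruction/response data.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2-0.5b-sft",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```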
