Qwen2-0.5B-Instruct

Maintained by: Qwen


Parameter Count: 494M
License: Apache 2.0
Tensor Type: BF16
Architecture: Transformer with SwiGLU activation

What is Qwen2-0.5B-Instruct?

Qwen2-0.5B-Instruct is part of the Qwen2 series of large language models and is the instruction-tuned variant of its smallest, 494M-parameter member. Despite its compact size, it shows notable improvements over its predecessor, Qwen1.5, across a range of benchmarks.

Implementation Details

The model is built on the Transformer architecture with several key enhancements, including SwiGLU activation, attention QKV bias, and group query attention. It requires transformers>=4.37.0 and ships with an improved tokenizer adapted to multiple natural languages and code; a minimal loading sketch follows the list below.

  • Supports both supervised finetuning and direct preference optimization
  • Uses group query attention for faster, more memory-efficient inference
  • Includes a tokenizer adapted to multiple natural languages and code
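
As a rough sketch of how the model can be loaded and queried with transformers>=4.37.0, the example below uses the Hugging Face repo id Qwen/Qwen2-0.5B-Instruct and the tokenizer's chat template; the prompt and generation settings are illustrative assumptions rather than recommended defaults.

```python
# Minimal sketch: load Qwen2-0.5B-Instruct with transformers>=4.37.0 and run one chat turn.
# The repo id, prompt, and generation settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain group query attention in two sentences."},
]
# The chat template turns the message list into the prompt format the model was tuned on.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=128)
# Drop the prompt tokens so only the newly generated reply is decoded.
reply = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```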

Core Capabilities

  • Significantly improved MMLU performance (37.9% vs. 35.0% for its predecessor)
  • Enhanced coding capabilities, with a 17.1% pass rate on HumanEval
  • Strong mathematical reasoning, with 40.1% accuracy on GSM8K
  • Solid Chinese-language understanding, scoring 45.2% on C-Eval

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its impressive performance-to-size ratio, offering significant improvements over its predecessor while maintaining a compact size of 494M parameters. It demonstrates particularly strong capabilities in mathematical reasoning and multilingual tasks.

Q: What are the recommended use cases?

This model is well-suited for general text generation, chat applications, mathematical problem-solving, and basic coding tasks. It's particularly effective for applications requiring a balance between model size and performance.
