Qwen2-1.5B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 1.54B |
| License | Apache 2.0 |
| Tensor Type | BF16 |
| Downloads | 150,581 |
What is Qwen2-1.5B-Instruct?
Qwen2-1.5B-Instruct is the instruction-tuned, 1.54-billion-parameter member of the Qwen2 series of large language models. Despite its compact size, it performs competitively against both open-source and proprietary models across a range of benchmarks.
Implementation Details
The model is built on the Transformer architecture with several key optimizations, including SwiGLU activation, attention QKV bias, and group query attention. Running it requires transformers>=4.37.0, and it ships an improved tokenizer adapted to multiple natural languages and to code.
- Advanced Transformer architecture with SwiGLU activation
- Improved multilingual tokenizer
- Trained through supervised finetuning and direct preference optimization
- Supports both CPU and GPU deployment with automatic device mapping (see the loading sketch below)
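As a minimal loading sketch, the snippet below pulls the model from the Hugging Face Hub under its standard ID `Qwen/Qwen2-1.5B-Instruct` and runs one chat turn. It assumes transformers>=4.37.0 plus accelerate (needed for `device_map="auto"`); the system/user messages are illustrative placeholders.

```python
# Minimal loading/inference sketch; assumes transformers>=4.37.0
# and accelerate installed (required for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-1.5B-Instruct"

# torch_dtype="auto" keeps the BF16 weights as stored;
# device_map="auto" places layers on GPU if available, else CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain group query attention in one sentence."},
]
# The tokenizer ships a chat template that formats prompts
# the way the instruction-tuned model expects.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the reply.
reply = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(reply)
```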
Core Capabilities
- Solid language understanding with 52.4% on MMLU
- Competitive coding ability with 37.8% pass@1 on HumanEval
- Strong mathematical reasoning with 61.6% accuracy on GSM8K
- Chinese-language proficiency demonstrated by 63.8% on C-Eval
- Instruction following at 29.0% on IFEval (prompt-level strict accuracy)
Frequently Asked Questions
Q: What makes this model unique?
The model delivers substantial gains over its predecessor, Qwen1.5-1.8B-Chat, across its reported benchmarks while keeping a relatively compact parameter count. The improvements are most pronounced in mathematical reasoning and coding.
Q: What are the recommended use cases?
This model is well-suited for a wide range of applications including code generation, mathematical problem-solving, multilingual text processing, and general language understanding tasks. It's particularly effective for scenarios requiring precise instruction following and reasoning capabilities.
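As a brief illustration of the mathematical problem-solving use case, the hypothetical prompt below reuses the `model` and `tokenizer` from the loading sketch above; the question text is an invented example, not a benchmark item.

```python
# Hypothetical example; reuses model/tokenizer from the loading sketch above.
math_messages = [
    {
        "role": "user",
        "content": "A train travels 60 km in 45 minutes. "
                   "What is its average speed in km/h?",
    },
]
prompt = tokenizer.apply_chat_template(
    math_messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
))
```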