Qwen2-1.5B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 1.54B |
| License | Apache 2.0 |
| Tensor Type | BF16 |
| Downloads | 150,581 |
What is Qwen2-1.5B-Instruct?
Qwen2-1.5B-Instruct is the instruction-tuned, 1.54-billion-parameter member of the Qwen2 series of large language models. Despite its compact size, it performs competitively against both open-source and proprietary models across a range of benchmarks.
Implementation Details
The model is built on the Transformer architecture with several key optimizations, including SwiGLU activation, attention QKV bias, and group query attention. Running it requires transformers>=4.37.0, and it ships an improved tokenizer adapted to multiple natural languages and to code.
- Advanced Transformer architecture with SwiGLU activation
- Improved multilingual tokenizer
- Trained through supervised finetuning and direct preference optimization
- Supports both CPU and GPU deployment with automatic device mapping (see the loading sketch below)
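As a minimal loading sketch, the snippet below pulls the model from the Hugging Face Hub under its standard ID `Qwen/Qwen2-1.5B-Instruct` and runs one chat turn. It assumes transformers>=4.37.0 plus accelerate (needed for `device_map="auto"`); the system/user messages are illustrative placeholders.

```python
# Minimal loading/inference sketch; assumes transformers>=4.37.0
# and accelerate installed (required for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-1.5B-Instruct"

# torch_dtype="auto" keeps the BF16 weights as stored;
# device_map="auto" places layers on GPU if available, else CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain group query attention in one sentence."},
]
# The tokenizer ships a chat template that formats prompts
# the way the instruction-tuned model expects.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the reply.
reply = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(reply)
```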
Core Capabilities
- Solid language understanding with 52.4% on MMLU
- Competitive coding ability with 37.8% pass@1 on HumanEval
- Strong mathematical reasoning with 61.6% accuracy on GSM8K
- Chinese-language proficiency demonstrated by 63.8% on C-Eval
- Instruction following at 29.0% on IFEval (prompt-level strict accuracy)
Frequently Asked Questions
Q: What makes this model unique?
The model delivers substantial gains over its predecessor, Qwen1.5-1.8B-Chat, across its reported benchmarks while keeping a relatively compact parameter count. The improvements are most pronounced in mathematical reasoning and coding.
Q: What are the recommended use cases?
This model is well-suited for a wide range of applications including code generation, mathematical problem-solving, multilingual text processing, and general language understanding tasks. It's particularly effective for scenarios requiring precise instruction following and reasoning capabilities.
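As a brief illustration of the mathematical problem-solving use case, the hypothetical prompt below reuses the `model` and `tokenizer` from the loading sketch above; the question text is an invented example, not a benchmark item.

```python
# Hypothetical example; reuses model/tokenizer from the loading sketch above.
math_messages = [
    {
        "role": "user",
        "content": "A train travels 60 km in 45 minutes. "
                   "What is its average speed in km/h?",
    },
]
prompt = tokenizer.apply_chat_template(
    math_messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
))
```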