Qwen1.5-0.5B
| Property | Value |
|---|---|
| Parameter Count | 620M |
| Model Type | Transformer-based decoder-only |
| License | tongyi-qianwen-research |
| Paper | arXiv:2309.16609 |
| Tensor Type | BF16 |
What is Qwen1.5-0.5B?
Qwen1.5-0.5B is the beta version of Qwen2 and the smallest variant in the Qwen1.5 series of transformer-based decoder-only language models. With 620M parameters, it brings notable improvements over earlier Qwen releases, including stable 32K context length support and enhanced multilingual capabilities.
Implementation Details
The model is built on the Transformer architecture and incorporates several advanced features:
- SwiGLU activation function for improved performance
- Attention QKV bias implementation
- Group query attention mechanisms
- Improved tokenizer adapted to multiple natural languages and code
- Requires `transformers` >= 4.37.0; earlier versions fail with `KeyError: 'qwen2'` (see the loading sketch below)
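A minimal loading sketch using the Hugging Face `transformers` API is shown below. The `Qwen/Qwen1.5-0.5B` repository id and BF16 dtype follow the table above, while `device_map="auto"` (which requires the `accelerate` package) and the sanity-check forward pass are illustrative choices rather than requirements.

```python
# Minimal loading sketch (assumes transformers >= 4.37.0 and a recent PyTorch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",           # requires the accelerate package
)

# Quick sanity check: a single forward pass over a short prompt.
inputs = tokenizer("Qwen1.5 is", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (batch, sequence_length, vocab_size)
```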
Core Capabilities
- 32K context length support across all model sizes
- Multilingual text processing
- Base model functionality for further fine-tuning
- Efficient processing with BF16 tensor type
- Designed as a base for post-training applications (SFT, RLHF, continued pretraining); see the fine-tuning sketch after this list
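As an illustration of how this base model is typically post-trained, the sketch below runs a short causal-LM fine-tuning pass with the `transformers` `Trainer`. The `train.txt` corpus, output directory, and hyperparameters are placeholders; a real SFT run would use a curated instruction dataset (and often a dedicated trainer such as TRL's `SFTTrainer`) instead.

```python
# Minimal causal-LM fine-tuning sketch; paths and hyperparameters are hypothetical.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen1.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# "train.txt" is a placeholder corpus with one document per line.
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal-LM objective: the collator copies input_ids into labels, so mlm=False.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="qwen1.5-0.5b-sft",    # hypothetical output directory
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-5,
    bf16=True,
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```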
Frequently Asked Questions
Q: What makes this model unique?
This model represents a significant step forward in the Qwen series, offering stable 32K context length support and enhanced multilingual capabilities in a compact 620M-parameter package, which makes it well suited to resource-constrained applications that still need solid language understanding.
Q: What are the recommended use cases?
The model is not recommended for direct text generation. Instead, it serves as an excellent foundation for post-training applications such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining for specific use cases.