Qwen2.5-Coder-1.5B

Maintained by: Qwen

  • Parameter Count: 1.54B
  • License: Apache 2.0
  • Context Length: 32,768 tokens
  • Architecture: Transformer with RoPE, SwiGLU, RMSNorm
  • Research Paper: arXiv:2409.12186

What is Qwen2.5-Coder-1.5B?

Qwen2.5-Coder-1.5B belongs to the latest series of code-specific Qwen large language models. Built on the Qwen2.5 foundation, it brings notable improvements in code generation, code reasoning, and code fixing.

Implementation Details

The model uses a 28-layer architecture with Grouped Query Attention (GQA): 12 query heads and 2 key-value heads. It was trained on 5.5 trillion tokens spanning source code, text-code grounding data, and synthetic data. A minimal loading sketch follows the feature list below.

  • Full 32K token context window
  • Transformer architecture with RoPE, SwiGLU, and RMSNorm
  • 1.54B total parameters (1.31B non-embedding)
  • BF16 tensor type for efficient computation
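
As a quick orientation, here is a minimal loading sketch using the Hugging Face transformers library. The hub id Qwen/Qwen2.5-Coder-1.5B matches the public repository for this model; bfloat16 support on your hardware is an assumption.

```python
# Minimal loading sketch (assumes transformers and torch are installed
# and that the target device supports bfloat16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-1.5B"  # public Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type noted above
    device_map="auto",           # place weights on available devices
)
```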

Core Capabilities

  • Advanced code generation and completion (see the completion sketch after this list)
  • Sophisticated code reasoning and analysis
  • Code fixing and debugging support
  • Strong foundation for Code Agents
  • Mathematical reasoning capabilities
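
To make the completion capability concrete, the short sketch below reuses the model and tokenizer from the loading example. The prompt is a made-up illustration, and greedy decoding is one reasonable default rather than a recommendation from the model card.

```python
# Code completion sketch (the prompt is a hypothetical example).
prompt = "def quicksort(arr):\n    "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # cap the completion length
    do_sample=False,     # greedy decoding for reproducibility
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```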

Frequently Asked Questions

Q: What makes this model unique?

This model pairs a compact 1.54B-parameter footprint with a full 32K-token context window and specialized code understanding, making it well suited to developers who need a balance between capability and resource efficiency.

Q: What are the recommended use cases?

As a base model, it is not recommended for direct conversational use. It excels at code-related tasks and can be adapted for specific applications through post-training methods such as SFT, RLHF, or continued pretraining; a minimal SFT sketch follows.
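
As one concrete adaptation path, here is a minimal supervised fine-tuning sketch using the TRL library. The choice of TRL and the JSONL dataset path are assumptions for illustration; the model card does not prescribe a particular fine-tuning stack.

```python
# Minimal SFT sketch with TRL (tooling choice is an assumption; the
# JSONL file of {"text": ...} records is a hypothetical placeholder).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset(
    "json", data_files="my_code_sft_data.jsonl", split="train"
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-1.5B",  # TRL can load a model from its hub id
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="qwen2.5-coder-1.5b-sft"),
)
trainer.train()
```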
