Qwen-14B

Maintained By
Qwen

Property          Value
Parameter Count   14.2B
Architecture      40 layers, 40 heads, 5120 d_model
Context Length    2048 tokens (expandable to 8K+)
Paper             arXiv:2309.16609
Training Data     3T+ tokens across web text, books, code
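
As a rough sanity check, the parameter count can be approximated from the architecture figures in the table. The sketch below takes the layer count and d_model from the table; the feed-forward width (`D_FF`) and padded vocabulary size (`VOCAB`) are assumptions not stated on this page, and biases and normalization gains are ignored.

```python
# Values from the table: 40 layers, d_model = 5120.
N_LAYERS = 40
D_MODEL = 5120
# Assumptions (not stated on this page): feed-forward width and padded vocabulary size.
D_FF = 13696
VOCAB = 152064


def estimate_params(n_layers=N_LAYERS, d_model=D_MODEL, d_ff=D_FF, vocab=VOCAB):
    """Rough weight count; ignores biases and normalization gains."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 3 * d_model * d_ff   # gate, up and down projections (SwiGLU)
    embeddings = 2 * vocab * d_model    # input embedding plus an untied output head
    return n_layers * (attention + feed_forward) + embeddings


total = estimate_params()
print(f"{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands at roughly 14.17B, consistent with the 14.2B listed in the table.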

What is Qwen-14B?

Qwen-14B is a large language model developed by Alibaba Cloud with a focus on multilingual use. Built on the Transformer architecture, it has 14.2 billion parameters and reports benchmark results across reasoning, coding, and mathematical problem-solving tasks.

Implementation Details

The model uses RoPE position encoding, SwiGLU activation functions, and RMSNorm. Its tokenizer has a vocabulary of over 150K tokens covering multiple languages and is especially efficient for Chinese, English, and code.
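
The three components named above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not Qwen's implementation: the function names, the `eps` default, the RoPE base of 10000, and the pairing of adjacent elements are all choices made here for readability.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root mean square of x, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) multiplied elementwise with up."""
    return [silu(g) * u for g, u in zip(gate, up)]


def rope(x, position, base=10000.0):
    """RoPE: rotate each pair (x[i], x[i+1]) by angle position * base^(-i/d)."""
    d = len(x)  # must be even
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

In a full feed-forward block, `gate` and `up` would be two separate linear projections of the same input, followed by a third projection back to d_model. Production implementations often pair element i with element i + d/2 instead of its neighbor; the rotation itself is the same.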

  • Tokenizer based on tiktoken, optimized for multilingual text
  • Supports context length extension using NTK-aware interpolation and LogN attention scaling
  • Trained on diverse high-quality data including web text, books, code, and domain-specific content
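
The two context-extension techniques in the list can be sketched as follows. This uses the commonly described formulas and is an assumption about the general approach, not a copy of Qwen's code: `alpha` is a hypothetical scaling factor chosen by the caller, the RoPE base of 10000 is assumed, and the head dimension of 128 is derived from the table (5120 d_model / 40 heads).

```python
import math

TRAIN_LEN = 2048  # training context length listed for the model
HEAD_DIM = 128    # 5120 d_model / 40 heads


def ntk_scaled_base(alpha, base=10000.0, head_dim=HEAD_DIM):
    """NTK-aware interpolation: enlarge the RoPE base instead of squeezing positions.

    alpha = 1 leaves the base unchanged; larger alpha stretches the
    low-frequency rotations so longer sequences stay within a familiar range.
    """
    return base * alpha ** (head_dim / (head_dim - 2))


def logn_scale(position, train_len=TRAIN_LEN):
    """LogN attention scaling: multiply the query at a 1-indexed position by
    log(position) / log(train_len) once position exceeds the training length."""
    if position <= train_len:
        return 1.0
    return math.log(position) / math.log(train_len)
```

Positions inside the training window are left untouched by `logn_scale`, so behavior on short inputs is unchanged; only queries beyond 2048 tokens are scaled up.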

Core Capabilities

  • Benchmark results: MMLU (66.3%), C-Eval (72.1%), and GSM8K (61.3%)
  • Code generation: 32.3% on HumanEval
  • Mathematical reasoning: 24.8% accuracy on the MATH benchmark
  • Multilingual support, with efficient tokenizer compression rates across languages
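
One simple way to quantify tokenizer compression is UTF-8 bytes per token, where a higher value means fewer tokens for the same text. The sketch below is a generic measurement helper, not the metric definition used in the Qwen paper; `bytes_per_token` and `toy_encode` are hypothetical names, and the toy encoder stands in for a real tokenizer's encode function.

```python
def bytes_per_token(text, encode):
    """Average number of UTF-8 bytes covered by each token of `text`."""
    tokens = encode(text)
    if not tokens:
        raise ValueError("encoder produced no tokens")
    return len(text.encode("utf-8")) / len(tokens)


def toy_encode(text):
    """Hypothetical stand-in encoder: one token per whitespace-separated word."""
    return text.split()


ratio = bytes_per_token("hello world", toy_encode)
```

Passing a real tokenizer's encode callable in place of `toy_encode` lets the same helper compare languages or tokenizers on identical text.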

Frequently Asked Questions

Q: What makes this model unique?

Qwen-14B stands out for its large multilingual vocabulary, its benchmark results across diverse tasks, and its support for extending the context length beyond the 2048-token training window. Its tokenizer makes it particularly efficient for Chinese and other Asian languages while maintaining strong capabilities in English and code.

Q: What are the recommended use cases?

Recommended applications include multilingual text generation, code development, mathematical problem-solving, and general knowledge tasks. The model is particularly suitable for applications that require reasoning capabilities or multilingual support.
