Qwen2.5 Bakeneko 32B Instruct V2
| Property | Value |
|---|---|
| Model Type | Instruction-tuned language model |
| Architecture | 64-layer transformer, 5120 hidden size |
| License | Apache License 2.0 |
| Release Date | February 19, 2025 |
| Authors | Xinqi Chen, Toshiaki Wakatsuki, Kei Sawada |
What is qwen2.5-bakeneko-32b-instruct-v2?
qwen2.5-bakeneko-32b-instruct-v2 is a Japanese-focused language model built on the Qwen2.5 architecture and enhanced through a two-stage training process: Chat Vector addition followed by Odds Ratio Preference Optimization (ORPO). The combination yields strong instruction-following capabilities, particularly on Japanese language tasks.
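The Chat Vector stage can be illustrated with a toy example. The idea is that the parameter delta between an instruction-tuned model and its base encodes the instruction-following ability, and adding that delta to a different base (here, a Japanese continual-pretrained one) transplants the ability. This is a minimal sketch with scalar stand-ins for weight tensors, not the actual merge script:

```python
# Toy illustration of the Chat Vector idea: the "chat vector" is the
# parameter delta (instruct - base); adding it to another base model's
# weights transplants instruction-following behavior. Real merges apply
# this per-tensor across full model state dicts.

def chat_vector_merge(base_en, instruct_en, base_ja):
    """Return merged weights: base_ja + (instruct_en - base_en)."""
    return {
        name: base_ja[name] + (instruct_en[name] - base_en[name])
        for name in base_ja
    }

# Tiny stand-in "state dicts" with scalar weights (hypothetical values).
base_en = {"w1": 1.0, "w2": -2.0}
instruct_en = {"w1": 1.5, "w2": -1.0}   # base_en after instruction tuning
base_ja = {"w1": 0.5, "w2": 0.0}        # Japanese continual-pretrained base

merged = chat_vector_merge(base_en, instruct_en, base_ja)
print(merged)  # {'w1': 1.0, 'w2': 1.0}
```

In practice the same elementwise arithmetic runs over every shared parameter tensor of the three checkpoints.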
Implementation Details
The training pipeline combines model merging and distillation. A Chat Vector is first added to transplant instruction-following behavior, and the model is then refined with ORPO on 1.3k carefully curated samples distilled from DeepSeek-R1. The architecture follows the Qwen2.5 framework: 64 transformer layers with a hidden size of 5120.
- Achieves state-of-the-art performance on Japanese MT-Bench with scores of 8.86 (first turn) and 8.53 (multi-turn)
- Applies parameter-space vector arithmetic (Chat Vector addition) during training
- Uses bfloat16 precision for efficient inference
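The ORPO stage optimizes a single objective that adds an odds-ratio preference term to the standard supervised loss, avoiding a separate reward model. A minimal numeric sketch of that term, using sequence-level probabilities as stand-ins (not the actual training code):

```python
import math

def odds(p):
    # odds(p) = p / (1 - p), for a sequence-level probability p in (0, 1)
    return p / (1.0 - p)

def odds_ratio_term(p_chosen, p_rejected):
    """ORPO's relative-preference penalty:
    -log sigmoid(log odds(chosen) - log odds(rejected))."""
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))

def orpo_loss(nll_chosen, p_chosen, p_rejected, lam=0.1):
    # Full objective: SFT negative log-likelihood on the chosen response
    # plus a weighted odds-ratio penalty involving the rejected one.
    # lam is a hypothetical weighting value for illustration.
    return nll_chosen + lam * odds_ratio_term(p_chosen, p_rejected)

# When the model already prefers the chosen response, the penalty is small;
# when it prefers the rejected one, the penalty grows.
print(odds_ratio_term(0.8, 0.2))  # small
print(odds_ratio_term(0.2, 0.8))  # large
```

Minimizing this objective pushes probability mass toward preferred (chosen) responses while ordinary language-model training continues on the same data.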
Core Capabilities
- Superior Japanese language understanding and generation
- Enhanced instruction-following abilities
- Strong reasoning without requiring explicit step-by-step (chain-of-thought) generation
- Excellent performance in both single-turn and multi-turn conversations
- Optimized for practical applications through ORPO training
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its combination of Chat Vector merging and ORPO optimization, which yields strong Japanese language capabilities without requiring explicit chain-of-thought generation, while maintaining solid performance across benchmarks.
Q: What are the recommended use cases?
The model excels in Japanese language tasks, particularly in instruction-following scenarios. It's well-suited for applications requiring sophisticated language understanding, multi-turn conversations, and complex reasoning tasks in Japanese.
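Qwen2.5-family models converse in the ChatML format, so prompts for this model follow the same structure. A minimal sketch that assembles a single-turn prompt by hand; in practice you would let the tokenizer's `apply_chat_template` do this (the helper name and example messages here are illustrative):

```python
def build_chatml_prompt(system, user):
    """Assemble a single-turn ChatML prompt as used by Qwen2.5-family
    models. Real code should prefer tokenizer.apply_chat_template."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "あなたは誠実で優秀なアシスタントです。",  # "You are an honest, capable assistant."
    "日本で一番高い山は？",                    # "What is the tallest mountain in Japan?"
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to generate its reply.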