Qwen2.5 Bakeneko 32B Instruct V2
| Property | Value |
|---|---|
| Model Type | Instruction-tuned language model |
| Architecture | 64-layer transformer, 5120 hidden size |
| License | Apache License 2.0 |
| Release Date | February 19, 2025 |
| Authors | Xinqi Chen, Toshiaki Wakatsuki, Kei Sawada |
What is qwen2.5-bakeneko-32b-instruct-v2?
qwen2.5-bakeneko-32b-instruct-v2 is a Japanese-focused language model built on the Qwen2.5 architecture and enhanced through a two-stage training process: Chat Vector addition followed by Odds Ratio Preference Optimization (ORPO). The combination yields strong instruction-following capabilities, particularly on Japanese language tasks.
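The Chat Vector stage can be illustrated with a toy example. The idea is that the parameter delta between an instruction-tuned model and its base encodes the instruction-following ability, and adding that delta to a different base (here, a Japanese continual-pretrained one) transplants the ability. This is a minimal sketch with scalar stand-ins for weight tensors, not the actual merge script:

```python
# Toy illustration of the Chat Vector idea: the "chat vector" is the
# parameter delta (instruct - base); adding it to another base model's
# weights transplants instruction-following behavior. Real merges apply
# this per-tensor across full model state dicts.

def chat_vector_merge(base_en, instruct_en, base_ja):
    """Return merged weights: base_ja + (instruct_en - base_en)."""
    return {
        name: base_ja[name] + (instruct_en[name] - base_en[name])
        for name in base_ja
    }

# Tiny stand-in "state dicts" with scalar weights (hypothetical values).
base_en = {"w1": 1.0, "w2": -2.0}
instruct_en = {"w1": 1.5, "w2": -1.0}   # base_en after instruction tuning
base_ja = {"w1": 0.5, "w2": 0.0}        # Japanese continual-pretrained base

merged = chat_vector_merge(base_en, instruct_en, base_ja)
print(merged)  # {'w1': 1.0, 'w2': 1.0}
```

In practice the same elementwise arithmetic runs over every shared parameter tensor of the three checkpoints.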
Implementation Details
The training pipeline combines model merging and distillation. A Chat Vector is first added to transplant instruction-following behavior, and the model is then refined with ORPO on 1.3k carefully curated samples distilled from DeepSeek-R1. The architecture follows the Qwen2.5 framework: 64 transformer layers with a hidden size of 5120.
- Achieves state-of-the-art performance on Japanese MT-Bench with scores of 8.86 (first turn) and 8.53 (multi-turn)
- Applies parameter-space vector arithmetic (Chat Vector addition) during training
- Uses bfloat16 precision for efficient inference
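The ORPO stage optimizes a single objective that adds an odds-ratio preference term to the standard supervised loss, avoiding a separate reward model. A minimal numeric sketch of that term, using sequence-level probabilities as stand-ins (not the actual training code):

```python
import math

def odds(p):
    # odds(p) = p / (1 - p), for a sequence-level probability p in (0, 1)
    return p / (1.0 - p)

def odds_ratio_term(p_chosen, p_rejected):
    """ORPO's relative-preference penalty:
    -log sigmoid(log odds(chosen) - log odds(rejected))."""
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))

def orpo_loss(nll_chosen, p_chosen, p_rejected, lam=0.1):
    # Full objective: SFT negative log-likelihood on the chosen response
    # plus a weighted odds-ratio penalty involving the rejected one.
    # lam is a hypothetical weighting value for illustration.
    return nll_chosen + lam * odds_ratio_term(p_chosen, p_rejected)

# When the model already prefers the chosen response, the penalty is small;
# when it prefers the rejected one, the penalty grows.
print(odds_ratio_term(0.8, 0.2))  # small
print(odds_ratio_term(0.2, 0.8))  # large
```

Minimizing this objective pushes probability mass toward preferred (chosen) responses while ordinary language-model training continues on the same data.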
Core Capabilities
- Superior Japanese language understanding and generation
- Enhanced instruction-following abilities
- Strong reasoning without requiring explicit step-by-step (chain-of-thought) generation
- Excellent performance in both single-turn and multi-turn conversations
- Optimized for practical applications through ORPO training
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its combination of Chat Vector merging and ORPO optimization, which yields strong Japanese language capabilities without requiring explicit chain-of-thought generation, while maintaining solid performance across benchmarks.
Q: What are the recommended use cases?
The model excels in Japanese language tasks, particularly in instruction-following scenarios. It's well-suited for applications requiring sophisticated language understanding, multi-turn conversations, and complex reasoning tasks in Japanese.
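Qwen2.5-family models converse in the ChatML format, so prompts for this model follow the same structure. A minimal sketch that assembles a single-turn prompt by hand; in practice you would let the tokenizer's `apply_chat_template` do this (the helper name and example messages here are illustrative):

```python
def build_chatml_prompt(system, user):
    """Assemble a single-turn ChatML prompt as used by Qwen2.5-family
    models. Real code should prefer tokenizer.apply_chat_template."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "あなたは誠実で優秀なアシスタントです。",  # "You are an honest, capable assistant."
    "日本で一番高い山は？",                    # "What is the tallest mountain in Japan?"
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to generate its reply.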