qwen2.5-bakeneko-32b-instruct-v2

Maintained By
rinna

Qwen2.5 Bakeneko 32B Instruct V2

Model Type: Instruction-tuned Language Model
Architecture: 64-layer transformer with a hidden size of 5120
License: Apache License 2.0
Release Date: February 19, 2025
Authors: Xinqi Chen, Toshiaki Wakatsuki, Kei Sawada

What is qwen2.5-bakeneko-32b-instruct-v2?

This is an advanced Japanese-focused language model that builds upon the Qwen2.5 architecture, enhanced through a sophisticated two-stage training process. It combines Chat Vector technology with Odds Ratio Preference Optimization (ORPO) to achieve superior instruction-following capabilities, particularly in Japanese language tasks.

Implementation Details

The model employs a training approach that combines model merging and distillation. First, a Chat Vector is added to the parameters to transfer instruction-following capabilities; this is followed by ORPO training on 1.3k carefully curated samples distilled from DeepSeek-R1. The architecture consists of 64 layers with a hidden size of 5120, following the Qwen2.5 framework.
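The Chat Vector step described above can be illustrated with toy weights. This is a hedged sketch of the general technique (subtract a base model's parameters from its instruction-tuned variant, then add that difference to another model of the same architecture), not rinna's actual training code; the tensors and dictionary layout here are invented for illustration.

```python
import numpy as np

# Toy state dicts standing in for full model checkpoints (assumed shapes).
base = {"w": np.array([1.0, 2.0, 3.0])}          # original base model
instruct = {"w": np.array([1.5, 2.5, 2.0])}      # its instruction-tuned variant
japanese_cpt = {"w": np.array([1.2, 1.8, 3.3])}  # Japanese continued-pretraining model

# The "chat vector" is the parameter delta induced by instruction tuning.
chat_vector = {k: instruct[k] - base[k] for k in base}

# Adding the delta transfers instruction-following ability to the CPT model.
merged = {k: japanese_cpt[k] + chat_vector[k] for k in japanese_cpt}

print(merged["w"])  # → [1.7 2.3 2.3]
```

In practice this arithmetic is applied per-tensor across the full checkpoint, typically skipping embedding rows for tokens that differ between vocabularies.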

  • Achieves state-of-the-art performance on Japanese MT-Bench with scores of 8.86 (first turn) and 8.53 (multi-turn)
  • Implements advanced parameter vector manipulation during training
  • Utilizes bfloat16 precision for optimal performance

Core Capabilities

  • Superior Japanese language understanding and generation
  • Enhanced instruction-following abilities
  • Strong reasoning capabilities without generating explicit intermediate reasoning steps at inference time
  • Excellent performance in both single-turn and multi-turn conversations
  • Optimized for practical applications through ORPO training
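To make the ORPO objective mentioned above concrete, the sketch below computes its loss on scalar sequence probabilities. This is a simplified, assumed formulation for illustration (real training operates on token-level log-probabilities over batches, and the weighting value `lam` here is arbitrary): ORPO adds an odds-ratio penalty, which favors the chosen response over the rejected one, to a standard negative log-likelihood term.

```python
import math

def odds(p: float) -> float:
    """Odds of an event with probability p."""
    return p / (1.0 - p)

def orpo_loss(p_chosen: float, p_rejected: float, lam: float = 0.1) -> float:
    # NLL term: the usual supervised objective on the chosen response.
    nll = -math.log(p_chosen)
    # Odds-ratio term: reward assigning higher odds to chosen than rejected.
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    sigmoid = 1.0 / (1.0 + math.exp(-log_odds_ratio))
    return nll + lam * (-math.log(sigmoid))

# A model that prefers the chosen response incurs a lower loss than one
# that prefers the rejected response.
good = orpo_loss(0.7, 0.2)
bad = orpo_loss(0.2, 0.7)
print(good < bad)  # → True
```

The single hyperparameter `lam` trades off imitation of the chosen responses against the preference margin, which is why ORPO needs no separate reference model.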

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its combination of Chat Vector merging and ORPO optimization, which yields strong Japanese language capabilities without requiring explicit intermediate reasoning steps at inference time, while maintaining strong performance across various benchmarks.

Q: What are the recommended use cases?

The model excels in Japanese language tasks, particularly in instruction-following scenarios. It's well-suited for applications requiring sophisticated language understanding, multi-turn conversations, and complex reasoning tasks in Japanese.
