japanese-gpt-neox-3.6b-instruction-ppo

Maintained by: rinna

Property: Value
Parameter Count: 3.6B
Model Type: GPT-NeoX
License: MIT
Paper: Link
Architecture: 36-layer, 2816-hidden-size transformer

What is japanese-gpt-neox-3.6b-instruction-ppo?

This is a Japanese language model aligned with Reinforcement Learning from Human Feedback (RLHF) using PPO (Proximal Policy Optimization). It starts from the SFT (supervised fine-tuning) variant and is further tuned to follow instructions and hold natural conversations. In human evaluation it won 47% of comparisons against its SFT counterpart; in ChatGPT-based evaluation it won 63%.

Implementation Details

Training has two stages: Supervised Fine-Tuning (SFT) first, then reinforcement learning with PPO. The model relies on CarperAI/trlx's PPO implementation and was trained on Japanese-translated Anthropic HH RLHF data.

  • SentencePiece tokenizer with a vocabulary size of 32,000
  • Byte fallback to handle text outside the vocabulary
  • Suggested generation parameters: temperature=0.7 and repetition_penalty=1.1
  • Dedicated input format built from ユーザー (user) / システム (system) conversation turns
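
The input format and generation parameters listed above can be sketched as follows. This is a minimal illustration rather than the maintainers' reference code: the `<NL>` turn separator, the helper name `build_prompt`, and the `max_new_tokens` value are assumptions not stated on this page, so check them against the official model card before relying on them.

```python
from typing import Dict, List

USER = "ユーザー"    # speaker label for the human side
SYSTEM = "システム"  # speaker label for the model side

# Assumption: turns are joined with a literal "<NL>" marker instead of "\n".
TURN_SEPARATOR = "<NL>"


def build_prompt(turns: List[Dict[str, str]], separator: str = TURN_SEPARATOR) -> str:
    """Join conversation turns and end with an open system turn (hypothetical helper)."""
    lines = [f"{turn['speaker']}: {turn['text']}" for turn in turns]
    # The trailing "システム: " asks the model to produce the next system reply.
    return separator.join(lines) + separator + f"{SYSTEM}: "


# Keyword arguments one might pass to a Hugging Face `generate` call.
# temperature and repetition_penalty come from this page; the rest are assumptions.
GENERATION_KWARGS = {
    "do_sample": True,          # sampling must be on for temperature to matter
    "temperature": 0.7,
    "repetition_penalty": 1.1,
    "max_new_tokens": 128,      # assumed length limit, not from this page
}

turns = [{"speaker": USER, "text": "日本のおすすめの観光地を教えてください。"}]
prompt = build_prompt(turns)
print(prompt)
```

The resulting string would then be tokenized and passed to the model together with `GENERATION_KWARGS`; keeping the prompt builder separate makes it easy to swap the separator if the assumed one turns out to be wrong.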

Core Capabilities

  • Instruction following in Japanese
  • Natural conversation handling with structured input/output
  • Improved response quality compared to the SFT version
  • Handling of unknown characters through byte fallback
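
To illustrate what byte fallback means, here is a toy sketch of the idea: a character missing from the vocabulary is split into its UTF-8 bytes, each written as a `<0xNN>` piece, instead of collapsing to a single unknown token. The tiny character-level vocabulary and both function names are hypothetical; the real SentencePiece tokenizer works on learned subword pieces, not single characters.

```python
from typing import List, Set


def encode_with_byte_fallback(text: str, vocab: Set[str]) -> List[str]:
    """Toy encoder: known characters stay as-is, unknown ones become byte pieces."""
    pieces: List[str] = []
    for ch in text:
        if ch in vocab:
            pieces.append(ch)
        else:
            # Fall back to one "<0xNN>" piece per UTF-8 byte of the character.
            pieces.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return pieces


def decode_pieces(pieces: List[str]) -> str:
    """Reassemble the original text from normal pieces and byte pieces."""
    buf = bytearray()
    for piece in pieces:
        if len(piece) == 6 and piece.startswith("<0x") and piece.endswith(">"):
            buf.append(int(piece[3:5], 16))
        else:
            buf.extend(piece.encode("utf-8"))
    return buf.decode("utf-8")


toy_vocab = {"猫", "が", "好", "き"}  # hypothetical vocabulary for illustration
text = "猫が好き😺"                   # the emoji is not in the toy vocabulary
pieces = encode_with_byte_fallback(text, toy_vocab)
restored = decode_pieces(pieces)
print(pieces)
print(restored == text)
```

Because every byte value has its own piece, any input can be encoded and decoded without loss, at the cost of more tokens for out-of-vocabulary characters.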

Frequently Asked Questions

Q: What makes this model unique?

The model applies RLHF to Japanese language modeling and shows measurable improvements over its SFT variant in both human and ChatGPT-based evaluation. It expects a dedicated ユーザー/システム conversation format and is tuned for instruction-following tasks.

Q: What are the recommended use cases?

The model is suited to Japanese conversation systems, chatbots, and instruction-following applications, particularly where natural dialogue flow in Japanese matters.
