calm3-22b-chat-selfimprove-experimental
| Property | Value |
|---|---|
| Base Model | calm3-22b-chat |
| Training Method | Direct Preference Optimization (DPO) |
| Release Date | February 13, 2025 |
| Authors | CyberAgent (Sakamoto, Jinnai, Morimura, Abe, Ariu) |
| Model Access | Hugging Face |
What is calm3-22b-chat-selfimprove-experimental?
This model is an enhanced version of cyberagent/calm3-22b-chat, trained to improve response safety and alignment. It combines self-augmented Direct Preference Optimization (DPO) with the Answer Carefully Dataset (ACv1) to produce more appropriate responses, particularly in scenarios that require ethical consideration.
Implementation Details
The training pipeline combines data augmentation with DPO. The resulting model shows a marked improvement in toxicity handling while maintaining general language performance, as evidenced by its Nejumi LLM leaderboard scores: the ALT toxicity score improved from 0.7053 to 0.8239. A hedged training sketch follows the list below.
- Built on the calm3-22b-chat architecture
- Implements custom data augmentation prompts for generating training examples
- Loads through the Hugging Face transformers library
- Supports streaming responses (see the inference sketch under Core Capabilities)
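
The exact training configuration is not published on this card, so the following is only a minimal sketch of what a DPO run could look like with the TRL library. The preference file name and all hyperparameters are illustrative assumptions; only the base model ID comes from this card.

```python
# Minimal DPO sketch with TRL -- NOT the authors' exact setup.
# "augmented_prefs.jsonl" and the hyperparameters below are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "cyberagent/calm3-22b-chat"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Each row holds "prompt", "chosen", and "rejected" text columns.
train_dataset = load_dataset("json", data_files="augmented_prefs.jsonl", split="train")

args = DPOConfig(output_dir="calm3-dpo", beta=0.1, per_device_train_batch_size=1)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions name this argument `tokenizer`
)
trainer.train()
```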
Core Capabilities
- Enhanced safety responses to inappropriate queries
- Balanced performance across general language tasks and safety alignment
- Improved toxicity handling (demonstrated by the benchmark results above)
- Maintains natural Japanese language capabilities
- Streaming text generation support (demonstrated in the sketch below)
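
Below is a minimal inference sketch with token streaming via the transformers TextStreamer. The Hub repo ID is an assumption based on the model name; adjust dtype and device settings for your hardware.

```python
# Inference sketch with streaming output; repo ID assumed from the model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "cyberagent/calm3-22b-chat-selfimprove-experimental"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example Japanese prompt: "Tell me about AI assistant safety."
messages = [{"role": "user", "content": "AIアシスタントの安全性について教えてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# TextStreamer prints tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(input_ids, max_new_tokens=512, streamer=streamer)
```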
Frequently Asked Questions
Q: What makes this model unique?
The model's strength lies in its self-improvement methodology: carefully crafted data augmentation prompts generate preference data, and DPO training on that data enhances response safety while preserving general language capabilities. The improvement is most visible when handling potentially toxic or inappropriate queries. A simplified sketch of the augmentation step follows.
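
The authors' actual augmentation prompts are not reproduced here; the function below only illustrates one plausible form of the self-augmentation idea, pairing a curated safe answer ("chosen") against the base model's own sampled output ("rejected").

```python
# Hedged illustration of building one preference pair for self-augmented DPO.
# The pairing recipe is an assumption, not the authors' published procedure.
def build_preference_pair(prompt, safe_reference, model, tokenizer):
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    # Sample the base model's own answer to serve as the "rejected" side.
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.9)
    sampled = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
    return {"prompt": prompt, "chosen": safe_reference, "rejected": sampled}
```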
Q: What are the recommended use cases?
This model is ideal for applications requiring safe and ethical AI interactions in Japanese, particularly in public-facing applications where response appropriateness is crucial. It's well-suited for chatbots, content moderation, and general dialogue systems requiring strong alignment with ethical guidelines.