calm3-22b-chat-selfimprove-experimental
| Property | Value |
|---|---|
| Base Model | calm3-22b-chat |
| Training Method | Direct Preference Optimization (DPO) |
| Release Date | February 13, 2025 |
| Authors | CyberAgent (Sakamoto, Jinnai, Morimura, Abe, Ariu) |
| Model Access | Hugging Face |
What is calm3-22b-chat-selfimprove-experimental?
This model is an enhanced version of cyberagent/calm3-22b-chat, trained to improve response safety and alignment. It combines self-augmented Direct Preference Optimization (DPO) with the Answer Carefully Dataset (ACv1) to produce more appropriate responses, particularly in scenarios that require ethical consideration.
Implementation Details
The training pipeline combines data augmentation with DPO. The resulting model shows a marked improvement in toxicity handling while maintaining general language performance, as evidenced by its Nejumi LLM leaderboard scores: the ALT toxicity score improved from 0.7053 to 0.8239. A hedged training sketch follows the list below.
- Built on the calm3-22b-chat architecture
- Implements custom data augmentation prompts for generating training examples
- Loads through the Hugging Face transformers library
- Supports streaming responses (see the inference sketch under Core Capabilities)
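
The exact training configuration is not published on this card, so the following is only a minimal sketch of what a DPO run could look like with the TRL library. The preference file name and all hyperparameters are illustrative assumptions; only the base model ID comes from this card.

```python
# Minimal DPO sketch with TRL -- NOT the authors' exact setup.
# "augmented_prefs.jsonl" and the hyperparameters below are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "cyberagent/calm3-22b-chat"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Each row holds "prompt", "chosen", and "rejected" text columns.
train_dataset = load_dataset("json", data_files="augmented_prefs.jsonl", split="train")

args = DPOConfig(output_dir="calm3-dpo", beta=0.1, per_device_train_batch_size=1)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions name this argument `tokenizer`
)
trainer.train()
```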
Core Capabilities
- Enhanced safety responses to inappropriate queries
- Balanced performance across general language tasks and safety alignment
- Improved toxicity handling (demonstrated by the benchmark results above)
- Maintains natural Japanese language capabilities
- Streaming text generation support (demonstrated in the sketch below)
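
Below is a minimal inference sketch with token streaming via the transformers TextStreamer. The Hub repo ID is an assumption based on the model name; adjust dtype and device settings for your hardware.

```python
# Inference sketch with streaming output; repo ID assumed from the model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "cyberagent/calm3-22b-chat-selfimprove-experimental"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example Japanese prompt: "Tell me about AI assistant safety."
messages = [{"role": "user", "content": "AIアシスタントの安全性について教えてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# TextStreamer prints tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(input_ids, max_new_tokens=512, streamer=streamer)
```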
Frequently Asked Questions
Q: What makes this model unique?
The model's strength lies in its self-improvement methodology: carefully crafted data augmentation prompts generate preference data, and DPO training on that data enhances response safety while preserving general language capabilities. The improvement is most visible when handling potentially toxic or inappropriate queries. A simplified sketch of the augmentation step follows.
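
The authors' actual augmentation prompts are not reproduced here; the function below only illustrates one plausible form of the self-augmentation idea, pairing a curated safe answer ("chosen") against the base model's own sampled output ("rejected").

```python
# Hedged illustration of building one preference pair for self-augmented DPO.
# The pairing recipe is an assumption, not the authors' published procedure.
def build_preference_pair(prompt, safe_reference, model, tokenizer):
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    # Sample the base model's own answer to serve as the "rejected" side.
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.9)
    sampled = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
    return {"prompt": prompt, "chosen": safe_reference, "rejected": sampled}
```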
Q: What are the recommended use cases?
This model is ideal for applications requiring safe and ethical AI interactions in Japanese, particularly in public-facing applications where response appropriateness is crucial. It's well-suited for chatbots, content moderation, and general dialogue systems requiring strong alignment with ethical guidelines.