# ruT5-base-detox
| Property | Value |
|---|---|
| Parameter Count | 223M |
| Model Type | Text-to-Text Generation |
| Base Model | ai-forever/ruT5-base |
| License | OpenRAIL++ |
| Language | Russian |
## What is ruT5-base-detox?
ruT5-base-detox is a specialized Russian language model for text detoxification. Built on the ruT5-base architecture, it was trained to rewrite toxic Russian text from social media platforms such as Odnoklassniki, Pikabu, and Twitter into neutral, non-offensive language while preserving the original meaning.
## Implementation Details
The model uses the T5 architecture and was fine-tuned on the training data of the RUSSE 2022 Detoxification shared task. It follows a sequence-to-sequence approach: the T5 encoder processes the toxic input text, and the decoder generates the detoxified output.
- Based on the ruT5-base architecture with 223M parameters
- Implemented in PyTorch with F32 (32-bit floating-point) weights
- Compatible with text-generation inference endpoints
- Stores model weights in the Safetensors format
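As a sketch, running the model through the Hugging Face `transformers` seq2seq API might look like the following. The Hub identifier `s-nlp/ruT5-base-detox`, the helper name `detoxify`, and the generation settings are illustrative assumptions, not taken from this card:

```python
def detoxify(texts, tokenizer, model, max_new_tokens=128):
    """Rewrite a batch of Russian sentences with a seq2seq detoxification model."""
    # Tokenize the batch with padding so sequences of different lengths align.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Generate detoxified text with the encoder-decoder model.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode token ids back to plain strings.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

if __name__ == "__main__":
    # Heavyweight part: downloads model weights from the Hub
    # (the repository id below is an assumption).
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    MODEL_ID = "s-nlp/ruT5-base-detox"
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    mdl = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    print(detoxify(["пример текста"], tok, mdl))
```

Keeping the tokenizer and model as explicit arguments makes the helper easy to reuse across requests without reloading weights.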
## Core Capabilities
- Russian text detoxification
- Preserves original message meaning while removing offensive content
- Handles various types of toxic content from different social media sources
- Supports batch processing and inference endpoints
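The batch-processing capability above can be sketched as a simple chunking loop around `model.generate`. The batch size and the helper name are illustrative assumptions, not part of the model card:

```python
def detoxify_in_batches(texts, tokenizer, model, batch_size=16, max_new_tokens=128):
    """Detoxify a large list of sentences in fixed-size chunks to bound memory use."""
    results = []
    for start in range(0, len(texts), batch_size):
        # Process a manageable slice of the input at a time.
        chunk = texts[start:start + batch_size]
        inputs = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

Chunking keeps peak memory proportional to `batch_size` rather than to the full input list, which matters when feeding the model whole comment threads from a moderation queue.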
## Frequently Asked Questions
**Q: What makes this model unique?**
This model specifically targets Russian-language detoxification, a specialized task that requires understanding the cultural and linguistic nuances of toxic speech in Russian. It is trained on real-world data from multiple social media platforms, making it practical for actual use cases.
**Q: What are the recommended use cases?**
The model is ideal for content moderation systems, social media platforms, and applications requiring automatic transformation of toxic Russian text into more appropriate language. It can be integrated into content filtering systems, chatbots, or any application requiring text sanitization.