rubert-tiny-toxicity

Maintained by: cointegrated

Parameter Count: 11.8M
Research Paper: View Paper
Author: cointegrated
Downloads: 5,616
Tags: Text Classification, Russian, Toxicity, Multilabel

What is rubert-tiny-toxicity?

rubert-tiny-toxicity is a Russian-language text classifier for detecting toxic and inappropriate content in short informal texts, particularly social media comments. Built on the rubert-tiny architecture, it performs multilabel classification across five categories: non-toxic content, insults, obscenity, threats, and dangerous content.

Implementation Details

The model was trained using the Adam optimizer with a learning rate of 1e-5 and a batch size of 64 for 15 epochs. It reports ROC AUC scores of 0.9937 for the non-toxic label and 0.9910 for the threat label.
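
For readers unfamiliar with the metric, the sketch below shows one way a per-label ROC AUC can be computed for a multilabel classifier: each label column is scored on its own, as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. This is an illustration only, not the author's evaluation code; the function names and the toy data are hypothetical.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_label_roc_auc(labels, y_true_rows, y_score_rows):
    """Compute one ROC AUC per label column of a multilabel problem."""
    return {
        label: roc_auc(
            [row[i] for row in y_true_rows],
            [row[i] for row in y_score_rows],
        )
        for i, label in enumerate(labels)
    }


# Toy data, made up for illustration (two labels, four comments).
toy_labels = ["non-toxic", "threat"]
toy_truth = [[1, 0], [1, 0], [0, 1], [0, 1]]
toy_scores = [[0.9, 0.1], [0.8, 0.4], [0.3, 0.35], [0.2, 0.8]]
toy_result = per_label_roc_auc(toy_labels, toy_truth, toy_scores)
```

The pairwise form is quadratic in the number of examples, so it suits small checks; for real evaluation sets a library routine based on sorting is the usual choice.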

  • Utilizes PyTorch and Transformers framework
  • Supports batch processing of multiple texts
  • Implements efficient tokenization with truncation and padding
  • Returns probability scores for each category
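
The points above can be sketched as follows. This is a minimal example under stated assumptions, not the author's reference code: the Hub identifier `cointegrated/rubert-tiny-toxicity` is inferred from the author and model name, the label order is assumed to follow the category list in this card, and a sigmoid is applied per label because the task is multilabel. The helper names are hypothetical.

```python
import math

# Assumption: label order follows the category list given in this card.
LABELS = ["non-toxic", "insult", "obscenity", "threat", "dangerous"]


def logits_to_scores(logits):
    """Map one row of raw logits to independent per-label probabilities."""
    # Multilabel: each label gets its own sigmoid, so scores need not sum to 1.
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in zip(LABELS, logits)}


def score_texts(texts, checkpoint="cointegrated/rubert-tiny-toxicity"):
    """Score a list of texts; returns one {label: probability} dict per text.

    The default checkpoint name is an assumption, not confirmed by this card.
    """
    # Imported lazily so logits_to_scores stays usable without PyTorch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    model.eval()

    # Truncation and padding let texts of different lengths share one batch tensor.
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return [logits_to_scores(row) for row in logits.tolist()]
```

Calling `score_texts(["привет", "я тебя найду"])` would download the checkpoint on first use and return two dictionaries, one per input text. In a long-running service, the tokenizer and model would normally be loaded once rather than on every call.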

Core Capabilities

  • Multilabel classification for 5 distinct categories
  • High ROC AUC in detecting various forms of toxic content
  • Specialized for Russian language text analysis
  • Efficient processing with only 11.8M parameters
  • Support for both individual and batch text processing

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized focus on Russian language toxicity detection while maintaining a lightweight architecture. Its multilabel approach provides granular insight into different types of inappropriate content, making it particularly valuable for content moderation systems.

Q: What are the recommended use cases?

The model is ideal for content moderation in Russian social media platforms, online forums, and comment sections. It can be used to automatically flag potentially harmful content, maintain community standards, and ensure safe online discussions.
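
As a rough illustration of automatic flagging, the sketch below assumes per-label probabilities are already available as a dictionary keyed by the category names. The threshold values and function names are hypothetical and would need tuning on a platform's own validation data.

```python
# Hypothetical per-label thresholds; these values are placeholders, not
# recommendations. "non-toxic" is left out because it is not a violation label.
THRESHOLDS = {"insult": 0.5, "obscenity": 0.5, "threat": 0.3, "dangerous": 0.5}


def flag_comment(scores, thresholds=THRESHOLDS):
    """Return the sorted labels whose probability meets or exceeds its threshold."""
    return sorted(
        label
        for label, limit in thresholds.items()
        if scores.get(label, 0.0) >= limit
    )


def needs_review(scores, thresholds=THRESHOLDS):
    """True if at least one label is flagged, e.g. to queue the comment for a moderator."""
    return bool(flag_comment(scores, thresholds))


# Made-up scores for a single comment.
example_scores = {
    "non-toxic": 0.10,
    "insult": 0.90,
    "obscenity": 0.20,
    "threat": 0.35,
    "dangerous": 0.10,
}
example_flags = flag_comment(example_scores)
```

Keeping a separate threshold per label lets a platform treat rare but severe categories, such as threats, more strictly than the others.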
