Toxic-BERT

Maintained by unitary

Property          Value
Parameter Count   109M
License           Apache 2.0
Framework         PyTorch, JAX
Papers            Link, Link

What is toxic-bert?

Toxic-BERT is a text classification model designed to detect various forms of toxic content in online communications. Developed by Unitary, it is built on the BERT architecture and trained on multiple Jigsaw challenge datasets, making it capable of identifying different types of toxicity, including threats, obscenity, insults, and identity-based hate.

Implementation Details

The model uses the BERT-base-uncased architecture and was trained with PyTorch Lightning and Hugging Face Transformers. It performs multi-label classification and scores 0.98636 on the original Toxic Comment Classification Challenge.

  • Supports multiple toxicity labels including toxic, severe_toxic, obscene, threat, insult, and identity_hate
  • Implements bias-aware classification to minimize unintended bias towards identity groups
  • Offers multilingual support for seven languages: English, French, Spanish, Italian, Portuguese, Turkish, and Russian
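Multi-label classification here means each toxicity label receives an independent sigmoid probability rather than competing in a softmax, so a single comment can be flagged as both obscene and insulting at once. A minimal sketch of that scoring step, assuming the six Jigsaw labels above; the logits and the 0.5 threshold are illustrative, not real model output:

```python
import numpy as np

# The six Jigsaw toxicity labels the model predicts.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def multi_label_scores(logits, threshold=0.5):
    """Apply an independent sigmoid per label (multi-label classification,
    not single-label softmax) and collect labels above the threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    flagged = [label for label, p in zip(LABELS, probs) if p >= threshold]
    return dict(zip(LABELS, probs.round(3))), flagged

# Hypothetical logits for one comment -- not actual toxic-bert output.
scores, flagged = multi_label_scores([2.1, -3.0, 1.4, -4.2, 0.9, -3.5])
print(flagged)  # ['toxic', 'obscene', 'insult'] -- several labels fire at once
```

Because the probabilities are independent, they need not sum to one; that is the key design difference from a single-label classifier.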

Core Capabilities

  • Multi-label toxic content classification
  • Bias-aware prediction system
  • Multilingual support
  • High accuracy with 109M parameters
  • Easy integration through PyTorch and JAX frameworks

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its comprehensive approach to toxicity detection, combining high accuracy with bias awareness and multilingual capabilities. It's been trained on multiple Jigsaw challenge datasets, making it robust across different types of toxic content.

Q: What are the recommended use cases?

The model is primarily intended for research purposes and content moderation assistance. It's particularly useful for automated content filtering systems, helping content moderators flag potentially harmful content more efficiently. However, users should be aware of potential limitations regarding biases and consider the ethical implications discussed in the model's documentation.
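For moderation assistance, a common pattern is to route rather than auto-delete: hide only very high-confidence toxic comments and queue borderline ones for human review. A sketch of such a triage policy; the score dictionary, thresholds, and function name are illustrative assumptions, not part of the model's API:

```python
def triage(scores, hide_at=0.9, review_at=0.5):
    """Route a comment based on its highest per-label toxicity score.
    scores: dict mapping label -> probability (e.g. toxic-bert output)."""
    top_label, top_score = max(scores.items(), key=lambda kv: kv[1])
    if top_score >= hide_at:
        return "hide", top_label      # high confidence: auto-hide
    if top_score >= review_at:
        return "review", top_label    # borderline: send to a human moderator
    return "publish", top_label       # low toxicity: let through

# Hypothetical scores, not real model output.
action, label = triage({"toxic": 0.72, "insult": 0.55, "threat": 0.02})
print(action, label)  # review toxic
```

Keeping a human in the loop for mid-range scores is one way to mitigate the bias limitations noted above, since the model's errors concentrate in exactly that uncertain band.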
