tamil-codemixed-abusive-MuRIL

Property	Value
License	AFL-3.0
Language	Tamil-English (Code-mixed)
Research Paper	View Paper
Downloads	666,021

What is tamil-codemixed-abusive-MuRIL?

tamil-codemixed-abusive-MuRIL is a specialized natural language processing model designed to detect abusive speech in code-mixed Tamil-English text. Built on the MuRIL architecture, this model addresses the challenging task of content moderation in multilingual Indian social media contexts.

Implementation Details

The model is fine-tuned from the MuRIL base model with a learning rate of 2e-5. It is a binary classifier that labels text as either normal (LABEL_0) or abusive (LABEL_1). It is implemented in PyTorch and loads through the Transformers library, so it can be served with standard Transformers tooling.

Built on MuRIL's multilingual understanding capabilities
Optimized for Tamil-English code-mixed content
Performs binary classification (normal vs. abusive)
Supports Inference Endpoints for scalable deployment
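
As a minimal sketch of how the binary labels described above might be consumed, the snippet below wraps the Transformers `pipeline` API and maps LABEL_0 and LABEL_1 to readable names. The Hub identifier in `MODEL_ID` is an assumption (the page only gives the short model name), and `readable` and `classify` are hypothetical helper names introduced for illustration.

```python
# Mapping stated in the text above: LABEL_0 = normal, LABEL_1 = abusive.
LABEL_NAMES = {"LABEL_0": "normal", "LABEL_1": "abusive"}

# Assumed Hugging Face Hub ID; verify it against the actual model page.
MODEL_ID = "Hate-speech-CNERG/tamil-codemixed-abusive-MuRIL"


def readable(prediction: dict) -> dict:
    """Convert a pipeline result such as {"label": "LABEL_1", "score": 0.93}
    into a result with a human-readable label.

    Unknown labels are passed through unchanged.
    """
    return {
        "label": LABEL_NAMES.get(prediction["label"], prediction["label"]),
        "score": float(prediction["score"]),
    }


def classify(texts: list[str]) -> list[dict]:
    """Classify a batch of texts (downloads the model on first use)."""
    # Imported lazily so readable() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    # truncation=True guards against inputs longer than the model's limit.
    return [readable(p) for p in classifier(texts, truncation=True)]
```

Calling `classify(["some code-mixed sentence"])` would return one dictionary per input, each with a `label` of "normal" or "abusive" and the score of that predicted label.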

Core Capabilities

Accurate detection of abusive content in code-mixed text
Handles both Tamil and English language elements
Optimized for social media content analysis
Supports real-time content moderation

Frequently Asked Questions

Q: What makes this model unique?

This model targets abusive-content detection in code-mixed Tamil-English text, a task that monolingual models handle poorly because a single message mixes both languages. It is built on MuRIL and is described in an accompanying research paper.

Q: What are the recommended use cases?

The model is ideal for social media platforms, content moderation systems, and online communities where Tamil-English code-mixed communications are common. It can be integrated into automated content filtering systems or used for research in online behavior analysis.
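
One way an automated filtering system might act on the model's output is sketched below: convert the two raw logits into a probability for the abusive class, then map that probability to an action. The thresholds, the action names, and the assumption that the logits are ordered [LABEL_0, LABEL_1] are illustrative choices, not values from the model's authors.

```python
import math
from typing import Sequence


def abusive_probability(logits: Sequence[float]) -> float:
    """Softmax over two logits, assumed ordered [LABEL_0, LABEL_1].

    Returns the probability assigned to LABEL_1 (abusive).
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def moderation_action(
    p_abusive: float,
    review_at: float = 0.5,  # hypothetical threshold for human review
    block_at: float = 0.9,   # hypothetical threshold for automatic blocking
) -> str:
    """Map an abusive-class probability to a moderation action."""
    if p_abusive >= block_at:
        return "block"
    if p_abusive >= review_at:
        return "review"
    return "allow"


# Example: logits favouring the abusive class strongly enough to block.
action = moderation_action(abusive_probability([-2.0, 3.0]))
```

Routing mid-confidence cases to human review rather than blocking them outright is a common design choice for moderation pipelines; suitable thresholds would need to be tuned on data from the target platform.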