Piiranha v1: Personal Information Detection Model

Property	Value
Parameter Count	278M
License	cc-by-nc-nd-4.0
Base Model	microsoft/mdeberta-v3-base
Supported Languages	English, Spanish, French, German, Italian, Dutch

What is piiranha-v1-detect-personal-information?

Piiranha is a transformer-based model for detecting personally identifiable information (PII) in text across six languages. It is fine-tuned from mDeBERTa-v3 base (microsoft/mdeberta-v3-base) and identifies 17 types of PII, which makes it suited to privacy protection and data security workflows.

Implementation Details

The model is implemented with the Transformers library and has a context length of 256 tokens. It reports 98.48% precision and 98.27% recall for PII detection, with an overall accuracy of 99.44%.
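
To put the precision and recall figures on a single scale, the short sketch below computes their harmonic mean (the F1 score). The function name `f1_score` is illustrative and not part of the model or of any library; only the two percentages come from the text above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted above for PII detection.
precision, recall = 0.9848, 0.9827
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9837"
```

Because precision and recall are nearly equal here, the F1 score sits between them at roughly 98.37%.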

Architecture: Fine-tuned mDeBERTa-v3 base model (microsoft/mdeberta-v3-base)
Training: Adam optimizer with a linear learning rate scheduler
Batch Size: 128 for both training and evaluation
Mixed Precision Training: Native AMP
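
Because the context length is 256 tokens, longer documents have to be split before inference. One common approach, sketched here as an assumption rather than as the model author's method, is to cover the token sequence with overlapping windows so that an entity cut off at one window boundary appears whole in the next. The helper `token_windows` and its `overlap` default are hypothetical.

```python
from typing import List, Tuple


def token_windows(n_tokens: int, max_len: int = 256, overlap: int = 32) -> List[Tuple[int, int]]:
    """Return (start, end) token index pairs that cover n_tokens with overlapping windows."""
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("require max_len > 0 and 0 <= overlap < max_len")
    windows = []
    step = max_len - overlap  # how far each new window advances
    start = 0
    while start < n_tokens:
        end = min(start + max_len, n_tokens)
        windows.append((start, end))
        if end == n_tokens:  # last window reached the end of the sequence
            break
        start += step
    return windows


# A 600-token document split for a 256-token context.
print(token_windows(600))  # prints [(0, 256), (224, 480), (448, 600)]
```

If the tokenizer adds special tokens such as [CLS] and [SEP] to each window, they count toward the 256-token limit, so `max_len` would need to be reduced accordingly (for example to 254). Predictions in the overlapping regions then have to be merged, for instance by keeping the higher-scoring span.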

Core Capabilities

100% reported detection accuracy for email addresses
98% accuracy for passwords and usernames
High precision in detecting phone numbers and credit card information
Multilingual support across six major European languages
17 distinct PII categories including personal, financial, and identification information

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for combining coverage of 17 PII categories with support for six languages at a reported overall accuracy of 99.44%. Its 100% detection rate for emails and high accuracy for sensitive data such as passwords and credit card numbers make it particularly valuable for privacy protection.

Q: What are the recommended use cases?

The model is ideal for automated PII detection in documents, privacy compliance checking, data redaction systems, and security auditing. It's particularly useful for organizations handling multilingual content and requiring robust PII protection measures.
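
As an illustration of the data redaction use case, the sketch below replaces detected spans with label placeholders. It assumes the model is run through the Transformers `token-classification` pipeline with `aggregation_strategy="simple"`, which returns dictionaries containing `entity_group`, `start`, and `end`. The helper names `redact` and `load_detector`, the sample sentence, and the label names in the hand-written spans are assumptions for demonstration; the Hub repository id is not given in this document, so it is left as a parameter.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected character span with a [LABEL] placeholder."""
    out = text
    # Work from the last span to the first so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        out = out[: ent["start"]] + f"[{ent['entity_group']}]" + out[ent["end"]:]
    return out


def load_detector(model_id: str):
    """Build a token-classification pipeline; model_id must be supplied by the caller."""
    from transformers import pipeline  # imported lazily so redact() works without it

    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


# Hand-written spans in the pipeline's output shape (labels are illustrative).
sample = "Contact Jane at jane@example.com."
spans = [
    {"entity_group": "GIVENNAME", "start": 8, "end": 12},
    {"entity_group": "EMAIL", "start": 16, "end": 32},
]
print(redact(sample, spans))  # prints "Contact [GIVENNAME] at [EMAIL]."
```

With a real detector, the same function would be called as `redact(text, detector(text))`, where `detector` is the object returned by `load_detector`. Replacing spans from the end of the string backwards avoids recomputing offsets after each substitution.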
