piiranha-v1-detect-personal-information

Maintained By
iiiorg

Piiranha v1: Personal Information Detection Model

Property              Value
Parameter Count       278M
License               cc-by-nc-nd-4.0
Base Model            microsoft/mdeberta-v3-base
Supported Languages   English, Spanish, French, German, Italian, Dutch

What is piiranha-v1-detect-personal-information?

Piiranha is a transformer-based model for detecting personally identifiable information (PII) in text across six languages. Built on mDeBERTa-v3 (microsoft/mdeberta-v3-base), it identifies 17 different types of PII, which makes it a practical tool for privacy protection and data security.

Implementation Details

The model is implemented with the Transformers library and has a context length of 256 tokens. It reports 98.48% precision and 98.27% recall for PII detection, with an overall accuracy of 99.44%.
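Precision and recall are often summarized as a single F1 score, their harmonic mean. The sketch below derives it from the two figures quoted above; the function name `f1_score` is only illustrative, and the result is computed here rather than taken from the model card.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for PII detection.
f1 = f1_score(0.9848, 0.9827)
print(f"{f1:.4f}")  # prints 0.9837
```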

  • Architecture: Fine-tuned mDeBERTa-v3 base model (microsoft/mdeberta-v3-base)
  • Training: Adam optimizer with a linear learning rate scheduler
  • Batch Size: 128 for both training and evaluation
  • Mixed Precision Training: Native AMP
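Because the context length is 256 tokens, longer documents need to be split before inference. The following is a minimal sketch, not the maintainers' own code: it assumes the model is published on the Hugging Face Hub under the id `iiiorg/piiranha-v1-detect-personal-information` (inferred from the maintainer and model name above), and the helper `token_windows` and its 32-token overlap are hypothetical choices for illustration.

```python
from typing import List, Tuple

# Assumed Hub id, inferred from the maintainer and model name; verify before use.
MODEL_ID = "iiiorg/piiranha-v1-detect-personal-information"
MAX_TOKENS = 256  # context length stated for the model


def token_windows(n_tokens: int, max_len: int = MAX_TOKENS,
                  overlap: int = 32) -> List[Tuple[int, int]]:
    """Return [start, end) token-index windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens (an illustrative value) so that
    an entity falling on a boundary appears whole in at least one window.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    windows = []
    start = 0
    step = max_len - overlap
    while start < n_tokens:
        end = min(start + max_len, n_tokens)
        windows.append((start, end))
        if end == n_tokens:
            break
        start += step
    return windows


def load_detector():
    """Build a token-classification pipeline (downloads weights on first call)."""
    from transformers import pipeline  # imported lazily; requires `transformers`
    return pipeline("token-classification", model=MODEL_ID,
                    aggregation_strategy="simple")
```

With these defaults, a 600-token input would be covered by the windows (0, 256), (224, 480), and (448, 600). Each window can then be decoded back to text and passed to the pipeline returned by `load_detector()`.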

Core Capabilities

  • Reported 100% detection accuracy for email addresses
  • 98% accuracy for passwords and usernames
  • High precision in detecting phone numbers and credit card information
  • Multilingual support across six major European languages
  • 17 distinct PII categories including personal, financial, and identification information

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for detecting 17 categories of PII across six languages with a reported overall accuracy of 99.44%. Its reported 100% detection rate for emails and its high accuracy for sensitive data such as passwords and credit card numbers make it particularly valuable for privacy protection.

Q: What are the recommended use cases?

The model is ideal for automated PII detection in documents, privacy compliance checking, data redaction systems, and security auditing. It's particularly useful for organizations handling multilingual content and requiring robust PII protection measures.
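As an illustration of the data redaction use case, the sketch below masks detected spans in a string. It assumes entities arrive in the shape produced by a Transformers token-classification pipeline with `aggregation_strategy="simple"` (dicts with `entity_group`, `score`, `start`, `end`). The function `redact`, the 0.5 score threshold, and the sample labels `EMAIL` and `TELEPHONENUM` are illustrative assumptions; the actual label names should be read from the model's configuration.

```python
from typing import Dict, Iterable


def redact(text: str, entities: Iterable[Dict], min_score: float = 0.5) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    Assumes spans do not overlap; `min_score` is an illustrative threshold.
    """
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from right to left so earlier character offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Hand-written sample standing in for real pipeline output (hypothetical labels).
sample_text = "Contact Ana at ana@example.com or 555-0100."
email, phone = "ana@example.com", "555-0100"
sample_entities = [
    {"entity_group": "EMAIL", "score": 0.99,
     "start": sample_text.index(email),
     "end": sample_text.index(email) + len(email)},
    {"entity_group": "TELEPHONENUM", "score": 0.97,
     "start": sample_text.index(phone),
     "end": sample_text.index(phone) + len(phone)},
]

print(redact(sample_text, sample_entities))
# prints: Contact Ana at [EMAIL] or [TELEPHONENUM].
```

Replacing spans with their label rather than deleting them keeps the redacted text readable and makes it possible to audit which categories were found.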
