NSFW Image Detector

Property	Value
Base Model	google/vit-base-patch16-224-in21k
Training Dataset	28k custom images
Accuracy	93.16%
Alternative Version	384 resolution available

What is nsfw-image-detector?

The NSFW Image Detector is a vision transformer (ViT) model fine-tuned to classify images into five categories: drawings, hentai, neutral, porn, and sexy. Built on Google's ViT architecture (google/vit-base-patch16-224-in21k), it reaches 93.16% accuracy and 98.87% top-k accuracy on its evaluation set.
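
A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub and loads through the transformers image-classification pipeline. The repository ID is a placeholder that does not come from this page, and the label names are assumed to match the five categories listed above.

```python
from typing import Dict, List

# The five categories named in the description above.
LABELS = ("drawings", "hentai", "neutral", "porn", "sexy")

# Hypothetical placeholder: substitute the real repository ID of the checkpoint.
MODEL_ID = "your-namespace/nsfw-image-detector"


def scores_to_dict(predictions: List[dict]) -> Dict[str, float]:
    """Turn pipeline output [{'label': ..., 'score': ...}, ...] into {label: score}."""
    return {p["label"]: float(p["score"]) for p in predictions}


def classify(image_path: str, model_id: str = MODEL_ID) -> Dict[str, float]:
    """Score one image (local path or URL) against every category."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    # top_k=len(LABELS) requests a score for all five classes, not only the best one.
    return scores_to_dict(classifier(image_path, top_k=len(LABELS)))
```

Calling `classify("photo.jpg")` would return a dictionary of per-category scores. In a real service the pipeline would be built once and reused rather than recreated on every call.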

Implementation Details

The model was trained on a custom dataset of approximately 28,000 images for 10 epochs, using the Adam optimizer with a learning rate of 2e-05 and a linear learning rate schedule. Training used mixed precision (Native AMP), which reduces memory use and speeds up training.

Fine-tuned ViT architecture with 224x224 input resolution
Trained with batch size of 32 for both training and evaluation
Uses warmup steps followed by linear learning rate decay
Final validation loss of 0.8138
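
The warmup-plus-linear schedule listed above can be sketched in plain Python. The number of warmup steps is not stated on this page, so the value of 500 is purely illustrative. The total of 8,750 steps assumes all 28,000 images are used for training in each epoch, which the page does not confirm.

```python
def linear_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float = 2e-5) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: 28,000 training images, batch size 32, 10 epochs.
STEPS_PER_EPOCH = 28_000 // 32   # 875
TOTAL_STEPS = STEPS_PER_EPOCH * 10
WARMUP_STEPS = 500               # hypothetical; not given in the source

for step in (0, 250, 500, 4_625, TOTAL_STEPS):
    print(step, linear_lr(step, TOTAL_STEPS, WARMUP_STEPS))
```

The rate peaks at 2e-05 when warmup ends and reaches zero on the final step.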

Core Capabilities

Multi-class classification across 5 distinct categories
93.16% accuracy on the evaluation set, suited to content moderation tasks
224x224 input resolution (384x384 version available)
98.87% top-k accuracy on the evaluation set
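
To illustrate how top-k accuracy differs from plain accuracy, here is a small sketch of the metric. The page does not state which k was used, and the scores below are made-up toy values, not model outputs.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# Toy scores over the five classes (drawings, hentai, neutral, porn, sexy).
toy_scores = [
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.05, 0.50, 0.05, 0.30, 0.10],
    [0.10, 0.05, 0.30, 0.15, 0.40],
]
toy_targets = [2, 3, 2]

print(top_k_accuracy(toy_scores, toy_targets, k=1))
print(top_k_accuracy(toy_scores, toy_targets, k=2))
```

With k=1 only the first sample counts as correct. With k=2 all three do, because the true class is the second-ranked guess in the other two samples.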

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the ViT architecture with fine-tuning specific to NSFW content detection, reaching 93.16% accuracy at a 224x224 input resolution. A 384-resolution version is also available, so users can choose between lower processing cost and higher input detail.

Q: What are the recommended use cases?

The model is suited to content moderation systems, automated content filtering, and safe-for-work verification of image databases. It is particularly useful for platforms that need to enforce content standards.
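
As a sketch of how a moderation system might act on the classifier's output, the function below turns per-category scores into an action. The choice of which categories count as unsafe and both thresholds are hypothetical policy settings, not values from the model's authors.

```python
from typing import Dict

# Assumption: a platform treating these three categories as unsafe.
UNSAFE_LABELS = ("hentai", "porn", "sexy")


def moderate(scores: Dict[str, float], block_at: float = 0.8, review_at: float = 0.4) -> str:
    """Map per-category scores to 'block', 'review', or 'allow'."""
    # Total probability mass assigned to the unsafe categories.
    unsafe = sum(scores.get(label, 0.0) for label in UNSAFE_LABELS)
    if unsafe >= block_at:
        return "block"
    if unsafe >= review_at:
        return "review"   # borderline cases go to a human moderator
    return "allow"


print(moderate({"neutral": 0.95, "sexy": 0.05}))
print(moderate({"porn": 0.7, "hentai": 0.2, "neutral": 0.1}))
```

Thresholds like these would normally be tuned on a platform's own labeled data, since the acceptable trade-off between false blocks and missed content varies by use case.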