Whisper-NER Tag and Mask Model

Property	Value
Parameter Count	1.54B
License	MIT
Paper	WhisperNER Paper
Tensor Type	F32
Language	English

What is whisper-ner-tag-and-mask-v1?

WhisperNER combines automatic speech recognition (ASR) with named entity recognition (NER) in a single model. Built on the Whisper architecture, it transcribes speech and identifies entities at the same time, and it can optionally mask those entities in the output, which is particularly valuable for privacy-sensitive applications.

Implementation Details

The model was fine-tuned from aiola/whisper-ner-v1 on the NuNER dataset for joint audio transcription and NER tagging or masking. It handles speech recognition and entity detection as one unified task and supports open-type NER, so it can recognize diverse and evolving entity types at inference time.

Supports both entity tagging and masking capabilities
Built on Whisper's robust ASR architecture
Fine-tuned on NER data (the NuNER dataset)
Stores weights as F32 (32-bit floating-point) tensors
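
As a rough worked example of what the parameter count and tensor type imply, F32 weights take 4 bytes per parameter. The sketch below does only that arithmetic. It ignores activations, decoding buffers, and framework overhead, and the function name is illustrative rather than part of any library.

```python
# Back-of-the-envelope weight footprint: parameters x bytes per parameter.
PARAMETER_COUNT = 1.54e9  # from the property table above
BYTES_PER_F32 = 4         # a 32-bit float occupies 4 bytes


def weight_footprint_bytes(parameter_count: float, bytes_per_param: int) -> float:
    """Return the approximate size of the weights alone, in bytes."""
    return parameter_count * bytes_per_param


size_bytes = weight_footprint_bytes(PARAMETER_COUNT, BYTES_PER_F32)
print(f"{size_bytes / 1e9:.2f} GB")     # decimal gigabytes
print(f"{size_bytes / 2**30:.2f} GiB")  # binary gibibytes
```

This works out to about 6.16 GB (roughly 5.74 GiB) for the weights alone, so the memory actually needed to run the model will be higher.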

Core Capabilities

Joint speech transcription and entity recognition
Open-type NER support for flexible entity detection
Optional entity masking for privacy protection
Custom prompt-based entity type specification
Integration with the Transformers library for easy deployment
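
The prompt-based entity specification and optional masking listed above can be sketched with the Whisper classes in the Transformers library. This is a minimal sketch under stated assumptions, not the authors' reference code: the repository id `aiola/whisper-ner-tag-and-mask-v1`, the comma-separated lowercase prompt format, and the `<|mask|>` prefix that switches on masking are assumptions to check against the official model card, and `build_prompt` and `transcribe_with_entities` are hypothetical helper names.

```python
from typing import Sequence

MODEL_ID = "aiola/whisper-ner-tag-and-mask-v1"  # assumed Hub repository id
MASK_TOKEN = "<|mask|>"                         # assumed prefix that enables masking


def build_prompt(entity_types: Sequence[str], mask: bool = False) -> str:
    """Join entity types into one prompt string (assumed format)."""
    cleaned = [t.strip().lower() for t in entity_types if t.strip()]
    prompt = ", ".join(cleaned)
    return f"{MASK_TOKEN}{prompt}" if mask else prompt


def transcribe_with_entities(waveform, entity_types, mask=False, sampling_rate=16000):
    """Transcribe a mono waveform (1-D float array), tagging or masking entities."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(MODEL_ID)
    model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)

    # Log-mel features for the audio, plus token ids for the entity prompt.
    features = processor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features
    prompt_ids = processor.get_prompt_ids(
        build_prompt(entity_types, mask), return_tensors="pt"
    )

    with torch.no_grad():
        predicted_ids = model.generate(features, prompt_ids=prompt_ids)

    # How entity tags or masks appear in the decoded text is defined by the model.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]


print(build_prompt(["Person", "Company", "Location"]))
print(build_prompt(["person"], mask=True))
```

Only `build_prompt` runs when the block is executed; calling `transcribe_with_entities` downloads the model and expects audio already loaded as a 16 kHz waveform. Because the entity types are passed as plain text, changing what the model looks for means changing the list, not retraining.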

Frequently Asked Questions

Q: What makes this model unique?

The model handles ASR and NER in a single architecture, so speech is transcribed and entities are recognized simultaneously. Its masking capability makes it particularly valuable for privacy-sensitive applications.

Q: What are the recommended use cases?

The model suits applications that need both speech transcription and entity recognition, such as automated transcription services, content analysis, and privacy-focused applications. For PII-specific use cases, however, additional fine-tuning is recommended.
