Whisper-NER Tag and Mask Model
Property | Value |
---|---|
Parameter Count | 1.54B |
License | MIT |
Paper | WhisperNER Paper |
Tensor Type | F32 |
Language | English |
What is whisper-ner-tag-and-mask-v1?
WhisperNER combines automatic speech recognition (ASR) with named entity recognition (NER) in a single model. Built on the Whisper architecture, it transcribes speech and identifies entities simultaneously, and it can optionally mask those entities in the output, which is particularly valuable for privacy-sensitive applications.
Implementation Details
The model was fine-tuned from aiola/whisper-ner-v1 on the NuNER dataset to perform joint audio transcription and NER tagging or masking. It takes a unified approach to speech recognition and entity detection and supports open-type NER, so it can recognize diverse and evolving entity types at inference time rather than a fixed label set.
- Supports both entity tagging and masking capabilities
- Built on Whisper's robust ASR architecture
- Fine-tuned on the NuNER dataset
- Weights stored as F32 (32-bit floating point) tensors
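The parameter count and tensor type listed above give a rough lower bound on the memory needed just to hold the weights. The sketch below is back-of-the-envelope arithmetic only: the function name is illustrative, 1.54B is a rounded figure, and activations, the decoding cache, and framework overhead are not included.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed to store the weights alone, in bytes."""
    return num_params * bytes_per_param


PARAMS = 1_540_000_000  # 1.54B parameters (rounded value from the table)
F32_BYTES = 4           # one F32 value occupies 4 bytes

total = weight_memory_bytes(PARAMS, F32_BYTES)
print(f"{total / 1e9:.2f} GB")     # 6.16 GB (decimal gigabytes)
print(f"{total / 2**30:.2f} GiB")  # 5.74 GiB (binary gibibytes)
```

In practice the device needs headroom beyond this figure for the audio features and generation state.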
Core Capabilities
- Joint speech transcription and entity recognition
- Open-type NER support for flexible entity detection
- Optional entity masking for privacy protection
- Custom prompt-based entity type specification
- Integration with the Transformers library for easy deployment
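The prompt-based entity specification and Transformers integration listed above might be used roughly as follows. This is a minimal sketch, not the model's documented usage: the repository id `aiola/whisper-ner-tag-and-mask-v1`, the `<|mask|>` prefix that switches from tagging to masking, and the comma-separated lower-case prompt format are all assumptions, and the helper names `build_prompt` and `transcribe` are hypothetical. The Transformers calls themselves (`WhisperProcessor`, `WhisperForConditionalGeneration`, `get_prompt_ids`, `generate(prompt_ids=...)`) are standard Whisper APIs.

```python
def build_prompt(entity_types, mask=False, mask_token="<|mask|>"):
    """Join entity labels into a comma-separated, lower-cased prompt.

    Assumption: prefixing the prompt with `mask_token` requests masking
    instead of tagging. Check the model card for the exact token.
    """
    labels = [t.strip().lower() for t in entity_types if t.strip()]
    if not labels:
        raise ValueError("at least one entity type is required")
    prompt = ", ".join(labels)
    return f"{mask_token}{prompt}" if mask else prompt


def transcribe(audio, entity_types, mask=False,
               model_id="aiola/whisper-ner-tag-and-mask-v1"):  # assumed repo id
    """Transcribe a mono 16 kHz waveform (1-D float array) with entity prompts."""
    # Heavy imports are kept local so build_prompt works without them.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    # Whisper's feature extractor expects 16 kHz input.
    features = processor(
        audio, sampling_rate=16000, return_tensors="pt"
    ).input_features

    # Encode the entity-type prompt as decoder prompt tokens.
    prompt_ids = processor.get_prompt_ids(
        build_prompt(entity_types, mask), return_tensors="pt"
    )

    with torch.no_grad():
        predicted_ids = model.generate(
            features, prompt_ids=prompt_ids, language="en"
        )

    # skip_special_tokens should also drop the prompt from the decoded text.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]


print(build_prompt(["Person", "Company", "location"]))  # person, company, location
print(build_prompt(["person"], mask=True))              # <|mask|>person
```

Calling `transcribe(waveform, ["person", "company"])` would download the 1.54B-parameter checkpoint on first use; loading and resampling the audio to 16 kHz mono is left to the caller.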
Frequently Asked Questions
Q: What makes this model unique?
It combines ASR and NER in a single architecture, so speech is transcribed and entities are recognized simultaneously rather than in separate pipeline stages. The masking option makes it particularly valuable for privacy-sensitive applications.
Q: What are the recommended use cases?
The model suits applications that need both speech transcription and entity recognition, such as automated transcription services, content analysis, and privacy-focused applications. For PII-specific use cases, however, additional fine-tuning is recommended.