bert-base-parsbert-ner-uncased
Property | Value |
---|---|
License | Apache 2.0 |
Paper | arXiv:2005.12515 |
Author | HooshvareLab |
Task | Named Entity Recognition |
What is bert-base-parsbert-ner-uncased?
ParsBERT NER is a transformer-based model for named entity recognition in Persian. It is built on Google's BERT architecture and fine-tuned for token classification, and its reported results are state of the art for Persian NER. Depending on the training dataset, it recognizes entity types such as organizations, locations, persons, dates, times, monetary amounts, and percentages.
Implementation Details
The model is fine-tuned on two datasets, ARMAN and PEYMA, and on their combination. Entities are labeled in the IOB (Inside, Outside, Beginning) tagging format. The reported F1 scores are 98.79% on PEYMA and 93.10% on ARMAN.
- Supports 7 entity classes in PEYMA (Organization, Money, Location, Date, Time, Person, Percent)
- Handles 6 entity classes in ARMAN (Organization, Location, Facility, Event, Product, Person)
- Uncased model with whole word masking
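To make the IOB format concrete, here is a minimal sketch that builds the label inventory from the entity classes listed above and groups tagged tokens into entity spans. The lowercase label spellings (for example `B-person`) and the helper names `iob_labels` and `decode_iob` are assumptions for illustration; the model's actual label strings may differ.

```python
# Entity classes as listed in this card; the lowercase spellings are an assumption.
PEYMA_CLASSES = ["organization", "money", "location", "date", "time", "person", "percent"]
ARMAN_CLASSES = ["organization", "location", "facility", "event", "product", "person"]


def iob_labels(classes):
    """Return the IOB label inventory: 'O' plus a B- and an I- label per class."""
    labels = ["O"]
    for name in sorted(set(classes)):
        labels += [f"B-{name}", f"I-{name}"]
    return labels


def decode_iob(tokens, tags):
    """Group parallel token and tag lists into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != current_type):
            # Start a new span; a stray I- tag is treated like a B- tag.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = etype, [token]
        elif prefix == "I":
            # Continue the span that is currently open.
            current_tokens.append(token)
        else:
            # "O" closes any open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


print(len(iob_labels(PEYMA_CLASSES)))                  # 15 labels: 7 classes x 2, plus O
print(len(iob_labels(ARMAN_CLASSES)))                  # 13 labels: 6 classes x 2, plus O
print(len(iob_labels(PEYMA_CLASSES + ARMAN_CLASSES)))  # 21 labels: 10 distinct classes
```

Because Organization, Location, and Person appear in both datasets, the combined label set covers 10 distinct classes rather than 13.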
Core Capabilities
- High-accuracy named entity recognition for Persian text
- Multi-class token classification
- State-of-the-art performance compared to other Persian NER models
- Compatible with Hugging Face transformers pipeline
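As a rough illustration of the pipeline compatibility noted above, the model could be loaded as follows. This is a sketch under assumptions: the Hub identifier is inferred from the author and model name in this card, the helper names `load_ner` and `tag_text` are hypothetical, and `aggregation_strategy` requires a reasonably recent `transformers` release.

```python
# Assumed Hub identifier: the author name plus the model name from this card.
MODEL_ID = "HooshvareLab/bert-base-parsbert-ner-uncased"


def load_ner(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def tag_text(ner, text):
    """Return (entity_group, word, score) triples for one input string."""
    return [(e["entity_group"], e["word"], float(e["score"])) for e in ner(text)]
```

A typical call would be `tag_text(load_ner(), text)` with a Persian sentence as `text`. Keeping `tag_text` separate from model loading means the pipeline is built once and reused across inputs.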
Frequently Asked Questions
Q: What makes this model unique?
The model's reported F1 scores exceed those of earlier Persian NER approaches such as LSTM-CRF and rule-based CRF systems. On the combined ARMAN and PEYMA dataset, it reports an F1 score above 95%.
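For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it over exact-match entity spans. This card does not say whether the reported scores are span-level or token-level, so the span-level scoring and the helper name `span_f1` are assumptions.

```python
def span_f1(gold, predicted):
    """Exact-match F1 over collections of (start, end, entity_type) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One of two predicted spans matches a gold span exactly; the other has the wrong type.
gold = [(0, 2, "person"), (3, 4, "location")]
predicted = [(0, 2, "person"), (3, 4, "organization")]
print(span_f1(gold, predicted))
```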
Q: What are the recommended use cases?
The model is ideal for Persian text processing applications requiring named entity recognition, such as information extraction, document analysis, and automated content categorization. It's particularly useful for identifying organizations, locations, persons, and temporal expressions in Persian text.