bert-fa-base-uncased-ner-peyma
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Author | HooshvareLab |
| Downloads | 148,053 |
| Task | Named Entity Recognition |
What is bert-fa-base-uncased-ner-peyma?
This is a specialized Persian language model based on the ParsBERT architecture, fine-tuned for Named Entity Recognition (NER) on the PEYMA dataset. It achieves state-of-the-art performance with an F1 score of 93.40%, surpassing previous models including mBERT and traditional approaches such as LSTM-CRF.
Implementation Details
The model is trained on the PEYMA dataset, which contains 7,145 sentences with 302,530 tokens, of which 41,148 are tagged entities. It uses the IOB (Inside, Outside, Beginning) tagging format to identify and classify named entities in Persian text.
- Trained on 7 entity classes: Organization, Money, Location, Date, Time, Person, and Percent
- Built on ParsBERT v2.0 architecture with uncased tokenization
- Implements transformer-based token classification
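To make the IOB scheme concrete, the snippet below decodes a token/tag sequence into labeled entity spans. This is a minimal illustration of the tagging format only; the `decode_iob` helper and the sample tokens are hypothetical and not part of the model's API.

```python
# Minimal sketch of IOB decoding: "B-X" begins an entity of class X,
# "I-X" continues it, and "O" marks tokens outside any entity.
def decode_iob(tokens, tags):
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], [token])  # start a new entity
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)      # continue the open entity
        else:
            if current:
                entities.append(current)
            current = None                # "O" or an inconsistent tag
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]

tokens = ["John", "Smith", "visited", "Tehran", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(decode_iob(tokens, tags))  # [('PER', 'John Smith'), ('LOC', 'Tehran')]
```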
Core Capabilities
- Detects entities across 7 distinct categories
- Processes uncased Persian text
- Handles complex entity relationships
- Outperforms previous Persian NER models, including mBERT
Frequently Asked Questions
Q: What makes this model unique?
This model represents a significant improvement over previous Persian NER systems, achieving a 93.40% F1 score on the PEYMA dataset. It's specifically optimized for Persian language understanding and handles a comprehensive range of entity types.
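The F1 score quoted above is the harmonic mean of precision and recall computed over predicted entity spans. The sketch below shows the arithmetic with hypothetical counts chosen to land near 93.40%; these are illustrative numbers, not the actual PEYMA evaluation counts.

```python
# Span-level NER F1: a predicted entity counts as correct only if both its
# boundaries and its class match the gold annotation.
# The counts below are hypothetical, for illustration only.
true_positives = 934   # predicted entities that exactly match gold
false_positives = 66   # predicted entities with no gold match
false_negatives = 66   # gold entities the model missed

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")  # F1=0.9340
```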
Q: What are the recommended use cases?
The model is ideal for Persian text analysis tasks requiring entity extraction, including information retrieval, content analysis, and automated text processing systems. It's particularly suited for applications needing to identify organizations, locations, dates, and personal names in Persian text.
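For these use cases, a minimal usage sketch with the Hugging Face `transformers` pipeline follows. It assumes `transformers` and a backend such as PyTorch are installed and that the model can be downloaded from the Hub; the sample Persian sentence is illustrative.

```python
# Hedged usage sketch: run the model through the token-classification
# pipeline, merging B-/I- word pieces into whole entities.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="HooshvareLab/bert-fa-base-uncased-ner-peyma",
    aggregation_strategy="simple",  # group sub-tokens into entity spans
)

text = "شرکت ملی نفت ایران در تهران قرار دارد"  # sample Persian sentence
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```

With `aggregation_strategy="simple"`, each result is a dict containing the entity class (`entity_group`), the surface text (`word`), a confidence `score`, and character offsets.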