bert-fa-base-uncased-ner-peyma
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Author | HooshvareLab |
| Downloads | 148,053 |
| Task | Named Entity Recognition |
What is bert-fa-base-uncased-ner-peyma?
This is a specialized Persian language model based on the ParsBERT architecture, fine-tuned for Named Entity Recognition (NER) on the PEYMA dataset. It achieves state-of-the-art performance with an F1 score of 93.40%, surpassing previous models including mBERT and traditional approaches such as LSTM-CRF.
Implementation Details
The model is trained on the PEYMA dataset, which contains 7,145 sentences with 302,530 tokens, of which 41,148 are tagged entities. It uses the IOB (Inside, Outside, Beginning) tagging format to identify and classify named entities in Persian text.
- Trained on 7 entity classes: Organization, Money, Location, Date, Time, Person, and Percent
- Built on ParsBERT v2.0 architecture with uncased tokenization
- Implements transformer-based token classification
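To make the IOB scheme concrete, the snippet below decodes a token/tag sequence into labeled entity spans. This is a minimal illustration of the tagging format only; the `decode_iob` helper and the sample tokens are hypothetical and not part of the model's API.

```python
# Minimal sketch of IOB decoding: "B-X" begins an entity of class X,
# "I-X" continues it, and "O" marks tokens outside any entity.
def decode_iob(tokens, tags):
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], [token])  # start a new entity
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)      # continue the open entity
        else:
            if current:
                entities.append(current)
            current = None                # "O" or an inconsistent tag
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]

tokens = ["John", "Smith", "visited", "Tehran", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(decode_iob(tokens, tags))  # [('PER', 'John Smith'), ('LOC', 'Tehran')]
```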
Core Capabilities
- Detects entities across 7 distinct categories
- Processes uncased Persian text
- Handles complex entity relationships
- Outperforms previous Persian NER models, including mBERT
Frequently Asked Questions
Q: What makes this model unique?
This model represents a significant improvement over previous Persian NER systems, achieving a 93.40% F1 score on the PEYMA dataset. It's specifically optimized for Persian language understanding and handles a comprehensive range of entity types.
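The F1 score quoted above is the harmonic mean of precision and recall computed over predicted entity spans. The sketch below shows the arithmetic with hypothetical counts chosen to land near 93.40%; these are illustrative numbers, not the actual PEYMA evaluation counts.

```python
# Span-level NER F1: a predicted entity counts as correct only if both its
# boundaries and its class match the gold annotation.
# The counts below are hypothetical, for illustration only.
true_positives = 934   # predicted entities that exactly match gold
false_positives = 66   # predicted entities with no gold match
false_negatives = 66   # gold entities the model missed

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")  # F1=0.9340
```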
Q: What are the recommended use cases?
The model is ideal for Persian text analysis tasks requiring entity extraction, including information retrieval, content analysis, and automated text processing systems. It's particularly suited for applications needing to identify organizations, locations, dates, and personal names in Persian text.
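For these use cases, a minimal usage sketch with the Hugging Face `transformers` pipeline follows. It assumes `transformers` and a backend such as PyTorch are installed and that the model can be downloaded from the Hub; the sample Persian sentence is illustrative.

```python
# Hedged usage sketch: run the model through the token-classification
# pipeline, merging B-/I- word pieces into whole entities.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="HooshvareLab/bert-fa-base-uncased-ner-peyma",
    aggregation_strategy="simple",  # group sub-tokens into entity spans
)

text = "شرکت ملی نفت ایران در تهران قرار دارد"  # sample Persian sentence
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```

With `aggregation_strategy="simple"`, each result is a dict containing the entity class (`entity_group`), the surface text (`word`), a confidence `score`, and character offsets.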