bert-base-parsbert-uncased

Maintained By
HooshvareLab

ParsBERT - Persian Language Understanding Model

Property            Value
Research Paper      arXiv:2005.12515
Developer           HooshvareLab
Framework Support   PyTorch, TensorFlow
Community Stats     24,433 downloads, 31 likes

What is bert-base-parsbert-uncased?

ParsBERT is a monolingual language model specifically designed for Persian language understanding, based on Google's BERT architecture. Trained on a diverse corpus of over 2M documents spanning scientific texts, novels, and news articles, it represents a significant advancement in Persian NLP capabilities.

Implementation Details

Pre-training relies on an extensive pre-processing pipeline that combines POS tagging and WordPiece segmentation, covering more than 40M true sentences. The model follows the BERT-Base configuration and is trained with whole word masking.

  • Comprehensive pre-training on varied Persian texts
  • State-of-the-art performance across multiple NLP tasks
  • Uncased tokenization with whole word masking
  • Compatible with both PyTorch and TensorFlow frameworks (see the loading sketch after this list)
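
As a quick illustration of the last point, here is a minimal loading sketch using the Hugging Face transformers library. The hub identifier HooshvareLab/bert-base-parsbert-uncased is assumed from this card's listing, and the Persian sentence is purely illustrative.

```python
from transformers import AutoTokenizer, AutoModel

# Hub identifier assumed from this card's listing
model_name = "HooshvareLab/bert-base-parsbert-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)  # PyTorch; TFAutoModel works for TensorFlow

# Illustrative Persian sentence, tokenized with the model's uncased WordPiece vocabulary
text = "ما در زمینه پردازش زبان فارسی کار می‌کنیم."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Contextual embeddings for each WordPiece token
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```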

Core Capabilities

  • Sentiment Analysis: Achieves 81.74% F1 on Digikala User Comments (a fine-tuning sketch follows this list)
  • Text Classification: 93.59% accuracy on Digikala Magazine
  • Named Entity Recognition: 98.79% F1 score on ARMAN dataset
  • Outperforms multilingual BERT on every evaluated Persian NLP task
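
The figures above come from fine-tuning the pretrained checkpoint on task-specific Persian datasets. The following is a minimal, hypothetical sketch of setting up such a fine-tune for sentiment classification with transformers; the label count and example sentence are placeholders, not the configuration behind the reported scores.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "HooshvareLab/bert-base-parsbert-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# num_labels is a placeholder; set it to match the target dataset's label set
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Single illustrative example ("The quality of this product was excellent.");
# a real fine-tune would train on a full labeled Persian dataset
batch = tokenizer("کیفیت این محصول عالی بود.", return_tensors="pt")
logits = model(**batch).logits
print(logits.shape)  # (1, num_labels)
```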

Frequently Asked Questions

Q: What makes this model unique?

ParsBERT is the first comprehensive BERT model trained specifically for Persian language understanding. It combines extensive pre-processing with a large-scale Persian corpus, which yields superior performance compared to multilingual alternatives such as multilingual BERT.

Q: What are the recommended use cases?

The model excels at sentiment analysis, text classification, and named entity recognition for Persian text. It is particularly well suited to applications such as user-comment analysis, news classification, and automated Persian text understanding.
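
For quick experimentation before any fine-tuning, the pretrained checkpoint can also be probed directly with a fill-mask pipeline. A minimal sketch, again assuming the hub identifier above and an illustrative Persian prompt ("Tehran is the capital of [MASK]."):

```python
from transformers import pipeline

# Hub identifier assumed from this card; the prompt is illustrative
fill_mask = pipeline("fill-mask", model="HooshvareLab/bert-base-parsbert-uncased")

for prediction in fill_mask("تهران پایتخت [MASK] است."):
    print(prediction["token_str"], round(prediction["score"], 3))
```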
