ParsBERT (bert-fa-zwnj-base)

Property	Value
License	Apache 2.0
Author	HooshvareLab
Paper	arXiv:2005.12515
Language	Persian

What is bert-fa-zwnj-base?

ParsBERT v3.0 (bert-fa-zwnj-base) is a monolingual Persian language model built on Google's BERT architecture. This version adds support for the zero-width non-joiner (ZWNJ, U+200C), a character that Persian orthography uses to keep adjacent letters from joining within a word. The model was trained on Persian corpora covering a range of writing styles and subjects, including scientific texts, novels, and news articles.
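To make the role of the zero-width non-joiner concrete, here is a minimal sketch using only the Python standard library. The example word and the helper names `count_zwnj` and `strip_zwnj` are illustrative assumptions, not part of the model or its tooling.

```python
import unicodedata

ZWNJ = "\u200c"  # the zero-width non-joiner code point

# Example word meaning "I want": the verbal prefix is separated
# from the stem by a ZWNJ rather than by a space.
with_zwnj = "می" + ZWNJ + "خواهم"
without_zwnj = with_zwnj.replace(ZWNJ, "")


def count_zwnj(text: str) -> int:
    """Return how many ZWNJ characters the text contains."""
    return text.count(ZWNJ)


def strip_zwnj(text: str) -> str:
    """Return the text with every ZWNJ removed (a lossy operation)."""
    return text.replace(ZWNJ, "")


print(unicodedata.name(ZWNJ))     # ZERO WIDTH NON-JOINER
print(with_zwnj == without_zwnj)  # False
print(count_zwnj(with_zwnj))      # 1
```

The two strings can look alike on screen, yet they are different character sequences, which is why a tokenizer that drops or ignores the ZWNJ can change how a word is segmented.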

Implementation Details

The model uses the BERT transformer architecture with a vocabulary built specifically for Persian, and its tokenization accounts for zero-width non-joiner characters.

Pre-training on diverse Persian corpora
Explicit handling of zero-width non-joiner characters
Multi-domain training data covering scientific, literary, and news content
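One way to observe the ZWNJ handling is to tokenize the same text with and without the character and compare the results. The sketch below assumes the Hugging Face `transformers` library is installed and that the checkpoint is published under the hub ID `HooshvareLab/bert-fa-zwnj-base`. The helpers `load_tokenizer` and `compare_tokenization` are hypothetical names introduced here for illustration.

```python
ZWNJ = "\u200c"  # zero-width non-joiner
MODEL_ID = "HooshvareLab/bert-fa-zwnj-base"  # assumed hub ID


def load_tokenizer(model_id: str = MODEL_ID):
    """Load the model's tokenizer (downloads files on first use)."""
    # Imported lazily so the comparison helper works without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_id)


def compare_tokenization(tokenize, text: str):
    """Tokenize `text` as given and with every ZWNJ removed.

    `tokenize` is any callable mapping a string to a list of tokens,
    for example the `tokenize` method of a loaded tokenizer.
    Returns a pair: (tokens_with_zwnj, tokens_without_zwnj).
    """
    return tokenize(text), tokenize(text.replace(ZWNJ, ""))
```

With the tokenizer loaded, `compare_tokenization(load_tokenizer().tokenize, text)` returns both token lists for any Persian `text` containing a ZWNJ, so the difference can be inspected directly.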

Core Capabilities

Fill-mask (masked-token prediction) for Persian text
Support for both PyTorch and TensorFlow frameworks
Inference endpoint compatibility
Persian text representations usable as a starting point for downstream fine-tuning
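The fill-mask capability can be exercised with the `pipeline` helper from Hugging Face `transformers`. This is a minimal sketch rather than an official usage recipe. It assumes the hub ID `HooshvareLab/bert-fa-zwnj-base`, and `build_fill_mask` and `top_tokens` are hypothetical helper names.

```python
MODEL_ID = "HooshvareLab/bert-fa-zwnj-base"  # assumed hub ID


def build_fill_mask(model_id: str = MODEL_ID):
    """Create a fill-mask pipeline (downloads weights on first use)."""
    # Imported lazily so top_tokens can be used without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_tokens(fill_mask, text: str, k: int = 5):
    """Return (token, score) pairs for the single masked position in `text`.

    `text` must contain exactly one mask token; with several masks the
    pipeline returns a nested list that this helper does not handle.
    """
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]
```

A typical call builds the pipeline once with `fm = build_fill_mask()`, inserts `fm.tokenizer.mask_token` into a Persian sentence at the position to predict, and passes the result to `top_tokens(fm, sentence)`. Reading the mask token from the tokenizer avoids hard-coding its spelling.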

Frequently Asked Questions

Q: What makes this model unique?

ParsBERT is distinguished by its handling of Persian-specific orthography, in particular zero-width non-joiner characters, and by its pre-training on a diverse range of Persian texts. Both make it well suited to Persian language understanding tasks.

Q: What are the recommended use cases?

The model suits Persian language processing tasks such as text classification, named entity recognition, and general language understanding. The published checkpoint is a masked language model, so tasks like classification or named entity recognition require fine-tuning with a task-specific head. It fits both academic and commercial applications that need Persian text processing.
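As a starting point for a text classification use case, the pretrained encoder can be loaded with a freshly initialized classification head. The sketch below assumes the `transformers` library and the hub ID `HooshvareLab/bert-fa-zwnj-base`. The label set and the helpers `label_maps` and `load_for_classification` are hypothetical, and the resulting model still has to be fine-tuned on labeled data before its predictions mean anything.

```python
MODEL_ID = "HooshvareLab/bert-fa-zwnj-base"  # assumed hub ID
LABELS = ["negative", "positive"]  # hypothetical label set


def label_maps(labels):
    """Build the id-to-label and label-to-id mappings for a label list."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_for_classification(labels=LABELS, model_id: str = MODEL_ID):
    """Load the tokenizer and the encoder with an untrained classifier head."""
    # Imported lazily so label_maps can be used without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

For named entity recognition, the same pattern would apply with `AutoModelForTokenClassification` in place of the sequence classification class.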
