bert-fa-zwnj-base

Maintained by HooshvareLab

ParsBERT (bert-fa-zwnj-base)

Property    Value
License     Apache 2.0
Author      HooshvareLab
Paper       arXiv:2005.12515
Language    Persian

What is bert-fa-zwnj-base?

ParsBERT v3.0 is a monolingual language model for Persian, based on Google's BERT architecture. This version adds support for the zero-width non-joiner (ZWNJ, U+200C), a character that Persian orthography uses to keep adjacent letters from joining without inserting a visible space. The model was trained on Persian corpora covering a range of writing styles and subjects, including scientific texts, novels, and news articles.

Implementation Details

The model uses the transformer-based BERT architecture. It has a vocabulary built for Persian and handles zero-width non-joiner characters explicitly.

  • Pre-training on diverse Persian corpora
  • Explicit handling of zero-width non-joiner characters
  • Multi-domain training data: scientific, literary, and news content
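
To make the zero-width non-joiner concrete, here is a minimal sketch using only the Python standard library. It does not touch the model; it only shows that the same Persian word with and without a ZWNJ is two different strings, which is why a tokenizer has to decide how to treat the character. The helper names `count_zwnj` and `strip_zwnj` are hypothetical and exist only for this illustration.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def count_zwnj(text: str) -> int:
    """Return how many ZWNJ characters the text contains."""
    return text.count(ZWNJ)


def strip_zwnj(text: str) -> str:
    """Return the text with every ZWNJ removed (illustration only)."""
    return text.replace(ZWNJ, "")


# The Persian word for "I want": prefix "mi" + ZWNJ + stem "khaham".
# Written with escapes so the invisible character cannot be lost in copy/paste.
prefix = "\u0645\u06cc"                 # "mi"
stem = "\u062e\u0648\u0627\u0647\u0645"  # "khaham"
with_zwnj = prefix + ZWNJ + stem
without_zwnj = strip_zwnj(with_zwnj)

print(unicodedata.name(ZWNJ))              # official Unicode name of U+200C
print(len(with_zwnj), len(without_zwnj))   # lengths differ by one code point
print(with_zwnj == without_zwnj)           # the two spellings are not equal
```

The two strings differ by one invisible code point, so a pipeline that silently drops the ZWNJ changes the text the model sees.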

Core Capabilities

  • Fill-mask (masked-token prediction) for Persian text
  • Support for both PyTorch and TensorFlow
  • Compatibility with hosted inference endpoints
  • General Persian text understanding and processing
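
The fill-mask capability can be sketched with the Hugging Face `transformers` library, as follows. This assumes that `transformers` and a backend such as PyTorch are installed, and that the model is published on the Hugging Face Hub under the identifier `HooshvareLab/bert-fa-zwnj-base`; the page itself gives only the model and author names, so the full identifier is an assumption. The helpers `build_prompt` and `predict_masked` are hypothetical names for this illustration, not part of any library.

```python
# Assumed Hub identifier, combining the author and model names from this page.
MODEL_ID = "HooshvareLab/bert-fa-zwnj-base"


def build_prompt(template: str, mask_token: str) -> str:
    """Fill the single "{mask}" placeholder with the tokenizer's mask token."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one {mask} placeholder")
    return template.replace("{mask}", mask_token)


def predict_masked(template: str, top_k: int = 5):
    """Return (token, score) pairs for the masked position.

    transformers is imported lazily so build_prompt works without it;
    calling this function downloads the model on first use.
    """
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    prompt = build_prompt(template, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(prompt, top_k=top_k)]


# Example call (requires network access and the transformers package):
#   predict_masked("تهران پایتخت {mask} است.")   # "Tehran is the capital of ___."
```

Reading the mask token from the tokenizer rather than hard-coding it keeps the sketch valid even if the model's mask token differs from the usual BERT `[MASK]`.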

Frequently Asked Questions

Q: What makes this model unique?

ParsBERT differs from general-purpose models in two ways: it handles Persian-specific features, in particular the zero-width non-joiner, and it was trained on a diverse range of Persian texts. Both make it well suited to Persian language understanding tasks.

Q: What are the recommended use cases?

The model suits Persian language processing tasks such as text classification, named entity recognition, and general language understanding, in both academic and commercial settings.
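
As a rough sketch of the text classification use case, the checkpoint can be loaded with a classification head through `transformers`. The assumptions are the same as for any Hub-based example: the `transformers` package is installed and the model is available as `HooshvareLab/bert-fa-zwnj-base` (an assumed identifier). The label names and the helpers `make_label_maps` and `load_classifier` are hypothetical. The classification head added this way is newly initialized, so the model needs fine-tuning on labeled data before its predictions mean anything.

```python
# Assumed Hub identifier, combining the author and model names from this page.
MODEL_ID = "HooshvareLab/bert-fa-zwnj-base"


def make_label_maps(labels):
    """Build the id2label / label2id dictionaries stored in the model config."""
    id2label = {i: name for i, name in enumerate(labels)}
    label2id = {name: i for i, name in enumerate(labels)}
    return id2label, label2id


def load_classifier(labels):
    """Load the tokenizer and the encoder with an untrained classification head.

    transformers is imported lazily; calling this downloads the model.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = make_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


# Example call (hypothetical labels; requires network access):
#   tokenizer, model = load_classifier(["negative", "positive"])
```

For named entity recognition the same pattern would apply with `AutoModelForTokenClassification` in place of the sequence classification class.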
