# ParsBERT (bert-fa-zwnj-base)
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Author | HooshvareLab |
| Paper | arXiv:2005.12515 |
| Language | Persian |
## What is bert-fa-zwnj-base?
ParsBERT v3.0 is a monolingual language model based on Google's BERT architecture, designed for Persian language understanding. This version adds support for the zero-width non-joiner (ZWNJ, U+200C), a character essential for correct Persian orthography. The model was trained on a diverse collection of Persian corpora spanning a range of writing styles and subjects, including scientific texts, novels, and news articles.
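To make the ZWNJ point concrete, here is a minimal illustration (plain Python, no model required) of how the character changes a Persian string; the example word is illustrative:

```python
# ZWNJ (U+200C) visually separates morphemes without inserting a space.
# The Persian progressive prefix "می" must not join the following verb stem.
with_zwnj = "می\u200cروم"    # "miravam" (I go), rendered correctly
without_zwnj = "میروم"       # same letters with no ZWNJ: they join incorrectly

print(len(with_zwnj), len(without_zwnj))  # 6 vs 5 code points
print(with_zwnj == without_zwnj)          # False: distinct inputs to a model
```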
## Implementation Details
The model uses a transformer architecture based on the BERT framework, optimized for Persian language processing. Its vocabulary is built specifically for Persian and includes dedicated handling of zero-width non-joiner characters (a loading sketch follows the list below). Key features:
- Comprehensive pre-training on diverse Persian corpora
- Advanced handling of zero-width non-joiner characters
- Multi-domain training across scientific, literary, and news content
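
As a minimal sketch of putting these features to use, the snippet below loads the tokenizer and masked-LM weights via the Hugging Face `transformers` library. The model ID matches the Hub listing; the exact tokens produced depend on the released vocabulary.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "HooshvareLab/bert-fa-zwnj-base"  # Hub ID for this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# A ZWNJ-aware vocabulary should keep "می‌روم" coherent instead of
# fragmenting it the way a ZWNJ-unaware tokenizer would.
print(tokenizer.tokenize("می\u200cروم"))
```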
## Core Capabilities
- Fill-mask task processing for Persian text (see the pipeline example after this list)
- Support for both PyTorch and TensorFlow frameworks
- Inference endpoint compatibility
- Advanced Persian text understanding and processing
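
For the fill-mask capability listed above, a minimal pipeline sketch follows; the Persian prompt is illustrative, and the scores will vary with the checkpoint:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="HooshvareLab/bert-fa-zwnj-base")

# "Tehran is the capital of [MASK]." The pipeline fills the BERT-style
# [MASK] token and returns the top-scoring candidates.
for prediction in fill_mask("تهران پایتخت [MASK] است."):
    print(prediction["token_str"], round(prediction["score"], 3))
```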
## Frequently Asked Questions
Q: What makes this model unique?
A: ParsBERT stands out for its specialized handling of Persian orthography, particularly zero-width non-joiner characters, and for its training on a diverse range of Persian texts. This makes it especially effective for Persian language understanding tasks.
Q: What are the recommended use cases?
A: The model is well suited to Persian language processing tasks such as text classification, named entity recognition, and general language understanding. It fits both academic and commercial applications that require sophisticated Persian text processing.
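
As one example of the classification use case, here is a minimal sketch of attaching a sequence-classification head for fine-tuning. The three-label setup and the sample sentence are hypothetical, not part of the released checkpoint:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "HooshvareLab/bert-fa-zwnj-base"
# num_labels is illustrative; the classification head starts untrained
# and must be fine-tuned on labeled Persian data before use.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("این فیلم فوق‌العاده بود", return_tensors="pt")  # "This film was great"
logits = model(**inputs).logits  # shape (1, 3); argmax gives the predicted class
```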