heBERT

Maintained By
avichr

Property        Value
Research Paper  View Paper
Architecture    BERT-Base
Training Data   ~10.6 GB (OSCAR, Wikipedia, UGC)
Primary Tasks   Sentiment Analysis, NER, Masked-LM

What is heBERT?

heBERT is a Hebrew language model based on Google's BERT architecture and built for Hebrew text analysis. It was trained on roughly 1 billion words in 20.8 million sentences drawn from several sources, which gives it broad coverage for Hebrew natural language processing tasks.

Implementation Details

The model was pre-trained on three datasets: a 9.8 GB Hebrew OSCAR corpus, a 650 MB Wikipedia dump, and 150 MB of user-generated content from news sites (about 10.6 GB in total). It uses the BERT-Base configuration and is published in several task-specific versions, including sentiment analysis and named entity recognition.

  • Pre-trained on combined datasets totaling over 1 billion words
  • Supports masked language modeling for transfer learning
  • Includes fine-tuned versions for sentiment analysis and NER
  • Available through the Transformers library and AWS
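
As a rough illustration of the masked language modeling use mentioned above, the following sketch loads the model through the Transformers `pipeline` API and ranks candidate tokens for a masked position. The Hub identifier `avichr/heBERT` is an assumption based on the maintainer name listed on this page, and `top_predictions` and `load_fill_mask` are hypothetical helper names introduced here, not part of the model's own tooling.

```python
# Minimal sketch: masked-token prediction with heBERT via Hugging Face Transformers.
# Assumption: the model is published on the Hub as "avichr/heBERT".
MODEL_ID = "avichr/heBERT"


def top_predictions(fill_mask_output, k=3):
    """Return the k highest-scoring (token, score) pairs from a fill-mask result.

    `fill_mask_output` is the list of dicts a fill-mask pipeline returns for a
    sentence containing a single mask token.
    """
    ranked = sorted(fill_mask_output, key=lambda item: item["score"], reverse=True)
    return [(item["token_str"], round(item["score"], 4)) for item in ranked[:k]]


def load_fill_mask(model_id=MODEL_ID):
    """Build a fill-mask pipeline (downloads the model on first use)."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


if __name__ == "__main__":
    fill_mask = load_fill_mask()
    mask = fill_mask.tokenizer.mask_token  # use the tokenizer's own mask token
    # Illustrative Hebrew sentence with one masked word.
    sentence = f"הקורונה לקחה את {mask} ולנו לא נשאר דבר."
    for token, score in top_predictions(fill_mask(sentence)):
        print(token, score)
```

Running the script requires the `transformers` package and a backend such as PyTorch; the first call downloads the model weights.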

Core Capabilities

  • Masked Language Modeling for general language understanding
  • Sentiment Analysis with three-way classification (positive, negative, neutral)
  • Named Entity Recognition for Hebrew text
  • Emotion recognition across 8 distinct emotions (upcoming feature)
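
The three-way sentiment classification listed above could be exercised along these lines. This is a sketch under assumptions: the identifier `avichr/heBERT_sentiment_analysis` is assumed to be the Hub name of the fine-tuned sentiment model, `best_label` and `classify` are hypothetical helpers, and the label strings are whatever the model's configuration defines, so the code does not hard-code them.

```python
# Minimal sketch: Hebrew sentiment classification with a fine-tuned heBERT model.
# Assumption: the fine-tuned model is published as "avichr/heBERT_sentiment_analysis".
SENTIMENT_MODEL_ID = "avichr/heBERT_sentiment_analysis"


def best_label(scores):
    """Pick the (label, score) pair with the highest score.

    `scores` is a list of {"label": str, "score": float} dicts, one per class.
    """
    if not scores:
        raise ValueError("scores must contain at least one class")
    top = max(scores, key=lambda item: item["score"])
    return top["label"], top["score"]


def classify(texts, model_id=SENTIMENT_MODEL_ID):
    """Return the most likely sentiment label for each input text."""
    # Imported lazily so best_label() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # Passing a list with top_k=None yields one list of per-class scores per text.
    all_scores = classifier(list(texts), top_k=None)
    return [best_label(scores) for scores in all_scores]


if __name__ == "__main__":
    # Illustrative input: "I love the world" in Hebrew.
    for label, score in classify(["אני אוהב את העולם"]):
        print(label, round(score, 4))
```

Keeping the full per-class scores, rather than only the top label, makes it possible to apply a confidence threshold before acting on a prediction.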

Frequently Asked Questions

Q: What makes this model unique?

heBERT is built specifically for Hebrew and trained on a large, varied Hebrew corpus. Because that corpus combines edited text such as Wikipedia with user-generated content from news sites, the model handles both formal and informal writing, which makes it useful for real-world applications.

Q: What are the recommended use cases?

The model is best suited to sentiment analysis and named entity recognition on Hebrew text, and it can be fine-tuned for other downstream tasks. Typical applications that need Hebrew text understanding include social media analysis, content moderation, and automated text classification.
