Qari-OCR-0.2.2.1-VL-2B-Instruct
Property | Value |
---|---|
Base Model | Qwen2 VL |
Parameters | 2 Billion |
License | Follows Qwen2 VL licensing terms |
Author | NAMAA-Space |
Model URL | Hugging Face |
What is Qari-OCR-0.2.2.1-VL-2B-Instruct?
Qari-OCR-0.2.2.1-VL-2B-Instruct is a state-of-the-art Arabic Optical Character Recognition (OCR) model fine-tuned on Qwen2-VL-2B-Instruct. It represents a significant advancement in Arabic text recognition, achieving impressive metrics with a Word Error Rate (WER) of 0.221 and Character Error Rate (CER) of 0.059.
Implementation Details
The model was trained on a comprehensive dataset of 50,000 records, incorporating various font sizes (14-40pt) and multiple page layouts including A4, Letter, and custom formats. It supports 12 different Arabic fonts, making it highly versatile for real-world applications.
- Superior accuracy compared to existing solutions like easyOCR and pytesseract
- Full diacritics (tashkeel) support including fatḥah, kasrah, ḍammah, and more
- Flexible layout handling for various document formats
- Trained on multiple font styles and sizes
Core Capabilities
- High-accuracy Arabic text extraction (BLEU score: 0.597)
- Complete diacritical mark recognition
- Support for multiple page layouts and formats
- Robust performance across various font styles
- Enhanced handling of complex Arabic typography
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its exceptional accuracy in Arabic OCR, particularly in handling diacritical marks and various font styles. It achieves significantly better performance metrics compared to existing solutions, with a WER of 0.221 versus competitors' 0.757-1.294.
Q: What are the recommended use cases?
The model is ideal for digitizing Arabic documents, processing academic texts with diacritics, handling business documents, and converting printed Arabic text to digital format. It works best with font sizes between 14-40pt and supports various standard document layouts.