Qari-OCR-0.2.2.1-VL-2B-Instruct

Property	Value
Base Model	Qwen2 VL
Parameters	2 Billion
License	Follows Qwen2 VL licensing terms
Author	NAMAA-Space
Model URL	Hugging Face

What is Qari-OCR-0.2.2.1-VL-2B-Instruct?

Qari-OCR-0.2.2.1-VL-2B-Instruct is a state-of-the-art Arabic Optical Character Recognition (OCR) model fine-tuned on Qwen2-VL-2B-Instruct. It represents a significant advancement in Arabic text recognition, achieving impressive metrics with a Word Error Rate (WER) of 0.221 and Character Error Rate (CER) of 0.059.

Implementation Details

The model was trained on a comprehensive dataset of 50,000 records, incorporating various font sizes (14-40pt) and multiple page layouts including A4, Letter, and custom formats. It supports 12 different Arabic fonts, making it highly versatile for real-world applications.

Superior accuracy compared to existing solutions like easyOCR and pytesseract
Full diacritics (tashkeel) support including fatḥah, kasrah, ḍammah, and more
Flexible layout handling for various document formats
Trained on multiple font styles and sizes

Core Capabilities

High-accuracy Arabic text extraction (BLEU score: 0.597)
Complete diacritical mark recognition
Support for multiple page layouts and formats
Robust performance across various font styles
Enhanced handling of complex Arabic typography

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its exceptional accuracy in Arabic OCR, particularly in handling diacritical marks and various font styles. It achieves significantly better performance metrics compared to existing solutions, with a WER of 0.221 versus competitors' 0.757-1.294.

Q: What are the recommended use cases?

The model is ideal for digitizing Arabic documents, processing academic texts with diacritics, handling business documents, and converting printed Arabic text to digital format. It works best with font sizes between 14-40pt and supports various standard document layouts.