trocr-large-printed

Maintained By
microsoft

TrOCR Large Printed Model

Property         Value
Parameter Count  608M parameters
Model Type       Vision-encoder-decoder
Architecture     Transformer-based OCR
Paper            Research Paper
Downloads        215,547

What is trocr-large-printed?

TrOCR large-printed is an optical character recognition (OCR) model developed by Microsoft for recognizing printed text. It uses a transformer-based encoder-decoder architecture with 608 million parameters to transcribe the text shown in an image.

Implementation Details

The model pairs two transformers: an image transformer encoder initialized from BEiT weights and a text transformer decoder initialized from RoBERTa weights. Each input image is split into 16x16 pixel patches, which are linearly embedded and combined with position embeddings before entering the encoder. The decoder then generates the text tokens one at a time.
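The patch step described above can be illustrated with a minimal pure-Python sketch. The 384x384 input resolution is an assumption (it is not stated in this card), and `patch_grid` and `split_into_patches` are hypothetical helper names used only for illustration, not part of any library.

```python
def patch_grid(height, width, patch=16):
    """Return (rows, cols, num_patches) for non-overlapping square patches."""
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch, width // patch
    return rows, cols, rows * cols


def split_into_patches(image, patch=16):
    """Split a 2-D list of pixel values into flattened patches (row-major order)."""
    rows, cols, _ = patch_grid(len(image), len(image[0]), patch)
    patches = []
    for r in range(rows):
        for c in range(cols):
            # Collect the patch's pixels row by row into one flat vector.
            block = [image[r * patch + i][c * patch + j]
                     for i in range(patch)
                     for j in range(patch)]
            patches.append(block)
    return patches


# Assumption: a 384x384 input image, which is not stated in the card.
rows, cols, num_patches = patch_grid(384, 384)
print(rows, cols, num_patches)  # 24 24 576
```

Each flattened patch is what a linear embedding layer would project into the encoder's hidden size. For a single-channel image a 16x16 patch holds 256 values; an RGB image would hold three times as many.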

  • Pre-trained on extensive document datasets
  • Fine-tuned on the SROIE dataset of scanned receipts
  • Weights stored as F32 tensors
  • Implements vision-encoder-decoder architecture
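As a rough back-of-the-envelope check, the 608M parameter count and the F32 tensor type together imply a lower bound on the memory needed just to hold the weights. The sketch below assumes exactly 4 bytes per parameter and ignores activations, buffers, and framework overhead; `weight_bytes` is a hypothetical helper name.

```python
def weight_bytes(num_params, bytes_per_param=4):
    """Bytes needed to store the weights alone (F32 = 4 bytes per parameter)."""
    return num_params * bytes_per_param


NUM_PARAMS = 608_000_000  # rounded figure from the property table

size = weight_bytes(NUM_PARAMS)
print(f"{size / 1e9:.2f} GB")  # 2.43 GB
```

Actual memory use during inference will be higher, since this counts only the stored weights.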

Core Capabilities

  • High-accuracy printed text recognition
  • Single text-line image processing
  • Automated document text extraction
  • Applicable to printed text from various document types
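Single text-line recognition, as listed above, might look like the following sketch using the Hugging Face `transformers` classes `TrOCRProcessor` and `VisionEncoderDecoderModel`. The hub identifier `microsoft/trocr-large-printed` is assumed from the model name and maintainer shown on this page, and `load_trocr` and `recognize_line` are hypothetical helper names.

```python
def load_trocr(checkpoint="microsoft/trocr-large-printed"):
    """Load the processor and model (assumed hub id; downloads the weights)."""
    # Imported lazily so recognize_line can be used and tested without
    # pulling in the heavy dependencies at import time.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    return processor, model


def recognize_line(image, processor, model):
    """Transcribe one cropped text-line image (e.g. a PIL image in RGB mode)."""
    # Resize/normalize the image into the tensor the encoder expects.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # Autoregressively generate token ids with the text decoder.
    generated_ids = model.generate(pixel_values)
    # Convert token ids back to a string, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A typical call would be `processor, model = load_trocr()` followed by `recognize_line(Image.open("line.png").convert("RGB"), processor, model)`, where `line.png` is a placeholder for an image containing a single line of printed text.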

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is that it performs OCR end to end with transformers: a pre-trained image transformer encodes the image and a pre-trained text transformer decodes the transcription. This combination makes it particularly effective for recognizing printed text with high accuracy.

Q: What are the recommended use cases?

The model is optimized for printed text, which suits it to automated document processing, form digitization, and text extraction from printed materials. Because it reads one text line per image, full pages need to be segmented into line crops before recognition.
