TFT-ID-1.0

Maintained By
yifeihu


Property         Value
Parameter Count  823M
License          MIT
Architecture     Florence-2 based Transformer
Author           Yifei Hu

What is TFT-ID-1.0?

TFT-ID-1.0 is an object detection model that extracts tables, figures, and text sections from academic papers. It is built on Microsoft's Florence-2 architecture and was trained on more than 36,000 manually annotated bounding boxes drawn from papers in Hugging Face Daily Papers.

Implementation Details

The model takes a single-page image of an academic paper and outputs bounding boxes for the tables, figures, and text sections on that page. It uses a transformer-based architecture, and its weights are stored in F32 (32-bit floating point) precision. The reported success rates are 96.78% when tables, figures, and text sections are all evaluated, and 98.84% when only tables and figures are evaluated.

  • Trained on 36,000+ manually verified bounding boxes
  • Compatible with downstream OCR workflows
  • Outputs structured bbox coordinates in [x1, y1, x2, y2] format
  • Integrates seamlessly with TB-OCR-preview-0.1 for text extraction
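
The [x1, y1, x2, y2] output format can be handled with a small helper like the one below. This is a minimal sketch, not the author's code: it assumes the detections arrive as a dict with parallel `bboxes` and `labels` lists, and the `Region` type, the `to_regions` helper, the label strings, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """One detected region in pixel coordinates (hypothetical helper type)."""
    label: str
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1


def to_regions(detections: Dict[str, list], page_w: int, page_h: int) -> List[Region]:
    """Convert parallel bboxes/labels lists into Regions clamped to the page.

    Assumes detections = {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}.
    Boxes that collapse to zero area after clamping are dropped.
    """
    regions = []
    for (x1, y1, x2, y2), label in zip(detections["bboxes"], detections["labels"]):
        # Round to whole pixels and clamp to the image bounds.
        cx1 = min(max(int(round(x1)), 0), page_w)
        cy1 = min(max(int(round(y1)), 0), page_h)
        cx2 = min(max(int(round(x2)), 0), page_w)
        cy2 = min(max(int(round(y2)), 0), page_h)
        if cx2 > cx1 and cy2 > cy1:
            regions.append(Region(label, cx1, cy1, cx2, cy2))
    return regions


# Made-up sample detections for an 800x1000 page image.
sample = {
    "bboxes": [[50.4, 80.0, 750.2, 300.9], [60.0, 320.0, 820.0, 600.0]],
    "labels": ["table", "text"],
}
regions = to_regions(sample, page_w=800, page_h=1000)
```

Each `Region` can then be turned into a `(x1, y1, x2, y2)` crop box, which matches the `(left, upper, right, lower)` tuple that Pillow's `Image.crop` expects, before the crop is handed to an OCR step.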

Core Capabilities

  • Accurate identification of tables, figures, and text sections
  • High-precision bounding box generation
  • Clean text section isolation for OCR processing
  • Multi-component detection handling
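
When a page yields several detections of different types, a downstream pipeline usually has to separate them before OCR. The sketch below shows one possible way to do that, under stated assumptions: the label strings `"table"`, `"figure"`, and `"text"` are assumed rather than confirmed by the source, the helper names are hypothetical, and the reading-order sort is a naive single-column heuristic.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def split_by_label(bboxes: List[Box], labels: List[str]) -> Dict[str, List[Box]]:
    """Group boxes under their label, preserving detection order."""
    groups: Dict[str, List[Box]] = {}
    for box, label in zip(bboxes, labels):
        groups.setdefault(label, []).append(tuple(box))
    return groups


def reading_order(boxes: List[Box]) -> List[Box]:
    """Sort boxes top-to-bottom, then left-to-right.

    Naive single-column heuristic; two-column layouts need a column-aware sort.
    """
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Made-up detections; the label strings are assumed, not confirmed by the source.
bboxes = [(40, 500, 760, 900), (40, 60, 760, 200), (100, 220, 700, 480)]
labels = ["text", "text", "figure"]

groups = split_by_label(bboxes, labels)
ocr_queue = reading_order(groups.get("text", []))  # text sections go to OCR
visual_regions = groups.get("table", []) + groups.get("figure", [])
```

Keeping text sections in a separate queue means the OCR step never sees table or figure pixels, which is the point of isolating text regions first.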

Frequently Asked Questions

Q: What makes this model unique?

TFT-ID-1.0 detects three content types (tables, figures, and text sections) in a single pass over a paper page, with reported success rates of 96.78% across all three types and 98.84% for tables and figures alone. Its training bounding boxes were all manually verified by the author.

Q: What are the recommended use cases?

The model is ideal for academic paper processing pipelines, automated content extraction systems, and research document analysis. It's particularly effective when combined with OCR tools for complete document understanding.
