rtdetr_r50vd

Maintained By
PekingU

RT-DETR R50VD Model

Property: Value
Parameter Count: 43M parameters
License: Apache-2.0
Paper: DETRs Beat YOLOs on Real-time Object Detection
Performance: 53.1% AP on COCO, 108 FPS on T4 GPU

What is rtdetr_r50vd?

RT-DETR (Real-Time Detection Transformer) is an object detection model that aims to combine the accuracy of DETR-style detectors with the speed of YOLO-style detectors. Developed by researchers at Peking University, it is presented by its authors as the first real-time end-to-end object detector: it produces its final predictions directly, without a Non-Maximum Suppression (NMS) step, while maintaining high accuracy.

Implementation Details

The model pairs an efficient hybrid encoder with uncertainty-minimal query selection. The encoder processes multi-scale features through two modules: Attention-based Intra-scale Feature Interaction (AIFI), which applies attention within a single feature scale, and CNN-based Cross-scale Feature Fusion (CCFF), which fuses features across scales with convolutions. Input images are resized to 640x640 pixels and normalized with model-specific parameters before inference.
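The preprocessing step can be illustrated with a minimal, dependency-free sketch. It assumes nearest-neighbor resizing and a simple rescale of pixel values by 1/255; both are illustrative choices, since the text above does not state the actual interpolation method or normalization parameters. The function name `preprocess` and its arguments are hypothetical.

```python
from typing import List

def preprocess(image: List[List[List[int]]], size: int = 640,
               scale: float = 1 / 255.0) -> List[List[List[float]]]:
    """Resize an H x W x 3 image (nested lists, values 0..255) to size x size.

    Uses nearest-neighbor sampling and returns a channels-first
    (3 x size x size) nested list of floats. The 1/255 rescale is an
    assumption for illustration, not the model's documented normalization.
    """
    src_h, src_w = len(image), len(image[0])
    out = [[[0.0] * size for _ in range(size)] for _ in range(3)]
    for y in range(size):
        # Map each target row back to the nearest source row.
        sy = min(src_h - 1, y * src_h // size)
        for x in range(size):
            sx = min(src_w - 1, x * src_w // size)
            pixel = image[sy][sx]
            for c in range(3):
                out[c][y][x] = pixel[c] * scale
    return out
```

In practice an image library would perform the resize with proper interpolation; the sketch only shows the shape of the transformation from an arbitrary-size image to a fixed 640x640 input.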

  • Trained on the COCO 2017 dataset (118k training images)
  • Supports flexible speed tuning by adjusting the number of decoder layers used at inference
  • Achieves 53.1% AP on the COCO validation set
  • Operates at 108 FPS on a T4 GPU
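The idea of speed tuning through decoder layers can be sketched as running only a prefix of the decoder stack. The code below is a toy illustration under stated assumptions: `run_decoder`, `make_toy_layer`, and the six-layer stack are hypothetical stand-ins, and each "layer" simply adds a constant rather than performing attention.

```python
from typing import Callable, List, Optional, Sequence

Layer = Callable[[List[float]], List[float]]

def run_decoder(layers: Sequence[Layer], queries: List[float],
                num_layers: Optional[int] = None) -> List[float]:
    """Apply the first num_layers decoder layers to the queries.

    Using fewer layers does less work per image (faster), typically at
    some cost in accuracy. With num_layers=None the full stack is used.
    """
    if num_layers is None:
        num_layers = len(layers)
    if not 1 <= num_layers <= len(layers):
        raise ValueError("num_layers must be between 1 and len(layers)")
    state = list(queries)
    for layer in layers[:num_layers]:
        state = layer(state)
    return state

def make_toy_layer(step: float) -> Layer:
    """Build a placeholder layer that shifts every value by step."""
    def layer(state: List[float]) -> List[float]:
        return [value + step for value in state]
    return layer

# Hypothetical six-layer stack, used only to show the truncation.
toy_layers = [make_toy_layer(1.0) for _ in range(6)]
full = run_decoder(toy_layers, [0.0])                  # all six layers
fast = run_decoder(toy_layers, [0.0], num_layers=3)    # first three only
```

The design point is that the same set of layers serves every speed setting: a deployment picks how many layers to run instead of swapping in a different model.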

Core Capabilities

  • Real-time object detection with state-of-the-art accuracy
  • End-to-end detection without NMS post-processing
  • Multi-scale feature processing
  • Flexible speed-accuracy trade-off
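To make "end-to-end detection without NMS" concrete, here is a minimal sketch of post-processing for a DETR-style output head. It assumes the model emits one row of class logits and one normalized (cx, cy, w, h) box per query; the function name `postprocess`, the per-query argmax, and the 0.5 threshold are illustrative simplifications, not the model's exact procedure.

```python
import math
from typing import List, Tuple

Detection = Tuple[int, float, Tuple[float, float, float, float]]

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def postprocess(logits: List[List[float]], boxes: List[List[float]],
                image_size: Tuple[int, int],
                threshold: float = 0.5) -> List[Detection]:
    """Turn per-query predictions into (label, score, xyxy box) tuples.

    image_size is (width, height) in pixels. Each query yields at most
    one detection, so no overlap-based suppression step is applied.
    """
    width, height = image_size
    detections: List[Detection] = []
    for query_logits, (cx, cy, w, h) in zip(logits, boxes):
        scores = [sigmoid(v) for v in query_logits]
        label = max(range(len(scores)), key=scores.__getitem__)
        score = scores[label]
        if score < threshold:
            continue  # low-confidence query: treated as "no object"
        # Convert normalized center format to absolute corner format.
        box = ((cx - w / 2) * width, (cy - h / 2) * height,
               (cx + w / 2) * width, (cy + h / 2) * height)
        detections.append((label, score, box))
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections
```

Filtering is a plain score threshold over a fixed set of queries. There is no pairwise box comparison, which is the step that NMS-based detectors need and that adds latency and tuning parameters.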

Frequently Asked Questions

Q: What makes this model unique?

RT-DETR combines transformer-based detection with real-time performance, outperforming comparable YOLO models in both speed and accuracy while eliminating the need for NMS. It also runs about 21 times faster than DINO-R50 while achieving better accuracy.

Q: What are the recommended use cases?

The model is ideal for real-time object detection applications requiring both speed and accuracy, such as surveillance systems, autonomous driving, and real-time video analysis. Its flexible architecture allows for deployment in various scenarios with different speed-accuracy requirements.
