yolos-tiny

Maintained By
hustvl

YOLOS-tiny

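Property          Value
Parameter Count   6.49M
License           Apache 2.0
Paper             You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection (YOLOS)
Performance       28.7 AP on COCO 2017 validation
Framework         PyTorch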

What is yolos-tiny?

YOLOS-tiny is a compact Vision Transformer (ViT) model for object detection, released by hustvl. It is a lightweight variant of the YOLOS architecture and reaches 28.7 AP on COCO with only 6.49M parameters. Instead of building on a CNN backbone, YOLOS applies an essentially unmodified transformer encoder directly to image patches, which sets it apart from traditional CNN-based detection methods.
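The "transformer applied directly to images" idea comes down to sequence bookkeeping: the image is cut into fixed-size patches, each patch becomes one token, and learnable detection tokens (one per object query) are appended before the encoder runs. The sketch below only computes token counts. It assumes 16×16 patches (the ViT/DeiT default) and 100 detection tokens; the helper `yolos_sequence_length` is hypothetical and not part of any library.

```python
from typing import Tuple


def yolos_sequence_length(height: int, width: int,
                          patch_size: int = 16,
                          num_det_tokens: int = 100) -> Tuple[int, int]:
    """Return (patch tokens, total encoder sequence length) for one image.

    Hypothetical helper. Assumes 16x16 patches (the ViT/DeiT default) and
    one learnable detection token per object query.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    num_patches = (height // patch_size) * (width // patch_size)
    # Patch tokens and detection tokens are concatenated into one sequence
    # and processed together by the transformer encoder; each detection
    # token's output is then mapped to one class prediction and one box.
    return num_patches, num_patches + num_det_tokens


patches, total = yolos_sequence_length(512, 512)
print(patches, total)  # 1024 1124
```

Because self-attention cost grows with the square of the sequence length, input resolution has a large effect on compute even for a model this small.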

Implementation Details

The model is trained with a bipartite matching loss. It emits a fixed set of 100 object queries per image, and during training the Hungarian algorithm finds the optimal one-to-one assignment between these predictions and the ground-truth objects; predictions left unmatched are assigned the "no object" class, and the classification and box losses are computed against this assignment. The model was pre-trained on ImageNet-1k and then fine-tuned on COCO 2017, for 300 epochs in each phase.
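The matching step can be illustrated in plain Python. This is a simplified sketch, not the YOLOS training code: the cost combines only class probability and L1 box distance (the DETR-style matcher used by YOLOS also adds a generalized-IoU term), and it brute-forces every one-to-one assignment instead of running the Hungarian algorithm. Both find the same optimal matching, but brute force scales factorially, so real implementations call a linear-assignment solver such as SciPy's `linear_sum_assignment`. The function names and toy values are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # normalized (cx, cy, w, h)


def match_cost(probs: Sequence[float], pred_box: Box,
               gt_label: int, gt_box: Box, box_weight: float = 1.0) -> float:
    """Simplified matching cost (hypothetical): low when the prediction
    assigns high probability to the true class and its box is close in L1.
    The DETR-style matcher also adds a generalized-IoU term, omitted here."""
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return -probs[gt_label] + box_weight * l1


def optimal_matching(pred_probs: Sequence[Sequence[float]],
                     pred_boxes: Sequence[Box],
                     gt_labels: Sequence[int],
                     gt_boxes: Sequence[Box]) -> List[Tuple[int, int]]:
    """Return (prediction index, ground-truth index) pairs with minimum total
    cost. Brute force over all one-to-one assignments: same optimum as the
    Hungarian algorithm, but only practical for toy sizes."""
    best_cost, best = float("inf"), []
    # Each permutation picks a distinct prediction for every ground-truth object.
    for chosen in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(match_cost(pred_probs[p], pred_boxes[p], gt_labels[g], gt_boxes[g])
                   for g, p in enumerate(chosen))
        if cost < best_cost:
            best_cost, best = cost, [(p, g) for g, p in enumerate(chosen)]
    return best


# Toy example: 3 predictions, 2 ground-truth objects, classes 0=cat, 1=dog;
# the last probability in each row is "no object". Values are invented.
pred_probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.2, 0.2, 0.6]]
pred_boxes = [(0.70, 0.50, 0.20, 0.30), (0.30, 0.40, 0.25, 0.25), (0.50, 0.50, 0.90, 0.90)]
gt_labels = [0, 1]
gt_boxes = [(0.30, 0.42, 0.25, 0.25), (0.72, 0.50, 0.20, 0.30)]

matches = optimal_matching(pred_probs, pred_boxes, gt_labels, gt_boxes)
print(matches)  # [(1, 0), (0, 1)]: prediction 2 stays unmatched -> "no object"
```

In the real model the same idea runs over 100 predictions per image, and every prediction that receives no ground-truth partner is trained toward the "no object" class.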

  • Plain ViT encoder applied directly to object detection
  • Bipartite matching loss, with the Hungarian algorithm assigning predictions to targets
  • Pre-trained on ImageNet-1k, fine-tuned on COCO 2017
  • Weights stored as F32 (32-bit floating point) tensors
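The parameter count and F32 storage together give a quick estimate of the weight size. This is a back-of-the-envelope sketch: it counts weights only (no activations, optimizer state, or file-format overhead) and uses the rounded 6.49M figure from the card. The float16 entry is included only for comparison; this card does not say how the model behaves at reduced precision. `weight_bytes` is a hypothetical helper.

```python
# Weight-storage estimate: parameter count x bytes per parameter.
# Assumptions: weights only (no activations, optimizer state, or file
# overhead), and 6.49M is the rounded count from the card.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_bytes(num_params: int, dtype: str = "float32") -> int:
    """Bytes needed to hold num_params values of the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


size = weight_bytes(6_490_000)
print(f"{size / 1e6:.2f} MB ({size / 2**20:.2f} MiB)")  # 25.96 MB (24.76 MiB)
```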

Core Capabilities

  • Object detection with 28.7 AP on COCO 2017 validation
  • Small footprint: 6.49M parameters
  • Detection of multiple objects per image (up to 100 predictions per image)
  • Integration with the Hugging Face Transformers library
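Turning raw outputs into detections is a small post-processing step. In DETR-style models such as YOLOS, each object query yields class logits (one per class plus a final "no object" entry) and a box in normalized (center x, center y, width, height) form; in Transformers, `YolosImageProcessor.post_process_object_detection` performs this conversion. The plain-Python sketch below reimplements the idea so it runs without downloading weights. The function names, the 0.5 threshold, and the toy logits are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one query's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def center_to_corners(box: Box, img_w: int, img_h: int) -> Box:
    """Normalized (cx, cy, w, h) -> absolute (x0, y0, x1, y1) in pixels."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)


def postprocess(logits: Sequence[Sequence[float]], boxes: Sequence[Box],
                img_w: int, img_h: int, threshold: float = 0.5) -> List[Dict]:
    """Keep queries whose best real-class probability exceeds threshold."""
    detections = []
    for query_logits, box in zip(logits, boxes):
        probs = softmax(query_logits)
        class_probs = probs[:-1]  # drop the trailing "no object" entry
        score = max(class_probs)
        if score > threshold:
            detections.append({
                "label": class_probs.index(score),
                "score": score,
                "box": center_to_corners(box, img_w, img_h),
            })
    return detections


# Toy outputs for 3 queries over 2 classes (+ "no object"); real YOLOS-tiny
# emits 100 queries per image. These numbers are invented for illustration.
toy_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 3.0, 0.0]]
toy_boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1), (0.25, 0.75, 0.5, 0.5)]
dets = postprocess(toy_logits, toy_boxes, img_w=640, img_h=480)
for d in dets:
    print(d["label"], round(d["score"], 3), [round(v, 1) for v in d["box"]])
```

With the real model, the integer labels would be mapped to COCO category names through the model config's `id2label` dictionary.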

Frequently Asked Questions

Q: What makes this model unique?

YOLOS-tiny applies a plain Vision Transformer to object detection at a very small scale: 6.49M parameters for 28.7 AP on COCO. It shows that a transformer detector of this kind can be scaled down substantially while keeping useful detection ability.

Q: What are the recommended use cases?

The model suits applications that need lightweight object detection, particularly where compute or memory is limited. It may be a reasonable fit for latency-sensitive or on-device deployments where a small model matters more than maximum accuracy, although actual throughput depends on the hardware and input resolution.
