yoloe

Maintained by: jameslahm

YOLOE

Property       Value
Author         Ao Wang et al.
Paper          arXiv:2503.07465
Model Sizes    S, M, L variants
Parameters     12M-50M

What is YOLOE?

YOLOE (YOLO Eye) is a unified object detection and segmentation model built for real-time "seeing anything", that is, recognizing objects without a fixed category list. It supports three prompt mechanisms (text prompts, visual prompts, and a prompt-free mode) within a single model architecture, combining efficiency with versatility.

Implementation Details

The model implements three key innovations: Re-parameterizable Region-Text Alignment (RepRTA) for text prompts, Semantic-Activated Visual Prompt Encoder (SAVPE) for visual prompts, and Lazy Region-Prompt Contrast (LRPC) for prompt-free scenarios. These components enable state-of-the-art performance while maintaining high inference efficiency.
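To make the "re-parameterizable" part of RepRTA more concrete, here is a toy sketch of the general folding idea: when region features pass through a linear projection and are then compared with fixed text embeddings, the two linear steps can be merged into one weight matrix ahead of time. The matrices, sizes, and function names below are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


# Hypothetical toy sizes: 2 text prompts, embedding dim 3, input feature dim 2.
text_emb = [[1.0, 0.0, 2.0],
            [0.0, 1.0, -1.0]]      # (classes, D): one embedding per text prompt
head_proj = [[0.5, 1.0],
             [2.0, 0.0],
             [1.0, -1.0]]          # (D, D_in): linear projection in the head
region_feats = [[1.0, 2.0],
                [3.0, -1.0]]       # (regions, D_in): one row per candidate region


def logits_two_step(feats):
    """Project regions into the embedding space, then compare with every prompt."""
    region_emb = matmul(feats, transpose(head_proj))    # (regions, D)
    return matmul(region_emb, transpose(text_emb))      # (regions, classes)


# Fold both linear steps into one matrix once the prompts are fixed.
folded = matmul(text_emb, head_proj)                    # (classes, D_in)


def logits_folded(feats):
    """Same scores as logits_two_step, using a single matrix multiply."""
    return matmul(feats, transpose(folded))


print(logits_two_step(region_feats))  # [[0.5, 3.0], [8.5, 2.0]]
print(logits_folded(region_feats))    # [[0.5, 3.0], [8.5, 2.0]]
```

Because both paths give identical scores, the folded form behaves like an ordinary fixed-category classification head at inference time, which is why this kind of alignment need not add runtime cost once the prompts are set.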

  • Two model families (YOLOv8-based v8-S/M/L and YOLO11-based 11-S/M/L) with different parameter counts
  • Achieves up to 305.8 FPS on an NVIDIA T4 GPU
  • Supports both detection and segmentation tasks
  • Zero-shot capabilities, evaluated on the LVIS dataset
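To put the throughput figure in the list above in perspective, the short sketch below converts frames per second into average per-frame latency and estimates how many camera streams a given throughput could serve. The helper names and the 30 FPS camera rate are assumptions for illustration, and the estimate ignores pre-processing, post-processing, and batching effects.

```python
def latency_ms(fps: float) -> float:
    """Average per-frame latency in milliseconds for a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def max_streams(model_fps: float, camera_fps: float) -> int:
    """Rough upper bound on camera streams one model instance can keep up with."""
    if camera_fps <= 0:
        raise ValueError("camera_fps must be positive")
    return int(model_fps // camera_fps)


print(f"{latency_ms(305.8):.2f} ms per frame")     # 3.27 ms per frame
print(max_streams(305.8, 30), "streams at 30 FPS")  # 10 streams at 30 FPS
```

Since 305.8 FPS is an "up to" figure, larger variants and other hardware should be expected to give lower throughput than this estimate.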

Core Capabilities

  • Real-time object detection and segmentation
  • Multi-prompt support (text, visual, prompt-free)
  • Efficient re-parameterization for transfer learning
  • Superior performance compared to YOLO-Worldv2
  • CoreML and TensorRT deployment support
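The prompt-free mode in the list above can be pictured with a toy sketch of lazy matching: first decide which regions look like objects at all, and only then compare those regions against a built-in vocabulary. Everything here (the two-dimensional embeddings, the `objectness_emb` vector, the threshold, and the two-word vocabulary) is a hypothetical illustration of the idea, not the model's actual implementation.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lazy_label(regions, objectness_emb, vocab, threshold=0.5):
    """Label only regions that pass an objectness check; skip the rest.

    regions: list of region feature vectors
    objectness_emb: hypothetical embedding that scores "is this an object?"
    vocab: dict mapping category name -> embedding (built-in vocabulary)
    """
    labels = []
    for feat in regions:
        if dot(feat, objectness_emb) < threshold:
            labels.append(None)  # never compared with the vocabulary
            continue
        # Only surviving regions pay the cost of the vocabulary lookup.
        labels.append(max(vocab, key=lambda name: dot(feat, vocab[name])))
    return labels


vocab = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
regions = [[0.9, 0.1], [0.1, 0.1], [0.2, 0.8]]
print(lazy_label(regions, objectness_emb=[1.0, 1.0], vocab=vocab))
# ['cat', None, 'dog']
```

The saving comes from the early `continue`: with a large vocabulary and many background regions, most regions are never compared against the vocabulary.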

Frequently Asked Questions

Q: What makes this model unique?

YOLOE handles multiple prompt types (text, visual, and prompt-free) within a single efficient architecture while running in real time. Compared with YOLO-Worldv2, it is reported to need 3× lower training cost and to deliver a 1.4× inference speedup.

Q: What are the recommended use cases?

The model suits real-time object detection and segmentation applications, especially scenarios that need flexible object recognition without predefined categories. It is particularly suitable for deployment on both GPUs (T4, via TensorRT) and mobile devices (iPhone, via CoreML).
