Florence-2-large

Maintained By
microsoft

Property         Value
Parameter Count  0.77B
License          MIT
Paper            Link
Author           Microsoft
Model Type       Vision Foundation Model

What is Florence-2-large?

Florence-2-large is a vision foundation model developed by Microsoft that uses a prompt-based approach to handle multiple vision and vision-language tasks. It was trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images. With 0.77B parameters, the model performs well in both zero-shot and fine-tuned settings.

Implementation Details

The model uses a sequence-to-sequence architecture and is available through Hugging Face's transformers library. It runs in float16 precision on GPU, and the vision task to perform is selected by the prompt passed alongside the image.

  • Supports multiple vision tasks through specialized prompts
  • Trained on FLD-5B dataset with comprehensive annotations
  • Implements efficient inference with float16 precision
  • Achieves state-of-the-art performance in various benchmarks
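As a rough illustration of the points above, the following sketch loads the model through transformers and runs a single task prompt on an image. It assumes the checkpoint is published under the Hugging Face ID microsoft/Florence-2-large and that it ships custom modeling code (hence trust_remote_code=True). The function names load_florence and run_task are hypothetical helpers, not part of any library, and the generation settings are only one reasonable choice.

```python
# Minimal inference sketch. MODEL_ID and the generation settings are assumptions;
# load_florence and run_task are hypothetical helper names.
MODEL_ID = "microsoft/Florence-2-large"


def load_florence(model_id=MODEL_ID):
    """Load the model and processor, using float16 on GPU and float32 on CPU."""
    # Heavy imports are kept local so that importing this file stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    use_gpu = torch.cuda.is_available()
    device = "cuda:0" if use_gpu else "cpu"
    dtype = torch.float16 if use_gpu else torch.float32

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


def run_task(model, processor, image, task_prompt, text_input=None, max_new_tokens=1024):
    """Run one task prompt (for example "<OD>") on a PIL image and parse the output."""
    prompt = task_prompt if text_input is None else task_prompt + text_input

    # Move tensors to the model's device; floating-point tensors are cast to its dtype.
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, model.dtype
    )
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
        do_sample=False,
        num_beams=3,
    )
    # Special tokens are kept because the post-processing step parses them.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )


if __name__ == "__main__":
    from PIL import Image

    # "example.jpg" is a placeholder path for any local RGB image.
    image = Image.open("example.jpg").convert("RGB")
    model, processor = load_florence()
    print(run_task(model, processor, image, "<CAPTION>"))
```

Loading is separated from inference so that one loaded model can serve many prompts, which is the main practical benefit of a single multi-task model.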

Core Capabilities

  • Image Captioning (Basic, Detailed, and More Detailed)
  • Object Detection (bounding boxes with class labels)
  • Dense Region Caption generation
  • OCR and OCR with Region detection
  • Caption to Phrase Grounding
  • Region Proposal generation
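Each capability in the list above is selected with a task prompt. The sketch below maps readable task names to the prompt tokens listed on the public Florence-2 model card; treat the exact token strings as an assumption to verify against the checkpoint you use. The dictionary keys and the build_prompt helper are hypothetical names introduced for illustration.

```python
# Task prompt tokens as listed on the public Florence-2 model card (assumed current).
# The dictionary keys are illustrative names, not part of any API.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "region_proposal": "<REGION_PROPOSAL>",
}

# Tasks whose prompt is followed by free text (here, the caption to ground).
TASKS_WITH_TEXT_INPUT = {"phrase_grounding"}


def build_prompt(task, text_input=None):
    """Return the full prompt string for a task, validating the optional text input."""
    if task not in TASK_PROMPTS:
        raise KeyError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT_INPUT:
        if not text_input:
            raise ValueError(f"task {task!r} needs a text input")
        return token + text_input
    if text_input:
        raise ValueError(f"task {task!r} does not take a text input")
    return token


print(build_prompt("ocr"))
print(build_prompt("phrase_grounding", "A green car parked in front of a yellow building."))
```

Keeping the tokens in one table makes it harder to mistype a prompt, since a misspelled token would be treated by the model as ordinary text rather than as a task selector.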

Frequently Asked Questions

Q: What makes this model unique?

Florence-2-large stands out for handling multiple vision tasks with a single model and simple prompts. Despite having only 0.77B parameters, it achieves performance competitive with much larger models. It demonstrates strong zero-shot capabilities and can be fine-tuned for further gains.

Q: What are the recommended use cases?

The model performs well across several vision tasks. Reported results include image captioning (135.6 CIDEr on COCO), object detection (37.5 mAP on COCO), and visual grounding (84.4% on Flickr30k). It is particularly suitable for applications that need several vision tasks handled by one model.
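For multi-task applications, the parsed output usually needs to be reshaped before downstream use. The sketch below assumes the object detection result has the shape shown on the public model card: a dictionary keyed by the task prompt, holding parallel "bboxes" (as [x1, y1, x2, y2] pixel coordinates) and "labels" lists. The sample values are made up for illustration and are not real model output, and detections_from_result is a hypothetical helper.

```python
def detections_from_result(result, task_prompt="<OD>"):
    """Turn a parsed detection result into a list of per-object dictionaries."""
    payload = result[task_prompt]
    boxes, labels = payload["bboxes"], payload["labels"]
    if len(boxes) != len(labels):
        raise ValueError("bboxes and labels must have the same length")

    detections = []
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        detections.append(
            {
                "label": label,
                "box": (x1, y1, x2, y2),
                # Box area in square pixels, clamped at zero for degenerate boxes.
                "area": max(0.0, x2 - x1) * max(0.0, y2 - y1),
            }
        )
    return detections


# Made-up sample in the assumed output format (not real model output).
sample = {
    "<OD>": {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 70.0, 90.0]],
        "labels": ["car", "wheel"],
    }
}
detections = detections_from_result(sample)
for det in detections:
    print(det["label"], det["box"], det["area"])
```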
