Florence-2-base-PromptGen-v1.5

Property	Value
Parameter Count	271M
License	MIT
Tensor Type	F32
Author	MiaoshouAI

What is Florence-2-base-PromptGen-v1.5?

Florence-2-base-PromptGen-v1.5 is an image captioning model designed for AI art workflows. Built on Microsoft's Florence-2 architecture, this release is an upgrade that adds new caption instructions and improves accuracy. It is optimized for generating detailed image descriptions and tags, which makes it useful in AI image generation pipelines.

Implementation Details

The model supports five caption instructions: GENERATE_TAGS, CAPTION, DETAILED_CAPTION, MORE_DETAILED_CAPTION, and MIXED_CAPTION. It needs just over 1GB of VRAM, which is roughly what 271M F32 parameters occupy at 4 bytes each (about 1.08 GB of weights), while producing fast, high-quality image captions.

Memory-efficient architecture requiring minimal VRAM
Compatible with Flux models that use both the T5XXL and CLIP_L text encoders
Supports structured caption format with position detection
Improved watermark recognition capabilities
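
The instruction modes listed above can be exercised with a short script. The following is a minimal sketch, not the author's reference code. It assumes the model is published on the Hugging Face Hub under the id MiaoshouAI/Florence-2-base-PromptGen-v1.5, that each instruction is passed as an angle-bracket task prompt in the usual Florence-2 style (for example <CAPTION>), and that the processor exposes post_process_generation through the model's remote code. The names task_prompt and caption_image are hypothetical helpers introduced here for illustration.

```python
# Instruction names taken from the text above.
INSTRUCTIONS = (
    "GENERATE_TAGS",
    "CAPTION",
    "DETAILED_CAPTION",
    "MORE_DETAILED_CAPTION",
    "MIXED_CAPTION",
)

# Assumption: the Hub repository id, derived from the author and model name.
MODEL_ID = "MiaoshouAI/Florence-2-base-PromptGen-v1.5"


def task_prompt(instruction: str) -> str:
    """Turn an instruction name into an angle-bracket task prompt.

    Assumption: the model expects Florence-2 style prompts such as "<CAPTION>".
    """
    name = instruction.strip().strip("<>").upper()
    if name not in INSTRUCTIONS:
        raise ValueError(f"unknown instruction: {instruction!r}")
    return f"<{name}>"


def caption_image(image_path: str, instruction: str = "MORE_DETAILED_CAPTION") -> str:
    """Caption one image file. Downloads the model weights on first use."""
    # Heavy dependencies are imported here so task_prompt works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    prompt = task_prompt(instruction)
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # trust_remote_code is needed because Florence-2 ships custom model code.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = model.to(device).eval()
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            do_sample=False,
            num_beams=3,
        )

    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Assumption: the remote-code processor returns a dict keyed by the prompt.
    parsed = processor.post_process_generation(
        text, task=prompt, image_size=(image.width, image.height)
    )
    return parsed[prompt]
```

Calling caption_image("example.png", "GENERATE_TAGS") would then return the tag string for a local file, where example.png is a placeholder path. Loading the model once and reusing it across images would be the more practical design for batch captioning; it is reloaded per call here only to keep the sketch short.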

Core Capabilities

Generates Danbooru-style tags with the GENERATE_TAGS instruction
Produces structured captions with subject position information
Creates detailed descriptions with MORE_DETAILED_CAPTION mode
Supports a mixed caption style (MIXED_CAPTION) for Flux model integration
Efficient processing with low VRAM requirements
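
Tag output from the GENERATE_TAGS instruction usually needs light cleanup before it is reused as a prompt. The sketch below assumes the tags arrive as a single comma-separated string, which is the common Danbooru convention but is not confirmed here for this model's exact output. The function name parse_tags is hypothetical.

```python
from __future__ import annotations


def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string into a clean list.

    Whitespace is trimmed, empty entries are dropped, and duplicates are
    removed case-insensitively while keeping the first occurrence and order.
    """
    seen: set[str] = set()
    tags: list[str] = []
    for piece in raw.split(","):
        tag = piece.strip()
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


def join_tags(tags: list[str]) -> str:
    """Join tags back into a prompt-ready, comma-separated string."""
    return ", ".join(tags)
```

For example, join_tags(parse_tags(raw_output)) normalizes spacing and removes repeated tags before the string is passed to an image generation prompt, where raw_output stands for whatever the model returned.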

Frequently Asked Questions

Q: What makes this model unique?

The model was trained specifically for AI art workflows, which avoids common issues such as LoRA trigger words and inaccurate tags. Unlike general-purpose vision models, it is designed for prompt generation and image tagging.

Q: What are the recommended use cases?

The model is ideal for AI artists and developers working with image generation pipelines, particularly those using ComfyUI. It's especially useful for creating detailed image descriptions, generating tags, and working with Flux models requiring both T5XXL and CLIP_L encoding.