Florence-2-large-PromptGen-v1.5

Property	Value
Parameter Count	823M
License	MIT
Tensor Type	F32
Author	MiaoshouAI
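As a rough check on the Parameter Count and Tensor Type rows, the memory needed for the weights is the parameter count times the bytes per element. The sketch below computes only that lower bound. It ignores activations, generation buffers and framework overhead, and the dtype labels are illustrative strings, not any library's types.

```python
# Bytes per parameter for common floating-point storage types.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_bytes(num_params: int, dtype: str) -> int:
    """Return the memory needed to hold the raw weights only.

    Activations, generation buffers and framework overhead are not
    included, so real VRAM usage will be higher than this figure.
    """
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    params = 823_000_000  # parameter count from the table
    for dtype in ("float32", "float16"):
        gib = weight_memory_bytes(params, dtype) / 2**30
        print(f"{dtype}: {gib:.2f} GiB")
```

Run directly, this prints `float32: 3.07 GiB` and `float16: 1.53 GiB`. Actual VRAM use therefore depends heavily on the precision the weights are loaded in.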

What is Florence-2-large-PromptGen-v1.5?

Florence-2-large-PromptGen-v1.5 is an image captioning model designed for AI art prompt generation and tagging. It is a fine-tune of Microsoft's Florence-2 Large model, trained to produce detailed image descriptions and tags for AI image generation workflows.

Implementation Details

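As a fine-tune, the model keeps the Florence-2 Large architecture and is light on memory: the authors report that it runs in just over 1GB of VRAM while still generating captions quickly. It accepts several instruction types, each producing a different kind of output: GENERATE_TAGS (Danbooru-style tags), CAPTION (a short caption), DETAILED_CAPTION (a structured caption that records subject positions), MORE_DETAILED_CAPTION (a long, detailed description), and MIXED_CAPTION (a mix of detailed caption and tags, aimed at FLUX models).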
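A minimal inference sketch, following the usual Hugging Face transformers loading pattern for Florence-2 models (`AutoModelForCausalLM` and `AutoProcessor` with `trust_remote_code=True`). Several details are assumptions: the Hub repo id `MiaoshouAI/Florence-2-large-PromptGen-v1.5`, that each instruction is passed as an angle-bracketed task token such as `<CAPTION>` (the base Florence-2 convention), and the generation settings. `TASKS`, `build_prompt` and `caption_image` are hypothetical helpers, not part of any published API.

```python
# Instruction types named in the description. The angle-bracket token form
# follows the base Florence-2 convention and is an assumption for this fine-tune.
TASKS = {
    "GENERATE_TAGS": "Danbooru-style tags",
    "CAPTION": "short caption",
    "DETAILED_CAPTION": "structured caption with subject positions",
    "MORE_DETAILED_CAPTION": "long, detailed description",
    "MIXED_CAPTION": "mix of detailed caption and tags",
}

MODEL_ID = "MiaoshouAI/Florence-2-large-PromptGen-v1.5"  # assumed Hub repo id


def build_prompt(task: str) -> str:
    """Turn a task name such as 'caption' or '<CAPTION>' into '<CAPTION>'."""
    name = task.strip("<> ").upper()
    if name not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASKS)}")
    return f"<{name}>"


def caption_image(image_path: str, task: str = "MORE_DETAILED_CAPTION") -> str:
    """Run one instruction on one image (requires torch, transformers, Pillow)."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    prompt = build_prompt(task)
    image = Image.open(image_path).convert("RGB")
    # Move tensors to the device; only floating-point tensors are cast to dtype.
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, dtype)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3,
            do_sample=False,
        )
    # Drop special tokens so only the generated caption or tags remain.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

To stay short, `caption_image` reloads the model on every call; in a batch job you would load the model and processor once and reuse them.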

Memory-efficient architecture with 823M parameters
Supports multiple caption instruction types
Compatible with FLUX models, covering both the T5XXL and CLIP_L text encoders
Improved accuracy through refined training datasets

Core Capabilities

Generates detailed image descriptions with positioning information
Creates structured captions with subject position detection
Produces Danbooru-style tags
Recognizes and processes watermarks
Supports mixed caption styles for FLUX model integration
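When GENERATE_TAGS output is reused in another tool, it usually helps to normalize it first. This sketch assumes the tags come back as a single comma-separated string, the usual Danbooru-style prompt format, which the description does not confirm. `parse_tags` and `drop_tags` are hypothetical helpers, and converting underscores to spaces is a stylistic choice that can be switched off.

```python
def parse_tags(raw: str, underscores_to_spaces: bool = True) -> list[str]:
    """Split comma-separated tag output into a clean, de-duplicated list.

    Order is preserved; duplicates are detected case-insensitively.
    """
    seen: set[str] = set()
    tags: list[str] = []
    for piece in raw.split(","):
        tag = piece.strip()
        if underscores_to_spaces:
            tag = tag.replace("_", " ")
        tag = " ".join(tag.split())  # collapse repeated whitespace
        if tag and tag.lower() not in seen:
            seen.add(tag.lower())
            tags.append(tag)
    return tags


def drop_tags(tags: list[str], unwanted: set[str]) -> list[str]:
    """Remove tags that should not appear in the final prompt (case-insensitive)."""
    blocked = {t.lower() for t in unwanted}
    return [t for t in tags if t.lower() not in blocked]
```

For example, `drop_tags(parse_tags(raw), {"watermark"})` removes a watermark tag before the remaining tags are reused as a prompt.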

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized training focused on AI art prompting and tagging, offering multiple caption styles while maintaining low VRAM usage. Unlike general vision models, it's specifically optimized for generating prompts that can be used to recreate similar images.

Q: What are the recommended use cases?

The model is ideal for AI artists and developers working with image generation pipelines, particularly those using ComfyUI. It excels at creating detailed image descriptions, generating tags, and providing structured captions that can be used as prompts for image generation.
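Outside ComfyUI, a common way to caption an image dataset is to write one text file per image. The sketch below is hypothetical: `caption_folder`, the `.txt` sidecar convention and the suffix list are assumptions. The captioning function is passed in, so it can wrap whatever inference code you use with this model.

```python
from pathlib import Path
from typing import Callable

# Image extensions to caption; adjust for your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def caption_folder(
    folder: Path,
    caption_fn: Callable[[Path], str],
    overwrite: bool = False,
) -> list[Path]:
    """Write a .txt caption file next to each image in ``folder``.

    ``caption_fn`` maps an image path to caption text. Returns the caption
    files that were written, in sorted order.
    """
    written: list[Path] = []
    for image_path in sorted(folder.iterdir()):
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        txt_path = image_path.with_suffix(".txt")
        if txt_path.exists() and not overwrite:
            continue  # keep captions that were already written or hand-edited
        txt_path.write_text(caption_fn(image_path).strip() + "\n", encoding="utf-8")
        written.append(txt_path)
    return written
```

Skipping existing caption files by default makes the function safe to rerun after an interrupted job without discarding manual edits.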