Florence-2-base-PromptGen-v2.0

Property	Value
Parameter Count	271M
License	MIT
Tensor Type	F32
Author	MiaoshouAI

What is Florence-2-base-PromptGen-v2.0?

Florence-2-base-PromptGen-v2.0 is an image captioning model that builds on its predecessor with improved generation of detailed image descriptions and tags. At 271M parameters it is lightweight, needing only about 1GB of VRAM, which makes it practical to run on modest GPUs.
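The ~1GB figure is consistent with the parameter count and tensor type in the table above. The arithmetic below is a rough sketch that counts model weights only; activations, the image input, and framework overhead add to it, so actual VRAM use will be somewhat higher.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Counts weights only; runtime overhead is NOT included.
PARAM_COUNT = 271_000_000  # "Parameter Count: 271M" from the table above
BYTES_PER_PARAM = 4        # F32 means 32 bits, i.e. 4 bytes per parameter


def weight_memory_bytes(param_count: int, bytes_per_param: int) -> int:
    """Return the memory needed to hold the raw weights, in bytes."""
    return param_count * bytes_per_param


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gigabytes (GiB)."""
    return num_bytes / 2**30


weights = weight_memory_bytes(PARAM_COUNT, BYTES_PER_PARAM)
print(f"{weights / 1e9:.2f} GB ({to_gib(weights):.2f} GiB)")  # 1.08 GB (1.01 GiB)
```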

Implementation Details

The model exposes several instruction modes, each selecting a different captioning task, on top of a transformer-based architecture. Its captions are designed for use with both text encoders in the Flux model ecosystem, T5XXL and CLIP_L.

Memory-efficient architecture requiring only about 1GB of VRAM
Support for multiple instruction types including GENERATE_TAGS, CAPTION, and ANALYZE
Captions suited to the T5XXL and CLIP_L text encoders used by Flux models
Optimized for fast processing and high-quality output
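The instruction modes listed above could be driven from Python roughly as follows. This is a minimal sketch, not the author's reference code. It assumes the model is published on Hugging Face under the repo id shown, that it loads through the usual transformers path with trust_remote_code=True, and that each instruction is passed as a task prompt wrapped in angle brackets (the Florence-2 convention). The helper names task_prompt and run_instruction are hypothetical.

```python
# Assumed Hugging Face repo id; adjust if the model lives elsewhere.
MODEL_ID = "MiaoshouAI/Florence-2-base-PromptGen-v2.0"

# Instruction names taken from the list above. Wrapping them in angle
# brackets follows the Florence-2 task-prompt convention (an assumption).
MODES = ("GENERATE_TAGS", "CAPTION", "ANALYZE")


def task_prompt(mode: str) -> str:
    """Turn an instruction name such as 'caption' into '<CAPTION>'."""
    name = mode.strip().strip("<>").upper()
    if name not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return f"<{name}>"


def run_instruction(image_path: str, mode: str, max_new_tokens: int = 1024) -> str:
    """Run one instruction mode on an image and return the generated text."""
    # Heavy imports stay local so task_prompt() works without them installed.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=True
    ).to(device).eval()
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    prompt = task_prompt(mode)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,
            do_sample=False,
            num_beams=3,
        )
    # Decode to plain text, dropping special tokens such as <s> and </s>.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

A call such as run_instruction("photo.jpg", "generate_tags") would download the weights on first use. Loading the model once and reusing it across images is the better pattern for batch work; it is reloaded per call here only to keep the sketch short.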

Core Capabilities

Generate Danbooru-style tags with improved accuracy
Create structured captions with spatial awareness
Perform detailed image composition analysis
Produce mixed-style captions combining detailed descriptions with tags
Analyze partial image regions through ComfyUI integration
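When tag output feeds a downstream pipeline, it usually helps to normalize it first. The sketch below assumes the tags arrive as a single comma-separated string, as is typical for Danbooru-style tags; the exact output format of this model is not specified here, and the sample string is made up for illustration rather than real model output. The helper name parse_tags is hypothetical.

```python
def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string into a clean, de-duplicated list.

    Whitespace is collapsed, tags are lowercased, empty entries are dropped,
    and the first occurrence of each tag keeps its position.
    """
    seen = set()
    tags = []
    for piece in raw.split(","):
        tag = " ".join(piece.split()).lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


# Made-up sample string, not actual model output.
sample = "1girl, solo,  long hair, Solo, outdoors,"
print(parse_tags(sample))  # ['1girl', 'solo', 'long hair', 'outdoors']
```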

Frequently Asked Questions

Q: What makes this model unique?

The model keeps resource usage low while maintaining output quality. It combines multiple caption styles and analysis modes in a single lightweight package, which is particularly valuable for workflows that need both detailed descriptions and technical tags.

Q: What are the recommended use cases?

The model is well suited to automated image captioning systems, content management platforms, and AI art workflows, particularly those built around Flux models. It is especially useful where an application needs both technical tags and natural language descriptions of images.