Florence-2-large-PromptGen-v1.5
| Property | Value |
|---|---|
| Parameter Count | 823M |
| License | MIT |
| Tensor Type | F32 |
| Author | MiaoshouAI |
What is Florence-2-large-PromptGen-v1.5?
Florence-2-large-PromptGen-v1.5 is an image captioning model built for AI art prompt generation and tagging. It is a fine-tune of Microsoft's Florence-2 Large model that produces detailed image descriptions and tags for AI image generation workflows.
Implementation Details
The model is lightweight, needing just over 1 GB of VRAM while still captioning quickly. It accepts five instruction types, each producing a different style of description: GENERATE_TAGS, CAPTION, DETAILED_CAPTION, MORE_DETAILED_CAPTION, and MIXED_CAPTION.
- Memory-efficient architecture with 823M parameters
- Supports multiple caption instruction types
- Produces captions for both text encoders used by the FLUX model, T5XXL and CLIP_L
- Improved accuracy through refined training datasets
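
The instruction types above can be selected at inference time. The following is a minimal sketch, not the author's reference code. It assumes the model is published on Hugging Face under the repository ID `MiaoshouAI/Florence-2-large-PromptGen-v1.5`, that it loads through `transformers` with `trust_remote_code=True` like the base Florence-2 model, and that each instruction is passed as an angle-bracketed task token such as `<CAPTION>`. The helper names `task_prompt` and `caption_image` are hypothetical.

```python
from typing import Any

# Instruction names listed in this section. Wrapping them in angle brackets
# follows the Florence-2 task-prompt convention (an assumption here).
INSTRUCTIONS = (
    "GENERATE_TAGS",
    "CAPTION",
    "DETAILED_CAPTION",
    "MORE_DETAILED_CAPTION",
    "MIXED_CAPTION",
)

# Assumed Hugging Face repository ID, built from the author and model name.
MODEL_ID = "MiaoshouAI/Florence-2-large-PromptGen-v1.5"


def task_prompt(instruction: str) -> str:
    """Turn an instruction name such as 'caption' into a token like '<CAPTION>'."""
    name = instruction.strip().strip("<>").upper()
    if name not in INSTRUCTIONS:
        raise ValueError(f"unknown instruction {instruction!r}; expected one of {INSTRUCTIONS}")
    return f"<{name}>"


def caption_image(image_path: str, instruction: str = "MORE_DETAILED_CAPTION") -> Any:
    """Run one instruction on one image. Downloads the model on first use."""
    # Heavy imports stay local so task_prompt() works without them installed.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).to(device).eval()
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    prompt = task_prompt(instruction)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            do_sample=False,
            num_beams=3,
        )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # post_process_generation is provided by the Florence-2 remote code.
    return processor.post_process_generation(
        text, task=prompt, image_size=(image.width, image.height)
    )
```

Calling `caption_image("photo.png", "generate_tags")` would request tags instead of a long caption. Loading the model once and reusing it across images is the better design for batch work; the sketch reloads it on every call only to stay short.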
Core Capabilities
- Generates detailed image descriptions with positioning information
- Creates structured captions with subject position detection
- Produces Danbooru-style tags
- Recognizes and processes watermarks
- Supports mixed caption styles for FLUX model integration
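
As an illustration of how tag output and long captions might feed a FLUX workflow, the sketch below cleans a tag string and pairs it with a detailed caption. It rests on assumptions that are not stated in this section: that tags arrive as one comma-separated string, that the long caption is sent to T5XXL, and that the tag list is sent to CLIP_L. The names `parse_tags`, `FluxPrompts`, and `build_flux_prompts` are hypothetical helpers, not part of the model or of any library.

```python
from __future__ import annotations

from dataclasses import dataclass


def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string into a cleaned, de-duplicated list."""
    seen: set[str] = set()
    tags: list[str] = []
    for piece in raw.split(","):
        tag = " ".join(piece.split())  # collapse stray whitespace and newlines
        key = tag.lower()              # compare case-insensitively
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


@dataclass
class FluxPrompts:
    """Hypothetical container holding one prompt per FLUX text encoder."""

    t5xxl: str   # natural-language description (assumed to suit T5XXL)
    clip_l: str  # short tag list (assumed to suit CLIP_L)


def build_flux_prompts(detailed_caption: str, raw_tags: str, max_tags: int = 30) -> FluxPrompts:
    """Pair a long caption with at most max_tags cleaned tags."""
    tags = parse_tags(raw_tags)[:max_tags]
    return FluxPrompts(t5xxl=detailed_caption.strip(), clip_l=", ".join(tags))


if __name__ == "__main__":
    prompts = build_flux_prompts(
        "  A girl stands on the left side of the image. ",
        "1girl, solo,  long hair, Solo, ,outdoors",
    )
    print(prompts.t5xxl)
    print(prompts.clip_l)
```

Capping the tag count with `max_tags` is a design choice made here because CLIP-style encoders have a short context window; the value 30 is arbitrary and should be tuned for the workflow.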
Frequently Asked Questions
Q: What makes this model unique?
The model is trained specifically for AI art prompting and tagging, and it offers several caption styles while keeping VRAM usage low. Unlike general-purpose vision models, it is tuned to produce prompts that can be used to recreate similar images.
Q: What are the recommended use cases?
The model is ideal for AI artists and developers working with image generation pipelines, particularly those using ComfyUI. It excels at creating detailed image descriptions, generating tags, and providing structured captions that can be used as prompts for image generation.