TIPO-200M-ft
| Property | Value |
|---|---|
| Parameter Count | 203M |
| Architecture | LLaMA |
| Training Datasets | Danbooru2023, Coyo-HD-11M, Multiple Caption Datasets |
| License | Kohaku License 1.0 |
| Context Length | 1024 tokens |
What is TIPO-200M-ft?
TIPO-200M-ft is a language model for Text-to-Image Prompt Optimization (TIPO). It is a fine-tuned version of the base TIPO-200M model, trained on an additional 10B tokens of image caption data. The model acts as a text preprocessor: it expands and refines a user-provided prompt before that prompt is passed to a text-to-image system, with the aim of producing higher-quality images.
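The preprocessing role described above can be sketched as a small wrapper around a text generator. This is a minimal sketch, not the official integration. The repository id `KBlueLeaf/TIPO-200M-ft`, the helper names `make_tipo_generator` and `enhance_prompt`, and the sampling settings are all assumptions for illustration. This page does not describe the model's expected input format, so the sketch treats it as a plain causal language model that continues the user's prompt.

```python
from typing import Callable

# Assumed Hugging Face repo id; it is not stated on this page.
MODEL_ID = "KBlueLeaf/TIPO-200M-ft"
CONTEXT_LENGTH = 1024  # context length from the table above


def make_tipo_generator(model_id: str = MODEL_ID,
                        max_new_tokens: int = 128) -> Callable[[str], str]:
    """Build a prompt-continuation function backed by transformers (hypothetical helper)."""
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    def generate(prompt: str) -> str:
        # Leave room for the new tokens inside the 1024-token context window.
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True,
                           max_length=CONTEXT_LENGTH - max_new_tokens)
        # Sampling values are illustrative, not recommended settings.
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                    do_sample=True, temperature=0.5, top_p=0.95)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return generate


def enhance_prompt(user_prompt: str, generate: Callable[[str], str]) -> str:
    """Return the enhanced prompt, or the cleaned original if generation yields nothing."""
    cleaned = user_prompt.strip()
    if not cleaned:
        return cleaned
    enhanced = generate(cleaned).strip()
    return enhanced or cleaned
```

In use, `enhance_prompt(raw_prompt, make_tipo_generator())` would run once per request, and the returned string would replace the raw prompt in the call to the image-generation backend. Passing the generator in as an argument keeps the wrapper testable with a stub.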
Implementation Details
Built on the LLaMA architecture, the model was trained on 4 RTX 3090 GPUs over 120 hours with a batch size of 2048. It accepts up to 1024 tokens of context. It performed well in scenery tag tests and in optimizing short or truncated prompts.
- Saw about 50B training tokens in total, counting the additional 10B tokens of image caption data used for fine-tuning
- Uses advanced text presampling techniques
- Optimized for both short and long-form prompt enhancement
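The training figures above allow a rough back-of-envelope check. The sketch below rests on three assumptions that this page does not confirm: that the batch size of 2048 counts sequences, that every sequence fills the full 1024-token context, and that the 120 hours cover the 10B-token fine-tuning stage. The function names are hypothetical.

```python
import math

BATCH_SIZE = 2048                  # assumed to count sequences per optimizer step
CONTEXT_LENGTH = 1024              # tokens per sequence (upper bound)
FINE_TUNE_TOKENS = 10_000_000_000  # additional tokens seen during fine-tuning
GPU_COUNT = 4
TRAINING_HOURS = 120


def tokens_per_step(batch_size: int, context_length: int) -> int:
    """Upper bound on tokens consumed by one optimizer step."""
    return batch_size * context_length


def min_steps(total_tokens: int, batch_size: int, context_length: int) -> int:
    """Fewest steps needed to see total_tokens if every sequence is full."""
    return math.ceil(total_tokens / tokens_per_step(batch_size, context_length))


def tokens_per_second(total_tokens: int, hours: float) -> float:
    """Average throughput over the whole run."""
    return total_tokens / (hours * 3600)


if __name__ == "__main__":
    rate = tokens_per_second(FINE_TUNE_TOKENS, TRAINING_HOURS)
    print("tokens per step:", tokens_per_step(BATCH_SIZE, CONTEXT_LENGTH))
    print("minimum steps:", min_steps(FINE_TUNE_TOKENS, BATCH_SIZE, CONTEXT_LENGTH))
    print("tokens/s overall:", round(rate))
    print("tokens/s per GPU:", round(rate / GPU_COUNT))
```

Under these assumptions, one step covers at most 2,097,152 tokens, the fine-tune needs at least 4,769 steps, and average throughput is about 23,100 tokens per second, or roughly 5,800 per GPU. Shorter sequences would raise the step count.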
Core Capabilities
- Enhanced prompt generation for text-to-image systems
- Improved aesthetic scores for the generated images
- Effective handling of both simple tags and complex descriptions
- Compatible with various Stable Diffusion implementations
Frequently Asked Questions
Q: What makes this model unique?
TIPO-200M-ft is trained specifically for prompt optimization. Compared with other prompt generation methods, it achieves better FDD (Fréchet DINO Distance) scores and aesthetic metrics. It is particularly effective at preserving image quality while matching the desired output distribution.
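A Fréchet distance of this kind compares two sets of image features by fitting a Gaussian to each set and measuring how far apart the two Gaussians are; lower values mean the generated images sit closer to the reference distribution. The sketch below is a simplified illustration that assumes diagonal covariances, so each feature dimension contributes independently. Real FDD computations use full covariance matrices over features from an image encoder, and the function name here is hypothetical.

```python
import math
from typing import Sequence


def frechet_distance_diagonal(mean_a: Sequence[float], var_a: Sequence[float],
                              mean_b: Sequence[float], var_b: Sequence[float]) -> float:
    """Squared Fréchet distance between two Gaussians with diagonal covariance.

    For diagonal covariances the general formula
        |mu_a - mu_b|^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2))
    reduces to a per-dimension sum of squared differences of the means
    and of the standard deviations.
    """
    if not (len(mean_a) == len(var_a) == len(mean_b) == len(var_b)):
        raise ValueError("all inputs must have the same length")
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    cov_term = sum((math.sqrt(va) - math.sqrt(vb)) ** 2 for va, vb in zip(var_a, var_b))
    return mean_term + cov_term
```

Identical statistics give a distance of 0, and the value grows as either the means or the spreads of the two feature sets drift apart.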
Q: What are the recommended use cases?
The model is ideal for enhancing text-to-image generation workflows, particularly when using stable-diffusion-webui, stable-diffusion-webui-forge, or ComfyUI. It's especially effective for improving both simple tag-based prompts and complex descriptive prompts.