TinyClick

Property	Value
Parameter Count	271M
Base Model	Florence-2-base
License	MIT
Paper	arXiv:2410.11871
Tensor Type	F32

What is TinyClick?

TinyClick is a single-turn agent for GUI automation, built on Microsoft's Florence-2-Base vision-language model. Given a screenshot and a natural-language command, it predicts where to click on the user interface. At 271M parameters it is small for a vision-language agent, which keeps inference cost low.

Implementation Details

The model is implemented with the transformers library and takes both image and text inputs: it processes a screenshot alongside a user command and outputs the location to click. The architecture builds on Florence-2-Base, and the small model size keeps latency and resource usage low.

Compact architecture with 271M parameters
F32 (32-bit floating point) weights, roughly 1.1 GB at 4 bytes per parameter
Built-in processor for image and text handling
Optimized for single-turn interaction
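
The single-turn flow described above (screenshot plus command in, click location out) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' reference code: the repository ID "Samsung/TinyClick", the prompt prefix, and the output format (a pair of Florence-2 style location tokens, each quantized to 0..999 relative to the image size) are assumptions that should be checked against the official model card.

```python
import re

# Assumption: the model emits Florence-2 style location tokens, <loc_X><loc_Y>,
# with each value quantized to 0..999 relative to the image width and height.
LOC_PAIR = re.compile(r"<loc_(\d+)><loc_(\d+)>")


def parse_click_point(generated_text, image_size):
    """Return the first (x, y) click point in pixels, or None if none is found."""
    match = LOC_PAIR.search(generated_text)
    if match is None:
        return None
    width, height = image_size
    x_bin, y_bin = int(match.group(1)), int(match.group(2))
    return (x_bin / 1000 * width, y_bin / 1000 * height)


def predict_click(image_path, command, model_id="Samsung/TinyClick"):
    """Run one single-turn prediction.

    The default model_id and the prompt prefix below are assumptions.
    """
    # Heavy imports stay local so parse_click_point works without them installed.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    prompt = ("What to do to execute the command? " + command.strip()).lower()
    inputs = processor(images=image, text=prompt, return_tensors="pt")

    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=32,
    )
    # Keep special tokens: the location tokens would be stripped otherwise.
    text = processor.batch_decode(output_ids, skip_special_tokens=False)[0]
    return parse_click_point(text, image.size)
```

Calling predict_click("screenshot.png", "click on the accept button") would return pixel coordinates or None. Parsing is kept separate from inference so the coordinate conversion can be checked without downloading the model.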

Core Capabilities

Natural language command interpretation for GUI interactions
Precise UI element identification and targeting
Strong results on the Screenspot and OmniAct benchmarks, as reported in the paper (arXiv:2410.11871)
Low-latency response suitable for real-time applications
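
Click-grounding benchmarks of this kind typically count a prediction as correct when the predicted point falls inside the bounding box of the target element. The sketch below illustrates that metric; the function names and the (left, top, right, bottom) box layout are illustrative choices, not taken from any benchmark's official evaluation code.

```python
def click_hits_target(point, bbox):
    """True when point (x, y) lies inside bbox (left, top, right, bottom), edges inclusive."""
    x, y = point
    left, top, right, bottom = bbox
    return left <= x <= right and top <= y <= bottom


def click_accuracy(predictions, targets):
    """Fraction of predicted points that land inside their target boxes.

    A prediction of None (no point produced) counts as a miss.
    """
    if not targets:
        raise ValueError("targets must not be empty")
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    hits = sum(
        1
        for point, bbox in zip(predictions, targets)
        if point is not None and click_hits_target(point, bbox)
    )
    return hits / len(targets)
```

The same hit test can serve as an assertion in automated GUI tests: compare the predicted point against the known bounds of the element the command refers to.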

Frequently Asked Questions

Q: What makes this model unique?

TinyClick stands out for its balance of accuracy and size. At just 271M parameters, it achieves competitive results on GUI click benchmarks with low latency, making it practical for real-world GUI automation tasks.

Q: What are the recommended use cases?

The model is ideal for automated GUI testing, accessibility applications, and general UI automation scenarios where precise element clicking based on natural language commands is required. It's particularly effective for single-turn interactions where quick response times are crucial.
