OS-Atlas-Pro-7B
Property | Value |
---|---|
Parameter Count | 8.29B |
Model Type | Image-Text-to-Text |
Architecture | Transformer-based (Qwen2-VL) |
License | Apache 2.0 |
Paper | arXiv:2410.23218 |
What is OS-Atlas-Pro-7B?
OS-Atlas-Pro-7B is an advanced GUI action model specifically designed for generating executable actions in graphical user interface environments. Built upon the OS-Atlas-Base-7B architecture and fine-tuned for enhanced performance, this model excels at processing visual information and generating appropriate actions based on user instructions.
Implementation Details
The model is implemented using the Transformers library and builds upon the Qwen2-VL-7B-Instruct architecture. It processes both text and visual inputs, utilizing BF16 tensor types for efficient computation. The model requires specific dependencies including transformers and qwen-vl-utils for operation.
- Built on Qwen2-VL-7B-Instruct base model
- Supports both basic and custom GUI actions
- Implements a sophisticated system prompt structure
- Provides detailed reasoning through 'thought' outputs
Core Capabilities
- Processes visual information from GUI screenshots
- Generates thoughtful reasoning for each action
- Supports basic actions (CLICK, TYPE, SCROLL)
- Implements custom actions (LONG_PRESS, OPEN_APP, various navigation commands)
- Provides wait and completion status handling
Frequently Asked Questions
Q: What makes this model unique?
OS-Atlas-Pro-7B stands out for its ability to combine visual understanding with action generation in GUI environments. It demonstrates superior generalizability compared to previous versions and isn't constrained to specific tasks or training datasets.
Q: What are the recommended use cases?
The model is ideal for GUI automation tasks, user interface testing, and developing AI assistants that need to interact with graphical interfaces. It's particularly useful for scenarios requiring thoughtful reasoning before executing actions.