OS-Atlas-Pro-7B

Property	Value
Parameter Count	8.29B
Model Type	Image-Text-to-Text
Architecture	Transformer-based (Qwen2-VL)
License	Apache 2.0
Paper	arXiv:2410.23218

What is OS-Atlas-Pro-7B?

OS-Atlas-Pro-7B is an advanced GUI action model specifically designed for generating executable actions in graphical user interface environments. Built upon the OS-Atlas-Base-7B architecture and fine-tuned for enhanced performance, this model excels at processing visual information and generating appropriate actions based on user instructions.

Implementation Details

The model is implemented using the Transformers library and builds upon the Qwen2-VL-7B-Instruct architecture. It processes both text and visual inputs, utilizing BF16 tensor types for efficient computation. The model requires specific dependencies including transformers and qwen-vl-utils for operation.

Built on Qwen2-VL-7B-Instruct base model
Supports both basic and custom GUI actions
Implements a sophisticated system prompt structure
Provides detailed reasoning through 'thought' outputs

Core Capabilities

Processes visual information from GUI screenshots
Generates thoughtful reasoning for each action
Supports basic actions (CLICK, TYPE, SCROLL)
Implements custom actions (LONG_PRESS, OPEN_APP, various navigation commands)
Provides wait and completion status handling

Frequently Asked Questions

Q: What makes this model unique?

OS-Atlas-Pro-7B stands out for its ability to combine visual understanding with action generation in GUI environments. It demonstrates superior generalizability compared to previous versions and isn't constrained to specific tasks or training datasets.

Q: What are the recommended use cases?

The model is ideal for GUI automation tasks, user interface testing, and developing AI assistants that need to interact with graphical interfaces. It's particularly useful for scenarios requiring thoughtful reasoning before executing actions.

OS-Atlas-Pro-7B

OS-Atlas-Pro-7B

What is OS-Atlas-Pro-7B?

Implementation Details

Core Capabilities

Frequently Asked Questions

Q: What makes this model unique?

Q: What are the recommended use cases?

Related Models