UI-TARS-7B-DPO

Property	Value
Model Size	7B parameters
Paper	arXiv:2501.12326
Author	ByteDance Research
Model URL	https://huggingface.co/bytedance-research/UI-TARS-7B-DPO

What is UI-TARS-7B-DPO?

UI-TARS-7B-DPO is a GUI agent model built as a single vision-language model. It integrates perception, reasoning, grounding, and memory, so it can carry out interface navigation and multi-step tasks end to end, without predefined workflows.
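The single-model loop described above can be sketched as follows. This is a minimal illustration, not the UI-TARS inference code. `run_episode`, the `policy` callable, the plain strings standing in for screenshots, and the `finished()` stop action are all hypothetical. The point is that one model call maps the task, the current screen, and the step history to the next action, with no hand-written workflow in between.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    observation: str  # stand-in for a screenshot
    action: str       # action string emitted by the model


@dataclass
class AgentState:
    task: str
    history: List[Step] = field(default_factory=list)  # memory of earlier steps


# Hypothetical policy interface: (task, current observation, history) -> action.
Policy = Callable[[str, str, List[Step]], str]


def run_episode(task: str, policy: Policy, observe: Callable[[], str],
                execute: Callable[[str], None], max_steps: int = 10) -> AgentState:
    """Run one task with a single model choosing every action (no scripted workflow)."""
    state = AgentState(task=task)
    for _ in range(max_steps):
        obs = observe()                            # perceive the current screen
        action = policy(task, obs, state.history)  # reason and ground in one call
        state.history.append(Step(obs, action))    # update memory
        if action == "finished()":                 # hypothetical stop action
            break
        execute(action)                            # act on the interface
    return state


# Toy demo: a fake "model" that clicks login, then stops.
screens = iter(["login page", "home page"])


def toy_policy(task: str, obs: str, history: List[Step]) -> str:
    return "click(login)" if obs == "login page" else "finished()"


executed: List[str] = []
state = run_episode("log in", toy_policy, lambda: next(screens), executed.append)
print(executed, len(state.history))  # ['click(login)'] 2
```

In a real deployment the `policy` would be a call to the model with a screenshot, and `execute` would drive a device or browser; both are left abstract here.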

Implementation Details

The model uses a unified architecture that processes both visual and textual input to understand and act on graphical user interfaces. This variant was additionally trained with Direct Preference Optimization (DPO) to improve its action selection and task-execution accuracy.
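For reference, the standard DPO objective for one preference pair can be sketched as below. This is the generic formulation, not the UI-TARS training code: how the authors build chosen/rejected pairs (for a GUI agent, plausibly a correct action versus an erroneous one) and their value of `beta` are not given here, so `beta=0.1` is an assumption.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a whole response (here, an
    action) under the model being trained (policy_*) or a frozen reference
    copy (ref_*). beta=0.1 is an assumed default, not a value from the paper.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# Policy identical to the reference: loss is log(2).
print(round(dpo_loss(-11.0, -11.0, -11.0, -11.0), 4))  # 0.6931
# Policy already favors the chosen action more than the reference does: lower loss.
print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 4))  # 0.5981
```

Minimizing this loss pushes the trained model to raise the probability of preferred actions relative to the reference model, without training a separate reward model.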

Reported results include:
89.5% average accuracy on the ScreenSpot GUI grounding benchmark
67.1% success rate on cross-domain tasks
Strong results on both mobile and desktop interfaces
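ScreenSpot-style grounding accuracy counts a prediction as correct when the predicted click point falls inside the ground-truth element's bounding box. The sketch below computes that per split and pooled over all samples; the `Sample` layout and split names are hypothetical, and whether the reported 89.5% average is pooled or averaged over splits is not stated here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2
Point = Tuple[float, float]


@dataclass
class Sample:
    split: str   # hypothetical split name, e.g. "mobile-text" or "web-icon"
    bbox: Box    # ground-truth element box
    pred: Point  # click point predicted by the model


def is_hit(pred: Point, bbox: Box) -> bool:
    """A prediction counts as correct if the point lies inside the target box."""
    x, y = pred
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


def split_accuracy(samples: List[Sample]) -> Dict[str, float]:
    """Accuracy per split (e.g. per platform and element type)."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for s in samples:
        totals[s.split] = totals.get(s.split, 0) + 1
        hits[s.split] = hits.get(s.split, 0) + int(is_hit(s.pred, s.bbox))
    return {name: hits[name] / totals[name] for name in totals}


def pooled_accuracy(samples: List[Sample]) -> float:
    """Accuracy over all samples pooled together."""
    return sum(is_hit(s.pred, s.bbox) for s in samples) / len(samples)


samples = [
    Sample("mobile-text", (0, 0, 100, 50), (50, 25)),   # inside: hit
    Sample("mobile-text", (0, 0, 100, 50), (150, 25)),  # outside: miss
    Sample("web-icon", (10, 10, 20, 20), (15, 15)),     # inside: hit
]
print(split_accuracy(samples))             # {'mobile-text': 0.5, 'web-icon': 1.0}
print(round(pooled_accuracy(samples), 3))  # 0.667
```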

Core Capabilities

Perception: 79.7% accuracy on VisualWebBench
Reliable element grounding across different interface types
Grounding of both text elements and icon/widget elements
Strong performance on both offline and online (interactive) task automation
Support for multiple platforms including mobile, desktop, and web interfaces
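Grounding on different platforms ultimately means turning a predicted point into pixel coordinates for a specific screen. The sketch below does that for a point action. Both the action string format `click(start_box='(x,y)')` and the normalized 0 to 1000 coordinate grid are assumptions for illustration only; check the Hugging Face model card for the model's actual output format before relying on them.

```python
import re
from typing import Optional, Tuple

# Hypothetical action format, e.g. "click(start_box='(500,250)')".
ACTION_RE = re.compile(r"^(\w+)\(start_box='\((\d+(?:\.\d+)?),\s*(\d+(?:\.\d+)?)\)'\)$")


def to_pixels(action: str, width: int, height: int,
              scale: float = 1000.0) -> Optional[Tuple[str, int, int]]:
    """Parse a point action and map normalized coordinates to screen pixels.

    `scale` is the assumed size of the normalized coordinate grid.
    Returns None if the string does not match the assumed format.
    """
    m = ACTION_RE.match(action.strip())
    if m is None:
        return None
    name, nx, ny = m.group(1), float(m.group(2)), float(m.group(3))
    x = round(nx / scale * width)
    y = round(ny / scale * height)
    # Clamp so the point stays on the visible screen.
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return name, x, y


print(to_pixels("click(start_box='(500,250)')", 1920, 1080))    # ('click', 960, 270)
print(to_pixels("click(start_box='(1000,1000)')", 1080, 2400))  # ('click', 1079, 2399)
print(to_pixels("type(content='hello')", 1920, 1080))           # None
```

Keeping coordinates normalized in the model output and scaling only at execution time is what lets one prediction format serve desktop, mobile, and web screens of different resolutions.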

Frequently Asked Questions

Q: What makes this model unique?

UI-TARS-7B-DPO combines perception, reasoning, grounding, and memory in a single model instead of relying on traditional modular frameworks. The authors report state-of-the-art performance across multiple benchmarks, and the model handles complex interface interactions without predefined rules.

Q: What are the recommended use cases?

The model is suited to automated GUI testing, task automation across platforms, accessibility enhancement for interfaces, and intelligent user-assistance systems. It is aimed particularly at scenarios that require understanding complex interfaces and executing multi-step tasks.
