Aria-UI
A multimodal model for visual localization of GUI commands.
Common Product, Productivity, Visual Localization, Multimodal Model
Aria-UI is a large multimodal model designed specifically for visual localization of GUI commands, that is, for finding the on-screen element that a natural-language command refers to. It takes a purely visual approach that needs no auxiliary inputs, and it is trained on diverse, high-quality command samples so that it can handle the varied commands a planner produces across different tasks. Aria-UI reports new state-of-the-art results on both offline and online agent benchmarks, outperforming vision-only baselines as well as baselines that rely on the accessibility tree (AXTree).
Aria-UI Visits Over Time
Monthly Visits: 279
Bounce Rate: 42.22%
Pages per Visit: 1.0
Visit Duration: 00:00:00