ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

  • 2024-01-01 14:26:39
  • Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou

Abstract

Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Model (LLM) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks. We carefully collected a set of 100 tasks from nine widely-used software applications, such as After Effects and MS Word, each accompanied by the necessary project files for better evaluation. Moreover, we propose an advanced Actor-Critic Embodied Agent framework, which incorporates a sophisticated GUI parser driven by an LLM-agent and an enhanced reasoning mechanism adept at handling lengthy procedural tasks. Our experimental results reveal that our GUI Parser and Reasoning mechanism outshine existing methods in performance. Nevertheless, the potential remains substantial, with the best model attaining only a 46% success rate on our benchmark. We conclude with a thorough analysis of the current methods' limitations, setting the stage for future breakthroughs in this domain.

 
