Robi Butler: Remote Multimodal Interactions with Household Robot Assistant

  • 2024-09-30 18:49:09
  • Anxing Xiao, Nuwan Janaka, Tianrun Hu, Anshul Gupta, Kaixin Li, Cunjun Yu, David Hsu

Abstract

In this paper, we introduce Robi Butler, a novel household robotic system that enables multimodal interactions with remote users. Building on the advanced communication interfaces, Robi Butler allows users to monitor the robot's status, send text or voice instructions, and select target objects by hand pointing. At the core of our system is a high-level behavior module, powered by Large Language Models (LLMs), that interprets multimodal instructions to generate action plans. These plans are composed of a set of open vocabulary primitives supported by Vision Language Models (VLMs) that handle both text and pointing queries. The integration of the above components allows Robi Butler to ground remote multimodal instructions in the real-world home environment in a zero-shot manner. We demonstrate the effectiveness and efficiency of this system using a variety of daily household tasks that involve remote users giving multimodal instructions. Additionally, we conducted a user study to analyze how multimodal interactions affect efficiency and user experience during remote human-robot interaction and discuss the potential improvements.
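
To make the data flow in the abstract concrete, the following is a minimal sketch of how a multimodal instruction (text plus an optional pointing selection) might be turned into a plan of primitives that accept either a text query or a pointing query. It is an illustration only, not the authors' implementation: the names `Instruction`, `Primitive`, `plan_actions`, and the primitive names `locate` and `pick` are all hypothetical, and a fixed rule stands in for the LLM-powered behavior module.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical type: an image pixel (x, y) selected by hand pointing.
Pixel = Tuple[int, int]


@dataclass
class Instruction:
    """A remote user's multimodal instruction (hypothetical structure)."""
    text: str                        # typed or transcribed voice command
    pointing: Optional[Pixel] = None # optional pointing selection


@dataclass
class Primitive:
    """One step of an action plan; the query is a text label or a pixel."""
    name: str
    query: Union[str, Pixel]


def plan_actions(instruction: Instruction) -> List[Primitive]:
    """Toy stand-in for the LLM planner described in the abstract.

    Assumption: if the user pointed at something, the pointing query
    grounds the target; otherwise the text itself is used as the query.
    """
    if instruction.pointing is not None:
        query: Union[str, Pixel] = instruction.pointing
    else:
        query = instruction.text
    # Hypothetical fixed plan; a real planner would choose the primitives.
    return [Primitive("locate", query), Primitive("pick", query)]


if __name__ == "__main__":
    for step in plan_actions(Instruction("pick this up", pointing=(320, 240))):
        print(step.name, step.query)
```

In this sketch the same primitive signature serves both query types, which mirrors the abstract's statement that the primitives handle both text and pointing queries.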

 
