RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations

Abstract

Humanoid robots have shown success in locomotion and manipulation. Beyond these basic abilities, however, humanoids must also quickly understand human instructions and react to human interaction signals if they are to become valuable assistants in daily life. Unfortunately, most existing works focus only on multi-stage interactions, treating each task separately and neglecting real-time feedback. In this work, we aim to give humanoid robots real-time reaction abilities across various tasks, so that humans can interrupt the robot at any time and the robot responds immediately. To support such abilities, we propose a general humanoid-human-object interaction framework named RHINO, i.e., Real-time Humanoid-human Interaction and Object manipulation. RHINO provides a unified view of reactive motion, instruction-based manipulation, and safety concerns over multiple human signal modalities, such as language, images, and motion. RHINO is a hierarchical learning framework that enables humanoids to learn reaction skills from human-human-object demonstrations and teleoperation data. In particular, it decouples the interaction process into two levels: 1) a high-level planner that infers human intentions from real-time human behaviors; and 2) a low-level controller that achieves reactive motion behaviors and object manipulation skills based on the predicted intentions. We evaluate the proposed framework on a real humanoid robot and demonstrate its effectiveness, flexibility, and safety in various scenarios.
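A minimal Python sketch can illustrate the two-level decoupling described above. It is not RHINO's implementation. The high-level planner here is a toy rule-based stand-in for a learned intention model. The low-level skill library is a lookup table rather than learned controllers. The names `HumanSignal`, `infer_intention`, `SKILLS`, and `control_loop`, along with every intention and skill label, are hypothetical. The point is the control structure: the planner re-infers the intention at every tick, so a new human signal, such as a raised palm, can interrupt the current skill immediately.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HumanSignal:
    # Hypothetical multimodal observation: a text instruction plus a coarse motion cue.
    text: str = ""
    motion: str = "idle"


def infer_intention(signal: HumanSignal) -> str:
    """Hypothetical high-level planner: map the latest human signal to an intention label.

    A toy rule-based stand-in for a learned intention-prediction model.
    Safety-related cues are checked first so they take priority.
    """
    if signal.motion == "raise_palm" or "stop" in signal.text.lower():
        return "stop"
    if signal.motion == "wave":
        return "wave_back"
    if "cup" in signal.text.lower():
        return "hand_over_cup"
    return "idle"


# Hypothetical low-level skill library: one controller per intention label.
SKILLS: Dict[str, Callable[[], str]] = {
    "stop": lambda: "hold_still",
    "wave_back": lambda: "wave_arm",
    "hand_over_cup": lambda: "grasp_and_handover",
    "idle": lambda: "stand",
}


def control_loop(signals: List[HumanSignal]) -> List[str]:
    """Re-infer the intention at every tick, so a new signal can interrupt the running skill."""
    actions: List[str] = []
    for signal in signals:
        intention = infer_intention(signal)    # high level: what does the human want now?
        actions.append(SKILLS[intention]())    # low level: execute the matching skill
    return actions
```

For example, `control_loop([HumanSignal(text="Give me the cup"), HumanSignal(motion="raise_palm"), HumanSignal()])` returns `["grasp_and_handover", "hold_still", "stand"]`. The raised palm on the second tick overrides the hand-over without waiting for it to finish.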
