Learning Flexible Translation between Robot Actions and Language Descriptions

Abstract

Handling various robot action-language translation tasks flexibly is an essential requirement for natural interaction between a robot and a human. Previous approaches require changing the configuration of the model architecture for each task during inference, which undermines the premise of multi-task learning. In this work, we propose the paired gated autoencoders (PGAE) for flexible translation between robot actions and language descriptions in a tabletop object manipulation scenario. We train our model in an end-to-end fashion by pairing each action with appropriate descriptions that contain a signal indicating the translation direction. During inference, our model can flexibly translate from action to language and vice versa according to the given language signal. Moreover, with the option to use a pretrained language model as the language encoder, our model has the potential to recognise unseen natural language input. Another capability of our model is that it can recognise and imitate actions of another agent by utilising robot demonstrations. The experimental results highlight the flexible bidirectional translation capabilities of our approach, alongside the ability to generalise to the actions of the opposite-sitting agent.
