Language Model-Based Paired Variational Autoencoders for Robotic Language Learning

Abstract

Human infants learn language while interacting with their environment, in which their caregivers may describe the objects and actions they perform. Similar to human infants, artificial agents can learn language while interacting with their environment. In this work, first, we present a neural model that bidirectionally binds robot actions and their language descriptions in a simple object manipulation scenario. Building on our previous Paired Variational Autoencoders (PVAE) model, we demonstrate the superiority of the variational autoencoder over standard autoencoders by experimenting with cubes of different colours, and by enabling the production of alternative vocabularies. Additional experiments show that the model's channel-separated visual feature extraction module can cope with objects of different shapes. Next, we introduce PVAE-BERT, which equips the model with a pretrained large-scale language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). This enables the model to go beyond comprehending only the predefined descriptions that the network has been trained on: the recognition of action descriptions generalises to unconstrained natural language as the model becomes capable of understanding unlimited variations of the same descriptions. Our experiments suggest that using a pretrained language model as the language encoder allows our approach to scale up for real-world scenarios with instructions from human users.
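The abstract does not give the PVAE's equations. As a rough illustration of what separates a variational autoencoder from a standard one, the sketch below shows the two generic ingredients: a sampled latent code (the reparameterization step) and a KL penalty towards a standard normal prior. A standard autoencoder would pass the encoder output on deterministically. All function names are hypothetical. This is a textbook VAE sketch, not the authors' implementation. How sampling relates to producing alternative vocabularies is only suggested here, not confirmed by the abstract.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise.

    Hypothetical helper: a VAE encoder outputs (mu, log_var) per latent
    dimension; a standard autoencoder would simply return mu.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return sum(-0.5 * (1.0 + lv - m * m - math.exp(lv))
               for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [-1.0, -1.0]
    # Two draws for the same input give different latent codes, so a
    # decoder fed with them may produce different outputs.
    print(reparameterize(mu, log_var, rng))
    print(reparameterize(mu, log_var, rng))
    print(kl_to_standard_normal(mu, log_var))
```

In a full model, the KL term would be added to the reconstruction loss. The stochastic latent code is one plausible mechanism behind a one-to-many mapping from an action to several valid descriptions.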
