Grounding Language Attributes to Objects using Bayesian Eigenobjects

  • 2019-05-30 16:15:36
  • Vanya Cohen, Benjamin Burchfiel, Thao Nguyen, Nakul Gopalan, Stefanie Tellex, George Konidaris

Abstract

We develop a system to disambiguate objects based on simple physical descriptions. The system takes as input a natural language phrase and a depth image containing a segmented object and predicts how similar the observed object is to the described object. Our system is designed to learn from only a small amount of human-labeled language data and generalize to viewpoints not represented in the language-annotated depth-image training set. By decoupling 3D shape representation from language representation, our method is able to ground language to novel objects using a small amount of language-annotated depth data and a larger corpus of unlabeled 3D object meshes, even when these objects are partially observed from unusual viewpoints. Our system is able to disambiguate between novel objects, observed via depth images, based on natural language descriptions. Our method also enables viewpoint transfer; trained on human-annotated data on a small set of depth images captured from frontal viewpoints, our system successfully predicted object attributes from rear views despite having no such depth images in its training set. Finally, we demonstrate our system on a Baxter robot, enabling it to pick specific objects based on human-provided natural language descriptions.
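The disambiguation step described above can be illustrated with a minimal sketch. It assumes that the language phrase and each observed object have already been mapped to fixed-length vectors in a shared low-dimensional space; the abstract does not say how similarity is scored, so the cosine similarity used here is an assumption, and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_described_object(language_vec, object_vecs):
    """Return the index of the candidate most similar to the description, plus all scores.

    language_vec: hypothetical embedding predicted from the language phrase.
    object_vecs:  hypothetical shape embeddings, one per segmented object.
    """
    scores = [cosine_similarity(language_vec, v) for v in object_vecs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy data (made up for illustration, not taken from the paper).
language_vec = [0.9, 0.1, 0.0]
object_vecs = [
    [0.0, 1.0, 0.2],
    [1.0, 0.0, 0.1],
    [0.1, 0.1, 1.0],
]

best_index, scores = pick_described_object(language_vec, object_vecs)
print(best_index, [round(s, 3) for s in scores])
```

In this toy setup the second candidate scores highest because its vector points in nearly the same direction as the description vector. A robot-facing version would feed the selected index to a grasp planner, which is outside the scope of this sketch.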
