Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents

  • 2018-06-16 05:17:32
  • Wenhan Xiong, Xiaoxiao Guo, Mo Yu, Shiyu Chang, Bowen Zhou, William Yang Wang

Abstract

We investigate the task of learning to follow natural language instructions by jointly reasoning with visual observations and language inputs. In contrast to existing methods which start with learning from demonstrations (LfD) and then use reinforcement learning (RL) to fine-tune the model parameters, we propose a novel policy optimization algorithm which dynamically schedules demonstration learning and RL. The proposed training paradigm provides efficient exploration and better generalization beyond existing methods. Compared to existing ensemble models, the best single model based on our proposed method tremendously decreases the execution error by over 50% on a block-world environment. To further illustrate the exploration strategy of our RL algorithm, we also include systematic studies on the evolution of policy entropy during training.

 
