Learning from Dialogue after Deployment: Feed Yourself, Chatbot!

  • 2019-01-16 18:02:44
  • Braden Hancock, Antoine Bordes, Pierre-Emmanuel Mazare, Jason Weston
  • 85

Abstract

The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user's responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot's dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
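
The routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SelfFeedingBuffer`, `process_turn`, and `record_feedback`, the single `threshold` parameter, and the assumption that a satisfaction score in [0, 1] is supplied by some external estimator are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SelfFeedingBuffer:
    """Holds examples harvested from deployed conversations (hypothetical structure)."""
    # (dialogue context, user reply) pairs for the agent to imitate
    dialogue_examples: List[Tuple[str, str]] = field(default_factory=list)
    # (dialogue context, user feedback) pairs collected after a suspected mistake
    feedback_examples: List[Tuple[str, str]] = field(default_factory=list)


def process_turn(buffer: SelfFeedingBuffer, context: str, user_reply: str,
                 satisfaction: float, threshold: float = 0.5) -> bool:
    """Route one turn; return True if the agent should ask for feedback.

    `satisfaction` is assumed to come from a separate satisfaction estimator;
    the 0.5 default threshold is an arbitrary placeholder, not a value
    taken from the paper.
    """
    if satisfaction >= threshold:
        # Conversation seems to be going well: keep the user's reply
        # as a new training example to imitate.
        buffer.dialogue_examples.append((context, user_reply))
        return False
    # The agent believes it made a mistake: signal that feedback is needed.
    return True


def record_feedback(buffer: SelfFeedingBuffer, context: str, feedback: str) -> None:
    """Store the feedback the user gave so the agent can learn to predict it."""
    buffer.feedback_examples.append((context, feedback))
```

In use, the caller would invoke `process_turn` after each user message and, when it returns `True`, ask the user what the agent should have said and pass the answer to `record_feedback`. Both lists can then be used as additional training data alongside the original supervised examples.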

 
