Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

Abstract

In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs in real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate systems correctly in situations where no answer is available, and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure that also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while leaving unanswered only a limited number of questions that were previously answered correctly. Employing a supervised learning strategy using depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.
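
To make the contrast between the binary and ternary reward structures concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not state the reward values, so the constants below (+1 for a correct answer, 0 for abstaining, -1 for an incorrect answer) are illustrative assumptions, as are the function names and the use of `None` to represent "no answer".

```python
from typing import Optional

# Hypothetical reward values: the abstract does not specify them.
R_CORRECT = 1.0     # agent answered and the answer is right
R_NO_ANSWER = 0.0   # agent chose not to answer
R_INCORRECT = -1.0  # agent answered and the answer is wrong


def binary_reward(predicted: Optional[str], gold: str) -> float:
    """Binary scheme: 1 for a correct answer, 0 for everything else.

    Abstaining and answering incorrectly are indistinguishable here,
    so the agent has no incentive to withhold a doubtful answer.
    """
    return 1.0 if predicted == gold else 0.0


def ternary_reward(predicted: Optional[str], gold: str) -> float:
    """Ternary scheme: abstaining scores above a wrong answer.

    `predicted is None` stands for the agent declining to answer
    (an assumed encoding, not taken from the paper).
    """
    if predicted is None:
        return R_NO_ANSWER
    return R_CORRECT if predicted == gold else R_INCORRECT


if __name__ == "__main__":
    for guess in ("Paris", "Lyon", None):
        print(guess, binary_reward(guess, "Paris"), ternary_reward(guess, "Paris"))
```

Under these assumed values, a wrong answer and an abstention receive the same binary reward, whereas the ternary reward ranks abstention strictly above a wrong answer, which is the property the abstract relies on.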
