Abstract
To bring artificial agents into our lives, we will need to go beyond supervised learning on closed datasets and give agents the ability to continuously expand their knowledge. Inspired by a student learning in a classroom, we present an agent that can continuously learn by posing natural language questions to humans. Our agent is composed of three interacting modules: one that performs captioning, another that generates questions, and a decision maker that learns when to ask questions by implicitly reasoning about the uncertainty of the agent and the expertise of the teacher. Compared to current active learning methods, which query images for full captions, our agent is able to ask pointed questions to improve the generated captions. The agent trains on the improved captions, expanding its knowledge. We show that our approach achieves better performance than the baselines while using less human supervision on the challenging MSCOCO dataset.