Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Abstract

We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering. Together, these constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI. The ARC question set is partitioned into a Challenge Set and an Easy Set, where the Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. The dataset contains only natural, grade-school science questions (authored for human tests), and is the largest public-domain set of this kind (7,787 questions). We test several baselines on the Challenge Set, including leading neural models from the SQuAD and SNLI tasks, and find that none are able to significantly outperform a random baseline, reflecting the difficult nature of this task. We are also releasing the ARC Corpus, a corpus of 14M science sentences relevant to the task, and implementations of the three neural baseline models tested. Can your model perform better? We pose ARC as a challenge to the community.
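The partition rule described above can be illustrated with a minimal sketch. It assumes each question is a dictionary with an `answer` key and that each baseline is a function returning its chosen answer label. The names `partition_questions`, `retrieval_solver`, and `cooccurrence_solver` are hypothetical, and the toy lookup-table solvers merely stand in for the two real algorithms, which are not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Question = Dict[str, str]
Solver = Callable[[Question], str]  # returns the answer label the solver picks


def partition_questions(
    questions: List[Question],
    retrieval_solver: Solver,
    cooccurrence_solver: Solver,
) -> Tuple[List[Question], List[Question]]:
    """Split questions into (challenge, easy).

    A question goes to the Challenge Set only if BOTH solvers answer it
    incorrectly; every other question goes to the Easy Set.
    """
    challenge: List[Question] = []
    easy: List[Question] = []
    for q in questions:
        retrieval_wrong = retrieval_solver(q) != q["answer"]
        cooccurrence_wrong = cooccurrence_solver(q) != q["answer"]
        if retrieval_wrong and cooccurrence_wrong:
            challenge.append(q)
        else:
            easy.append(q)
    return challenge, easy


# Toy data: hypothetical questions and hard-coded solver outputs.
questions = [
    {"id": "q1", "answer": "A"},
    {"id": "q2", "answer": "B"},
    {"id": "q3", "answer": "C"},
]
retrieval_picks = {"q1": "A", "q2": "C", "q3": "A"}
cooccurrence_picks = {"q1": "B", "q2": "D", "q3": "C"}

challenge, easy = partition_questions(
    questions,
    retrieval_solver=lambda q: retrieval_picks[q["id"]],
    cooccurrence_solver=lambda q: cooccurrence_picks[q["id"]],
)
print([q["id"] for q in challenge])
print([q["id"] for q in easy])
```

In this toy run only `q2` is missed by both stand-in solvers, so it alone lands in the Challenge Set; `q1` and `q3` are each answered correctly by one solver and fall into the Easy Set.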
