I Could've Asked That: Reformulating Unanswerable Questions

Abstract

When seeking information from unfamiliar documents, users frequently pose questions that cannot be answered by the documents. While existing large language models (LLMs) identify these unanswerable questions, they do not assist users in reformulating their questions, thereby reducing their overall utility. We curate CouldAsk, an evaluation benchmark composed of existing and new datasets for document-grounded question answering, specifically designed to study reformulating unanswerable questions. We evaluate state-of-the-art open-source and proprietary LLMs on CouldAsk. The results demonstrate the limited capabilities of these models in reformulating questions. Specifically, GPT-4 and Llama2-7B successfully reformulate questions only 26% and 12% of the time, respectively. Error analysis shows that 62% of the unsuccessful reformulations stem from the models merely rephrasing the questions or even generating identical questions. We publicly release the benchmark and the code to reproduce the experiments.
