Self-Questioning Language Models

Abstract

Can large language models improve without external data -- by generating their own questions and answers? We hypothesize that a pre-trained language model can improve its reasoning skills given only a single prompt specifying the topic (e.g., algebra word problems) and asking the model to generate its own questions. To do this, we propose Self-Questioning Language Models (SQLM): an asymmetric self-play framework where a proposer is given the topic and generates a question for a solver, who tries to answer it. Both the proposer and solver are trained via reinforcement learning. The proposer receives a reward if the problem is not too easy or too difficult, and the solver receives a reward based on majority voting, a proxy for correctness in the absence of ground-truth answers. For coding, the proposer can instead generate unit tests which are used for verification. We study this asymmetric self-play framework on three benchmarks: three-digit multiplication, algebra problems from the OMEGA benchmark, and programming problems from Codeforces. By continually generating more interesting problems and attempting to solve them, language models can improve on downstream benchmarks without access to any curated training datasets.
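
The two reward signals described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the solver reward is assumed to be binary agreement with the majority answer, and "too easy" and "too difficult" are assumed to mean that all sampled answers agree and that no two sampled answers agree, respectively. The abstract does not specify these thresholds.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and how many samples produced it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes


def solver_rewards(answers):
    """Assumed binary reward: 1.0 for samples matching the majority answer."""
    majority, _ = majority_answer(answers)
    return [1.0 if a == majority else 0.0 for a in answers]


def proposer_reward(answers):
    """Assumed rule: reward the question only if it is neither too easy
    (every sample agrees) nor too difficult (no two samples agree)."""
    _, votes = majority_answer(answers)
    too_easy = votes == len(answers)
    too_hard = votes == 1
    return 0.0 if (too_easy or too_hard) else 1.0


# Four hypothetical solver samples for one proposed multiplication question.
samples = ["56088", "56088", "56088", "55088"]
print(solver_rewards(samples))
print(proposer_reward(samples))
```

Here the majority answer stands in for a ground-truth label, so the solver is rewarded for self-consistency rather than verified correctness, which is why the abstract calls majority voting a proxy.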
