Explaining Patterns in Data with Language Models via Interpretable Autoprompting

Abstract

Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained LLM and data examples, we introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data. iPrompt iteratively alternates between generating explanations with an LLM and reranking them based on their performance when used as a prompt. Experiments on a wide range of datasets, from synthetic mathematics to natural-language understanding, show that iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions. Moreover, the prompts produced by iPrompt are simultaneously human-interpretable and highly effective for generalization: on real-world sentiment classification datasets, iPrompt produces prompts that match or even improve upon human-written prompts for GPT-3. Finally, experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery. All code for using the methods and data here is made available on GitHub.
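
The generate-then-rerank loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `iprompt_sketch`, its parameters, and the toy `toy_propose` and `toy_score` stand-ins are all hypothetical. In a real setting, the proposal step would sample candidate explanations from an LLM and the scoring step would measure how well the LLM performs on the data when a candidate is used as its prompt; here both are replaced by small deterministic functions so the example runs without any model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected output text)


def iprompt_sketch(
    data: Sequence[Example],
    propose: Callable[[List[str], int], List[str]],
    score: Callable[[str, Sequence[Example]], float],
    n_iters: int = 2,
    n_keep: int = 2,
) -> List[Tuple[float, str]]:
    """Alternate between proposing candidate explanations and reranking them.

    `propose` and `score` are hypothetical stand-ins for the LLM-based
    generation and prompt-evaluation steps.
    """
    kept: List[str] = []
    for step in range(n_iters):
        # Generation step: gather new candidates alongside the current best ones.
        candidates = set(kept) | set(propose(kept, step))
        # Reranking step: order candidates by how well each works as a prompt
        # (ties broken alphabetically so the result is deterministic).
        ranked = sorted(candidates, key=lambda p: (-score(p, data), p))
        kept = ranked[:n_keep]
    return [(score(p, data), p) for p in kept]


# Toy stand-ins (assumptions for illustration only, no LLM involved).
RULES: Dict[str, Callable[[int, int], int]] = {
    "Add the two numbers.": lambda a, b: a + b,
    "Multiply the two numbers.": lambda a, b: a * b,
    "Return the first number.": lambda a, b: a,
    "Subtract the second number from the first.": lambda a, b: a - b,
}
POOL = list(RULES)


def toy_propose(kept: List[str], step: int) -> List[str]:
    # Stand-in for LLM sampling: walk through a fixed pool, two candidates per round.
    i = (2 * step) % len(POOL)
    return POOL[i:i + 2]


def toy_score(prompt: str, data: Sequence[Example]) -> float:
    # Stand-in for prompt evaluation: accuracy of a "model" that follows the prompt exactly.
    rule = RULES[prompt]
    hits = sum(str(rule(*map(int, x.split()))) == y for x, y in data)
    return hits / len(data)


DATA: List[Example] = [("1 2", "3"), ("4 5", "9"), ("2 2", "4")]
result = iprompt_sketch(DATA, toy_propose, toy_score)
print(result[0])
```

On this toy data, the highest-ranked candidate is the one whose rule reproduces every output, which mirrors the idea of recovering a ground-truth dataset description by ranking explanations on their performance as prompts.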
