TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards

Abstract

Prompt optimization improves the reasoning abilities of large language models (LLMs) without requiring parameter updates to the target model. Following heuristic-based "Think step by step" approaches, the field has evolved in two main directions: while one group of methods uses textual feedback to elicit improved prompts from general-purpose LLMs in a training-free way, a concurrent line of research relies on numerical rewards to train a special prompt model, tailored for providing optimal prompts to the target model. In this paper, we introduce the Textual Reward Prompt framework (TRPrompt), which unifies these approaches by directly incorporating textual feedback into the training of the prompt model. Our framework does not require prior dataset collection and is iteratively improved with the feedback on the generated prompts. When coupled with the capacity of an LLM to internalize the notion of what a "good" prompt is, the high-resolution signal provided by the textual rewards allows us to train a prompt model yielding state-of-the-art query-specific prompts for the problems from the challenging math datasets GSMHard and MATH.
