Abstract
The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of these models remain pressing concerns. To this end, we investigate iterative prompting, a strategy hypothesized to refine LLM responses, assessing its impact on LLM truthfulness, an area which has not been thoroughly explored. Our extensive experiments delve into the intricacies of iterative prompting variants, examining their influence on the accuracy and calibration of model responses. Our findings reveal that naive prompting methods significantly undermine truthfulness, leading to exacerbated calibration errors. In response to these challenges, we introduce several prompting variants designed to address the identified issues. These variants demonstrate marked improvements over existing baselines, signaling a promising direction for future research. Our work provides a nuanced understanding of iterative prompting and introduces novel approaches to enhance the truthfulness of LLMs, thereby contributing to the development of more accurate and trustworthy AI systems.