Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities

Abstract

This study critically evaluates the efficacy of prompting methods in enhancing the mathematical reasoning capability of large language models (LLMs). The investigation uses three prescriptive prompting methods - simple, persona, and conversational prompting - known for their effectiveness in enhancing the linguistic tasks of LLMs. We conduct this analysis on OpenAI's LLM chatbot, ChatGPT-3.5, on extensive problem sets from the MATH, GSM8K, and MMLU datasets, encompassing a broad spectrum of mathematical challenges. A grading script adapted to each dataset is used to determine the effectiveness of these prompting interventions in enhancing the model's mathematical analysis power. Contrary to expectations, our empirical analysis reveals that none of the investigated methods consistently improves over ChatGPT-3.5's baseline performance, with some causing significant degradation. Our findings suggest that prompting strategies do not necessarily generalize to new domains, in this study failing to enhance mathematical performance.
