Abstract
Large Language Models (LLMs) are commonly used to generate solutions for mathematical reasoning problems in the following formats: natural language, code, or a combination of both. In this paper, we explore fundamental questions related to solving mathematical reasoning problems using natural language and code with state-of-the-art LLMs, including GPT-4o-mini and LLama-3.1-8b-Turbo. Our findings show that LLMs are better at reasoning in natural language compared to code. Additionally, although natural language and code serve as complementary forms of reasoning, they can affect each other in a negative way in certain scenarios. These insights motivate our development of a new prompting method, INC-Math, which leverages an LLM to dynamically select the most appropriate reasoning form, resulting in improved performance over comparable baselines with GPT-4o-mini.