Abstract
Studies have underscored that, despite recent breakthroughs and swift advances in AI research, even state-of-the-art Large Language Models (LLMs) continue to struggle with logical and mathematical reasoning. The results seem to suggest that LLMs still work as (highly advanced) data pattern identifiers, scoring poorly when attempting to generalise and solve reasoning problems that the models have never previously seen or that are not close to samples presented in their training data. To address this compelling concern, this paper makes use of the notion of critical questions from the literature on argumentation theory, focusing in particular on Toulmin's model of argumentation. We show that employing these critical questions can improve the reasoning capabilities of LLMs. By probing the rationale behind the model's reasoning process, the LLM can assess whether a logical mistake is occurring and correct it before providing the final reply to the user prompt. The underlying idea is drawn from the gold standard of any valid argumentative procedure: the conclusion is valid if it is entailed by accepted premises. Or, to paraphrase this Aristotelian principle in a real-world approximation, characterised by incomplete information and presumptive logic, the conclusion is valid if not proved otherwise. This approach successfully steers the models' output through a reasoning pipeline, resulting in better performance than both the baseline and its Chain-of-Thought (CoT) implementation. To this end, an extensive evaluation of the proposed approach on the MT-Bench Reasoning and Math tasks across a range of LLMs is provided.