Abstract
Although LLMs have shown strong performance on mathematics- and coding-related reasoning tasks, their capabilities on other forms of reasoning remain an open problem. Here, we examine the issue of reasoning from the perspective of claim verification. We propose a framework designed to break down any claim paired with evidence into the atomic reasoning types that are necessary for verification. We use this framework to create RECV, the first claim verification benchmark, incorporating real-world claims, to assess the deductive and abductive reasoning capabilities of LLMs. The benchmark comprises three datasets, covering reasoning problems of increasing complexity. We evaluate three state-of-the-art proprietary LLMs under multiple prompt settings. Our results show that while LLMs can address deductive reasoning problems, they consistently fail in cases of abductive reasoning. Moreover, we observe that enhancing LLMs with rationale generation is not always beneficial. Nonetheless, we find that generated rationales are semantically similar to those provided by humans, especially in deductive reasoning cases.