LLMs' Understanding of Natural Language Revealed

Abstract

Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale. Despite their utility in a number of downstream NLP tasks, ample research has shown that LLMs are incapable of reasoning in tasks that require quantification over, and the manipulation of, symbolic variables (e.g., planning and problem solving); see for example [25][26]. In this document, however, we focus on testing LLMs for their language understanding capabilities, their supposed forte. As we will show here, the language understanding capabilities of LLMs have been widely exaggerated. While LLMs have proven able to generate coherent, human-like language (since that is what they were designed to do), their language understanding capabilities have not been properly tested. In particular, we believe that these capabilities should be tested by performing an operation that is the opposite of 'text generation': giving the LLM snippets of text as input and then querying what the LLM "understood". As we show here, doing so makes it apparent that LLMs do not truly understand language beyond very superficial inferences that are essentially a byproduct of memorizing massive amounts of ingested text.
