Abstract
Recently, much work has concerned itself with the enigma of what exactly pretrained language models (PLMs) learn about different aspects of language, and how they learn it. One stream of this type of research investigates the knowledge that PLMs have about semantic relations. However, many aspects of semantic relations were left unexplored. Generally, only one relation has been considered, namely hypernymy. Furthermore, previous work did not measure humans' performance on the same task as that performed by the PLMs. This means that at this point in time, there is only an incomplete view of the extent of these models' semantic relation knowledge. To address this gap, we introduce a comprehensive evaluation framework covering five relations beyond hypernymy, namely hyponymy, holonymy, meronymy, antonymy, and synonymy. We use five metrics (two newly introduced here) for recently untreated aspects of semantic relation knowledge, namely soundness, completeness, symmetry, prototypicality, and distinguishability. Using these, we can fairly compare humans and models on the same task. Our extensive experiments involve six PLMs, four masked and two causal language models. The results reveal a significant knowledge gap between humans and models for all semantic relations. In general, causal language models, despite their wide use, do not always perform significantly better than masked language models. Antonymy is the outlier relation where all models perform reasonably well.