Compositional Evaluation on Japanese Textual Entailment and Similarity

Abstract

Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.
