Abstract
Sensory language expresses embodied experiences ranging from taste and sound to excitement and stomachache. This language is of interest to scholars from a wide range of domains including robotics, narratology, linguistics, and cognitive science. In this work, we explore whether language models, which are not embodied, can approximate human use of embodied language. We extend an existing corpus of parallel human and model responses to short story prompts with an additional 18,000 stories generated by 18 popular models. We find that all models generate stories that differ significantly from human usage of sensory language, but the direction of these differences varies considerably between model families. Namely, Gemini models use significantly more sensory language than humans along most axes, whereas most models from the remaining five families use significantly less. Linear probes run on five models suggest that they are capable of identifying sensory language. However, we find preliminary evidence suggesting that instruction tuning may discourage usage of sensory language. Finally, to support further work, we release our expanded story dataset.