Advancements and limitations of LLMs in replicating human color-word associations

Abstract

Color-word associations play a fundamental role in human cognition and design applications. Large Language Models (LLMs) have become widely available and demonstrated intelligent behaviors in various benchmarks with natural conversation skills. However, their ability to replicate human color-word associations remains understudied. We compared multiple generations of LLMs (from GPT-3 to GPT-4o) against human color-word associations using data collected from over 10,000 Japanese participants, involving 17 colors and words from eight categories in Japanese. Our findings reveal a clear progression in LLM performance across generations, with GPT-4o achieving the highest accuracy in predicting the best voted word for each color and category. However, the highest median performance was approximately 50% even for GPT-4o with visual inputs (chance level is 10%), and the performance levels varied significantly across word categories and colors, indicating a failure to fully replicate human color-word associations. On the other hand, color discrimination ability estimated from our color-word association data showed that LLMs demonstrated high correlation with human color discrimination patterns, similarly to previous studies. Our study highlights both the advancements in LLM capabilities and their persistent limitations, suggesting differences in semantic memory structures between humans and LLMs in representing color-word associations.
