Efficient human-like semantic representations via the Information Bottleneck principle

Abstract

Maintaining efficient semantic representations of the environment is a major challenge both for humans and for machines. While human languages represent useful solutions to this problem, it is not yet clear what computational principle could give rise to similar solutions in machines. In this work we propose an answer to this open question. We suggest that languages compress percepts into words by optimizing the Information Bottleneck (IB) tradeoff between the complexity and accuracy of their lexicons. We present empirical evidence that this principle may give rise to human-like semantic representations, by exploring how human languages categorize colors. We show that color naming systems across languages are near-optimal in the IB sense, and that these natural systems are similar to artificial IB color naming systems with a single tradeoff parameter controlling the cross-language variability. In addition, the IB systems evolve through a sequence of structural phase transitions, demonstrating a possible adaptation process. This work thus identifies a computational principle that characterizes human semantic systems, and that could usefully inform semantic representations in machines.
