Totally Looks Like - How Humans Compare, Compared to Machines

Abstract

Perceptual judgment of image similarity by humans relies on a rich internalrepresentations ranging from low-level features to high-level concepts, sceneproperties and even cultural associations. Existing methods and datasetsattempting to explain perceived similarity use stimuli which arguably do notcover the full breadth of factors that affect human similarity judgments, eventhose geared toward this goal. We introduce a new dataset dubbed\textbf{Totally-Looks-Like} (TTL) after a popular entertainment website, whichcontains images paired by humans as being visually similar. The datasetcontains 6016 image-pairs from the wild, shedding light upon a rich and diverseset of criteria employed by human beings. We conduct experiments to try toreproduce the pairings via features extracted from state-of-the-art deepconvolutional neural networks, as well as additional human experiments toverify the consistency of the collected data. Though we create conditions toartificially make the matching task increasingly easier, we show thatmachine-extracted representations perform very poorly in terms of reproducingthe matching selected by humans. We discuss and analyze these results,suggesting future directions for improvement of learned image representations.

Quick Read (beta)

loading the full paper ...