Abstract
We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data visualizing charts. Given a chart, a language model needs to not only correctly comprehend the chart (the FACT question) but also judge if the chart will be misleading to a human reader (the MIND question). Both questions have significant societal benefits. We detail the construction of the CHARTOM benchmark including its calibration on human performance.