In recent years, a range of problems under the broad umbrella of automatic, computer vision based analysis of ancient coins has been attracting an increasing amount of attention. Notwithstanding this research effort, the results achieved by the state of the art in the published literature remain poor and far from sufficient for any practical purpose. In this paper we present a series of contributions which we believe will benefit the interested community. Firstly, we explain that the approach of visual matching of coins, universally adopted in all existing published papers on the topic, is not of practical interest because the number of ancient coin types exceeds by far the number of those types which have been imaged, be it in digital form (e.g. online) or otherwise (traditional film, in print, etc.). Rather, we argue that the focus should be on understanding the semantic content of coins. Hence, we describe a novel method which uses real-world multimodal input to extract semantic concepts and associate them with the correct coin images, and then uses a novel convolutional neural network to learn the appearance of these concepts. On a real-world data set of ancient coins, by far the largest of its kind, we demonstrate empirically highly promising results.