In many information extraction applications, entity linking (EL) has emerged as a crucial task that allows leveraging information about named entities from a knowledge base. In this paper, we address the task of multimodal entity linking (MEL), an emerging research field in which textual and visual information is used to map an ambiguous mention to an entity in a knowledge base (KB). First, we propose a method for building a fully annotated Twitter dataset for MEL, where entities are defined in a Twitter KB. Then, we propose a model for jointly learning a representation of both mentions and entities from their textual and visual contexts. We demonstrate the effectiveness of the proposed model by evaluating it on the proposed dataset and highlight the importance of leveraging visual information when it is available.