Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message builds up a modern way of communication that automatic systems are not used to dealing with. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts, which sometimes include emojis. We show that these emojis can be predicted not only from the text but also from the picture. Our main finding is that incorporating the two synergistic modalities in a combined model improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and therefore can complement each other.