Online mis- and disinformation have become a major societal problem and a major source of online harms of different kinds. One common form of mis- and disinformation is out-of-context (OOC) information, where different pieces of information are falsely associated, e.g., a real image combined with a false textual caption or a misleading textual description. Although some past studies have attempted to defend against OOC mis- and disinformation through external evidence, they tend to disregard the role of different pieces of evidence with different stances. Motivated by the intuition that the stance of evidence represents a bias towards different detection results, we propose a stance extraction network (SEN) that can extract the stances of different pieces of multi-modal evidence in a unified framework. Moreover, we introduce a support-refutation score, calculated based on the co-occurrence relations of named entities, into the textual SEN. Extensive experiments on a public large-scale dataset demonstrated that our proposed method outperformed the state-of-the-art baselines, with the best model achieving a performance gain of 3.2% in accuracy.