Abstract
Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. However, current open IE systems ignore the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel open IE system, called ReMine, which integrates local context signal and global structural signal in a unified framework with distant supervision. The new system can be efficiently applied to different domains as it uses facts from external knowledge bases as supervision; and can effectively score sentence-level tuple extractions based on corpus-level statistics. Specifically, we design a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of sentence-level extractions with a translating-based objective. Experiments on real-world corpora from different domains demonstrate the effectiveness and robustness of ReMine when compared to other open IE systems.