Using Text Embeddings for Causal Inference

Abstract

We address causal inference with text documents. For example, does adding a theorem to a paper affect its chance of acceptance? Does reporting the gender of a forum post author affect the popularity of the post? We estimate these effects from observational data, where they may be confounded by features of the text such as the subject or writing quality. Although the text suffices for causal adjustment, it is prohibitively high-dimensional. The challenge is to find a low-dimensional text representation that can be used in causal inference. A key insight is that causal adjustment requires only the aspects of text that are predictive of both the treatment and outcome. Our proposed method adapts deep language models to learn low-dimensional embeddings from text that predict these values well; these embeddings suffice for causal adjustment. We establish theoretical properties of this method. We study it empirically on semi-simulated and real data on paper acceptance and forum post popularity. Code is available at https://github.com/blei-lab/causal-text-embeddings.
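To make the role of adjustment concrete, the toy sketch below contrasts a naive treated-versus-control comparison with one that adjusts for a confounder. It is an illustration only, not the method described above: it assumes the learned text representation has been collapsed to a single hypothetical binary feature `z`, and the data, the function names `naive_difference` and `adjusted_difference`, and the true effect of 1.0 are all invented for the example.

```python
from collections import defaultdict


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and control units."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_difference(rows):
    """Stratify on z, take the treated-minus-control difference within each
    stratum, and average the differences weighted by stratum size."""
    strata = defaultdict(list)
    for z, t, y in rows:
        strata[z].append((t, y))
    total = 0.0
    for members in strata.values():
        treated = [y for t, y in members if t == 1]
        control = [y for t, y in members if t == 0]
        total += len(members) * (mean(treated) - mean(control))
    return total / len(rows)


# Hypothetical data: z stands in for a one-dimensional text representation.
# Units with z = 1 are more likely to be treated and also have higher outcomes,
# so z confounds the comparison. The outcome is y = 1.0 * t + 2.0 * z,
# which makes the true treatment effect 1.0.
assignments = [(0, 1), (0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (1, 1), (1, 0)]
rows = [(z, t, 1.0 * t + 2.0 * z) for z, t in assignments]

print("naive:", naive_difference(rows))
print("adjusted:", adjusted_difference(rows))
```

In this constructed example the naive comparison overstates the effect because `z` drives both treatment and outcome, while the stratified estimate recovers the value built into the data. With real text, `z` would be replaced by a learned embedding and stratification by a fitted model, which this sketch does not attempt.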
