Counterfactual Normalization: Proactively Addressing Dataset Shift and Improving Reliability Using Causal Mechanisms

Abstract

Predictive models can fail to generalize from training to deployment environments because of dataset shift, posing a threat to model reliability and the safety of downstream decisions made in practice. Instead of using samples from the target distribution to reactively correct dataset shift, we use graphical knowledge of the causal mechanisms relating variables in a prediction problem to proactively remove relationships that do not generalize across environments, even when these relationships may depend on unobserved variables (violations of the "no unobserved confounders" assumption). To accomplish this, we identify variables with unstable paths of statistical influence and remove them from the model. We also augment the causal graph with latent counterfactual variables that isolate unstable paths of statistical influence, allowing us to retain stable paths that would otherwise be removed. Our experiments demonstrate that models that remove vulnerable variables and use estimates of the latent variables transfer better, often outperforming in the target domain despite some accuracy loss in the training domain.
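The trade-off described above, where dropping a vulnerable variable costs some training accuracy but improves transfer, can be illustrated with a toy simulation. The sketch below is not the paper's method or experiments: the data-generating process, the coefficients, and the names (`simulate`, `shift_coef`, `fit_ols`, `mse`) are all assumptions chosen for illustration. It covers only the "remove unstable variables" step, not the latent counterfactual variables. Here `x1` is a stable cause of `y`, while `x2` is an effect of `y` whose mechanism flips sign between the training and target environments.

```python
import numpy as np

def simulate(n, shift_coef, rng):
    """Toy environment (an assumption, not from the paper).

    x1 -> y is stable; y -> x2 depends on the environment via shift_coef.
    """
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)
    x2 = shift_coef * y + rng.normal(size=n)
    return np.column_stack([x1, x2]), y

def fit_ols(X, y):
    """Ordinary least squares with an intercept column appended."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(coef, X, y):
    """Mean squared error of a fitted linear model on (X, y)."""
    A = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((A @ coef - y) ** 2))

rng = np.random.default_rng(0)
X_train, y_train = simulate(5000, shift_coef=1.0, rng=rng)     # training environment
X_target, y_target = simulate(5000, shift_coef=-1.0, rng=rng)  # shifted environment

# Model using every variable, including the unstable x2.
full = fit_ols(X_train, y_train)
# Model restricted to the stable variable x1.
stable = fit_ols(X_train[:, :1], y_train)

full_train = mse(full, X_train, y_train)
stable_train = mse(stable, X_train[:, :1], y_train)
full_target = mse(full, X_target, y_target)
stable_target = mse(stable, X_target[:, :1], y_target)

print(f"train  MSE: full={full_train:.3f}  stable={stable_train:.3f}")
print(f"target MSE: full={full_target:.3f}  stable={stable_target:.3f}")
```

In this toy setup the full model fits the training environment better, because `x2` carries extra information about `y` there, but it degrades in the target environment, where the stable model's error stays roughly unchanged.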
