Abstract
Existing methods for estimating personalized treatment effects typically rely on structured covariates, limiting their applicability to unstructured data. Yet, leveraging unstructured data for causal inference has considerable application potential, for instance in healthcare, where clinical notes or medical images are abundant. To this end, we first introduce an approximate 'plug-in' method trained directly on the neural representations of unstructured data. However, when these representations fail to capture all confounding information, the method may be subject to confounding bias. We therefore introduce two theoretically grounded estimators that leverage structured measurements of the confounders during training but allow estimating personalized treatment effects purely from unstructured inputs, while avoiding confounding bias. When these structured measurements are only available for a non-representative subset of the data, these estimators may suffer from sampling bias. To address this, we further introduce a regression-based correction that accounts for the non-uniform sampling, assuming the sampling mechanism is known or can be well-estimated. Our experiments on two benchmark datasets show that the plug-in method, directly trainable on large unstructured datasets, achieves strong empirical performance across all settings, despite its simplicity.
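To make the 'plug-in' idea concrete, the sketch below illustrates one generic way to estimate personalized effects directly from representations. It fits one outcome model per treatment arm on embedding vectors `Z` and takes the difference of their predictions (a T-learner). This is an illustrative assumption and not the estimator or architecture described above: the linear outcome models, the synthetic Gaussian "embeddings", and all function names (`fit_linear`, `predict_linear`, `plug_in_cate`) are hypothetical. Treatment is randomized here, so no confounding is present. That is the setting in which such a plug-in approach is unbiased; if the representations omitted a confounder, the same code would be biased.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept column; returns the coefficient vector."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predict with coefficients from fit_linear (intercept first)."""
    return coef[0] + X @ coef[1:]

def plug_in_cate(Z, t, y, Z_new):
    """Hypothetical T-learner-style plug-in estimator: one outcome model per
    treatment arm, fitted on the representations Z, then differenced."""
    coef1 = fit_linear(Z[t == 1], y[t == 1])
    coef0 = fit_linear(Z[t == 0], y[t == 0])
    return predict_linear(coef1, Z_new) - predict_linear(coef0, Z_new)

rng = np.random.default_rng(0)
n, d = 2000, 8
Z = rng.normal(size=(n, d))        # assumed stand-in for neural embeddings of notes/images
t = rng.integers(0, 2, size=n)     # randomized treatment: no confounding in this toy setup
beta = rng.normal(size=d)          # baseline outcome coefficients (synthetic)
gamma = rng.normal(size=d)         # effect-modification coefficients (synthetic)
y = Z @ beta + t * (Z @ gamma) + 0.1 * rng.normal(size=n)

Z_test = rng.normal(size=(5, d))
tau_hat = plug_in_cate(Z, t, y, Z_test)   # estimated personalized effects
tau_true = Z_test @ gamma                 # ground-truth effects in this simulation
print(np.round(tau_hat - tau_true, 3))    # estimation errors, expected to be small
```

In practice, the linear models would likely be replaced by flexible regressors trained on large unstructured datasets. The two-model structure is kept only because it makes the "predict under treatment minus predict under control" logic explicit.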