Why resampling outperforms reweighting for correcting sampling bias

Abstract

A data set sampled from a population is biased if the subgroups of the population are sampled at proportions that differ significantly from their underlying proportions. Training machine learning models on biased data sets requires correction techniques to compensate for the bias. We consider two commonly used techniques, resampling and reweighting, that rebalance the proportions of the subgroups to maintain the desired objective function. Though the two are statistically equivalent, it has been observed that resampling outperforms reweighting when combined with stochastic gradient algorithms. By analyzing illustrative examples, we explain the reason behind this phenomenon using tools from dynamical stability and stochastic asymptotics. We also present experiments from regression, classification, and off-policy prediction to demonstrate that this is a general phenomenon. We argue that it is imperative to consider the objective function design and the optimization algorithm together when addressing sampling bias.
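To make the two techniques concrete, the following is a minimal sketch on a toy problem that is not taken from the paper. It assumes two subgroups with hypothetical quadratic losses 0.5 * (theta - c_g)^2, and every name and number in it (CENTERS, SAMPLE_PROPS, TARGET_PROPS, the step size) is an illustrative assumption. Reweighting draws subgroups at the biased rate and scales each stochastic gradient by the ratio of target to sampled proportion; resampling draws subgroups at the target rate and leaves the gradient unscaled. The expected gradients coincide, which is the sense in which the two are statistically equivalent, while the individual stochastic steps differ.

```python
import random

# Hypothetical toy setup (not from the paper): two subgroups with losses
# 0.5 * (theta - CENTERS[g])**2, so the per-sample gradient is theta - CENTERS[g].
CENTERS = (-1.0, 1.0)
SAMPLE_PROPS = (0.9, 0.1)   # subgroup proportions in the biased data set
TARGET_PROPS = (0.5, 0.5)   # subgroup proportions in the underlying population
# Reweighting factor per subgroup: target proportion / sampled proportion.
WEIGHTS = tuple(t / s for t, s in zip(TARGET_PROPS, SAMPLE_PROPS))


def grad(theta, group):
    """Gradient of the subgroup loss at theta."""
    return theta - CENTERS[group]


def expected_grad_reweighting(theta):
    # Subgroups appear at the biased rate; each gradient is scaled by its weight.
    return sum(s * w * grad(theta, g)
               for g, (s, w) in enumerate(zip(SAMPLE_PROPS, WEIGHTS)))


def expected_grad_resampling(theta):
    # Subgroups appear at the target rate; gradients are left unscaled.
    return sum(t * grad(theta, g) for g, t in enumerate(TARGET_PROPS))


def sgd(method, steps=20000, lr=0.05, theta=0.0, seed=0):
    """Run single-sample SGD with either correction and return the final iterate."""
    rng = random.Random(seed)
    for _ in range(steps):
        if method == "reweighting":
            g = rng.choices((0, 1), weights=SAMPLE_PROPS)[0]
            theta -= lr * WEIGHTS[g] * grad(theta, g)
        elif method == "resampling":
            g = rng.choices((0, 1), weights=TARGET_PROPS)[0]
            theta -= lr * grad(theta, g)
        else:
            raise ValueError(f"unknown method: {method}")
    return theta


if __name__ == "__main__":
    for theta in (-0.5, 0.0, 0.7):
        print(theta, expected_grad_reweighting(theta), expected_grad_resampling(theta))
    for method in ("reweighting", "resampling"):
        print(method, [sgd(method, seed=s) for s in range(5)])
```

Running the script prints the matching expected gradients and then the final iterates for a few seeds, so the spread of the two methods can be compared directly. This sketch only illustrates the setup; it does not reproduce the paper's analysis or experiments.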
