A replica analysis of under-bagging

Abstract

Sharp asymptotics of the under-bagging (UB) method, a popular ensemble learning method for training classifiers from imbalanced data, are derived and used to compare UB with several other standard methods for learning from imbalanced data, in the scenario where a linear classifier is trained on data from a binary mixture. The methods compared include the under-sampling (US) method, which trains a model on a single realization of the subsampled dataset, and the simple weighting (SW) method, which trains a model with a weighted loss on the entire dataset. It is shown that the performance of UB improves as the size of the majority class increases, even though the class imbalance can then be large, especially when the size of the minority class is small. This is in contrast to US, whose performance does not change as the size of the majority class increases, and to SW, whose performance degrades as the imbalance increases. These results differ from the case of naive bagging in training generalized linear models without considering the structure of class imbalance, indicating an intrinsic difference between ensembling and direct regularization of the parameters.
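To make the three procedures concrete, the following is a minimal sketch and not the paper's exact setup. It makes several assumptions that the abstract does not state: a ridge-regularized squared loss (chosen only because it has a closed-form solution), inverse-class-frequency weights for SW, under-sampling the majority class down to the minority size, and UB implemented by averaging the fitted parameters, which for linear models is the same as averaging the linear scores. All function names (`fit_ridge`, `undersample`, `fit_us`, `fit_ub`, `fit_sw`, `make_mixture`) are hypothetical.

```python
import numpy as np

def make_mixture(n_pos, n_neg, d, rng, sep=1.0):
    """Two-component Gaussian mixture with labels +1 (minority) and -1 (majority)."""
    mu = sep * np.ones(d) / np.sqrt(d)  # assumed class-mean direction
    X = np.vstack([rng.normal(size=(n_pos, d)) + mu,
                   rng.normal(size=(n_neg, d)) - mu])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])
    return X, y

def fit_ridge(X, y, sample_weight=None, lam=1.0):
    """Weighted ridge fit of a linear classifier; returns (w, b)."""
    n, d = X.shape
    sw = np.ones(n) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    Xb = np.hstack([X, np.ones((n, 1))])  # extra column of ones for the bias
    reg = lam * np.eye(d + 1)
    reg[-1, -1] = 0.0                     # leave the bias unpenalized
    A = Xb.T @ (sw[:, None] * Xb) + reg
    theta = np.linalg.solve(A, Xb.T @ (sw * y))
    return theta[:-1], theta[-1]

def undersample(X, y, rng):
    """Keep all minority points and an equally sized random subset of the majority."""
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == -1)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, keep])
    return X[idx], y[idx]

def fit_us(X, y, rng, lam=1.0):
    """US: train once on a single under-sampled dataset."""
    return fit_ridge(*undersample(X, y, rng), lam=lam)

def fit_ub(X, y, rng, n_models=20, lam=1.0):
    """UB: train on many independent under-samples and average the parameters."""
    fits = [fit_ridge(*undersample(X, y, rng), lam=lam) for _ in range(n_models)]
    w = np.mean([f[0] for f in fits], axis=0)
    b = float(np.mean([f[1] for f in fits]))
    return w, b

def fit_sw(X, y, lam=1.0):
    """SW: train on all data with a class-weighted loss (inverse-frequency weights assumed)."""
    n_pos, n_neg = np.sum(y == 1), np.sum(y == -1)
    weights = np.where(y == 1, 1.0 / n_pos, 1.0 / n_neg) * len(y) / 2.0
    return fit_ridge(X, y, sample_weight=weights, lam=lam)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = make_mixture(n_pos=20, n_neg=2000, d=50, rng=rng)
    X_test, y_test = make_mixture(n_pos=2000, n_neg=2000, d=50, rng=rng)
    for name, (w, b) in {"US": fit_us(X, y, rng),
                         "UB": fit_ub(X, y, rng),
                         "SW": fit_sw(X, y)}.items():
        acc = np.mean(np.sign(X_test @ w + b) == y_test)
        print(f"{name}: balanced test accuracy = {acc:.3f}")
```

Each UB member sees a different random subset of the majority class, so the ensemble as a whole uses more of the majority data than a single US fit does. A finite-size simulation like this one can only illustrate the procedures; it is not a check of the asymptotic results.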
