Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost

Abstract

The paper presents Imbalance-XGBoost, a Python package that combines the powerful XGBoost software with weighted and focal losses to tackle binary label-imbalanced classification tasks. Though small in size, the package is, to the best of the authors' knowledge, the first of its kind to provide an integrated implementation of the two losses on XGBoost and to bring a general-purpose extension of XGBoost to label-imbalanced scenarios. In this paper, the design and usage of the package are described with exemplar code listings, and the ease of integrating it into Python-driven Machine Learning projects is illustrated. Furthermore, as the first- and second-order derivatives of the loss functions are essential for the implementations, their algebraic derivation is discussed, and it can be deemed a separate algorithmic contribution. The performance of the algorithms implemented in the package is empirically evaluated on a Parkinson's disease classification data set, and multiple state-of-the-art results are observed. Given the scalable nature of XGBoost, the package has great potential to be applied to real-life binary classification tasks, which are usually large-scale and label-imbalanced.
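
To illustrate why the first- and second-order derivatives matter, the following is a minimal sketch of the gradient and Hessian of a weighted cross-entropy loss with respect to the raw score z, which is the pair of quantities a gradient-boosting objective needs per sample. It is not the package's code. The loss form L = -(alpha * y * log(p) + (1 - y) * log(1 - p)) with p = sigmoid(z), the parameter name `alpha`, and all function names are assumptions made for this example; the focal loss is not covered.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_loss(z: float, y: int, alpha: float) -> float:
    """Assumed weighted cross-entropy: positive samples are scaled by alpha."""
    p = sigmoid(z)
    return -(alpha * y * math.log(p) + (1 - y) * math.log(1 - p))


def weighted_grad_hess(z: float, y: int, alpha: float) -> tuple[float, float]:
    """First and second derivatives of weighted_loss with respect to z."""
    p = sigmoid(z)
    w = alpha * y + (1 - y)      # per-sample weight: alpha if y == 1, else 1
    grad = p * w - alpha * y     # dL/dz
    hess = p * (1.0 - p) * w     # d2L/dz2, non-negative for alpha > 0
    return grad, hess


def numeric_grad(z: float, y: int, alpha: float, h: float = 1e-5) -> float:
    """Central finite difference of the loss, used only as a sanity check."""
    return (weighted_loss(z + h, y, alpha) - weighted_loss(z - h, y, alpha)) / (2 * h)
```

With `alpha = 1` the expressions reduce to the familiar `p - y` and `p * (1 - p)` of ordinary logistic loss, which is a quick way to check the algebra. In a boosting library that accepts custom objectives, a vectorized version of `weighted_grad_hess` would typically be what is returned for each round.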
