CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

  • 2017-12-12 15:33:26
  • Farshid Rayhan, Sajid Ahmed, Asif Mahbub, Md. Rafsan Jani, Swakkhar Shatabda, Dewan Md. Farid
  • 27

Abstract

Class imbalance classification is a challenging research problem in data mining and machine learning, as real-life datasets are often imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class but misclassify the minority class. However, in real-life applications the minority class instances usually represent the concept of greater interest. Recently, several techniques based on sampling methods (under-sampling of the majority class and over-sampling of the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach combined with the boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We compared the performance of CUSBoost with state-of-the-art ensemble learning methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets.
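The abstract describes CUSBoost only at a high level. The following is a minimal sketch of how clustering-based under-sampling can be combined with AdaBoost, under several assumptions not stated in the abstract: binary labels in {-1, +1} with +1 as the minority class; the majority class is clustered once with plain k-means (Lloyd's algorithm) before boosting; in every round a fixed fraction `keep_frac` is drawn at random from each majority cluster and joined with all minority instances; decision stumps serve as weak learners; and the AdaBoost error and weight update are computed over the full training set. The names `kmeans`, `undersample`, `fit_stump`, `cusboost_sketch`, and `predict` and the parameters `k`, `keep_frac`, and `rounds` are hypothetical. This is an illustration, not the authors' implementation, and it does not cover the multi-class case evaluated in the paper.

```python
import math
import random

# Hypothetical sketch, not the authors' CUSBoost implementation.
# Labels are assumed to be -1 (majority) and +1 (minority).

def kmeans(points, k, iters=20, rng=random):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def undersample(minority, clusters, keep_frac, rng):
    """All minority indices plus a random keep_frac share (at least one) of every majority cluster."""
    idx = list(minority)
    for members in clusters.values():
        idx += rng.sample(members, max(1, round(keep_frac * len(members))))
    return idx

def stump_predict(stump, x):
    f, thr, pol = stump
    return pol if x[f] >= thr else -pol

def fit_stump(X, y, w, idx):
    """Weighted decision stump (feature, threshold, polarity) fit on the rows in idx only."""
    best = (float("inf"), 0, 0.0, 1)
    for f in range(len(X[0])):
        for thr in sorted({X[i][f] for i in idx}):
            for pol in (1, -1):
                err = sum(w[i] for i in idx if stump_predict((f, thr, pol), X[i]) != y[i])
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best[1:]

def cusboost_sketch(X, y, rounds=10, k=3, keep_frac=0.3, seed=0):
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n  # AdaBoost weights over the FULL training set
    minority = [i for i in range(n) if y[i] == 1]
    majority = [i for i in range(n) if y[i] == -1]
    # Assumption: cluster the majority class once, before boosting starts.
    labels = kmeans([X[i] for i in majority], min(k, len(majority)), rng=rng)
    clusters = {}
    for i, lab in zip(majority, labels):
        clusters.setdefault(lab, []).append(i)
    ensemble = []
    for _ in range(rounds):
        idx = undersample(minority, clusters, keep_frac, rng)
        stump = fit_stump(X, y, w, idx)  # weak learner sees only the reduced set
        preds = [stump_predict(stump, x) for x in X]
        err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
        if err >= 0.5:
            continue  # no better than chance on the full set: skip this round
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        # Discrete AdaBoost update: raise weights of misclassified rows, then renormalise.
        w = [wi * math.exp(-alpha * t * p) for wi, t, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Usage on a synthetic 90:10 dataset (made-up data, for illustration only).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]      # majority
X += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(10)]  # minority
y = [-1] * 90 + [1] * 10
model = cusboost_sketch(X, y, rounds=10, k=3, keep_frac=0.3)
preds = [predict(model, x) for x in X]
```

Replacing `undersample` with a single random draw from the whole majority class gives a RUSBoost-style baseline, which makes the role of the clustering step visible: because every majority cluster contributes at least one instance, each round's training subset still covers every region of the majority class.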
