A Comprehensive Survey on Imbalanced Data Learning

  • 2025-09-12 10:37:30
  • Xinyi Gao, Dongting Xie, Yihang Zhang, Zhengren Wang, Chong Chen, Conghui He, Hongzhi Yin, Wentao Zhang

Abstract

With the expansion of data availability, machine learning (ML) has achieved remarkable breakthroughs in both academia and industry. However, imbalanced data distributions are prevalent in various types of raw data and severely hinder the performance of ML by biasing the decision-making processes. To deepen the understanding of imbalanced data and facilitate related research and applications, this survey systematically analyzes various real-world data formats and organizes existing research across these formats into four distinct categories: data re-balancing, feature representation, training strategy, and ensemble learning. This structured analysis helps researchers comprehensively understand the pervasive nature of imbalance across diverse data formats, thereby paving a clearer path toward achieving specific research goals. We also provide an overview of relevant open-source libraries, spotlight current challenges, and offer novel insights aimed at fostering future advancements in this critical area of study.

 
