Abstract
Distributed computing systems are essential for meeting the demands of modern applications, yet transitioning from single-system to distributed environments presents significant challenges. Misallocating resources in shared systems can lead to resource contention, system instability, degraded performance, priority inversion, inefficient utilization, increased latency, and environmental impact. We present BanditWare, an online recommendation system that dynamically selects the most suitable hardware for applications using a contextual multi-armed bandit algorithm. BanditWare balances exploration and exploitation, gradually refining its hardware recommendations based on observed application performance while continuing to explore potentially better options. Unlike traditional statistical and machine learning approaches that rely heavily on large historical datasets, BanditWare operates online, learning and adapting in real time as new workloads arrive. We evaluated BanditWare on three workflow applications: Cycles (a scientific workflow for agricultural science), BurnPro3D (a web-based platform for fire science), and a matrix multiplication application. Designed for seamless integration with the National Data Platform (NDP), BanditWare enables users of all experience levels to optimize resource allocation efficiently.