Abstract
Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost of training DRL models is a significant performance bottleneck. To address this challenge and enable efficient scaling on high-performance computing architectures, this study focuses on optimizing DRL-based algorithms in parallel settings. We validate an existing state-of-the-art DRL framework used for AFC problems and discuss its efficiency bottlenecks. Subsequently, by deconstructing the overall framework and conducting extensive scalability benchmarks for its individual components, we investigate various hybrid parallelization configurations and propose efficient parallelization strategies. Moreover, we refine input/output (I/O) operations in multi-environment DRL training to reduce the critical overhead associated with data movement. Finally, we demonstrate the optimized framework on a typical AFC problem, for which the overall framework achieves near-linear scaling. Parallel efficiency improves from around 49% to approximately 78%, and the training process is accelerated by approximately 47 times using 60 central processing unit (CPU) cores. These findings are expected to provide valuable insights for further advancements in DRL-based AFC studies. Consequently, DRL-based AFC continues to be a prominent and actively studied problem of significant interest.
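The reported speedup and efficiency can be cross-checked with a short calculation. The sketch below assumes the standard definition of parallel efficiency, E = S / p (speedup divided by core count); the abstract does not state which definition is used, and the function names are illustrative only. The baseline speedup it prints is inferred from the quoted 49% efficiency, assuming that figure also refers to 60 cores. It is not a number reported in the text.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency under the assumed definition E = S / p."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


def implied_speedup(efficiency: float, cores: int) -> float:
    """Speedup implied by a given efficiency on `cores` cores (S = E * p)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return efficiency * cores


# Figures quoted in the abstract: about 47x speedup on 60 CPU cores.
optimized = parallel_efficiency(47.0, 60)

# Inferred, not reported: speedup implied by ~49% efficiency,
# assuming the same 60-core configuration.
baseline_speedup = implied_speedup(0.49, 60)

print(f"optimized efficiency: {optimized:.1%}")
print(f"implied baseline speedup: {baseline_speedup:.1f}x")
```

Under this assumed definition, 47 / 60 is consistent with the efficiency of approximately 78% quoted above.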