Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

Abstract

We study the problem of long-term (multi-day) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro River as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that scaling the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, doubling the number of AUVs more than doubles endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future development of data-driven long-term monitoring of dynamic plume environments.
