Abstract
Reinforcement learning (RL) agents are powerful tools for managing power grids. They use large amounts of data to inform their actions and receive rewards or penalties as feedback, from which they learn favorable responses for the system. Once trained, these agents can efficiently make decisions that would be too computationally complex for a human operator. This ability is especially valuable in decarbonizing power networks, where the demand for RL agents is increasing. These agents are well suited to controlling grid actions because the action space is constantly growing due to uncertainties in renewable generation, microgrid integration, and cybersecurity threats. To assess the efficacy of RL agents in responding to adverse grid events, we train agents on the Grid2Op platform. We employ a proximal policy optimization (PPO) algorithm in conjunction with graph neural networks (GNNs). By simulating agents' responses to grid events, we assess how long they can avoid grid failure. An agent's performance is expressed concisely through its reward function, which helps the agent learn optimal ways to reconfigure a grid's topology during such events. To model the multi-actor scenarios that threaten modern power networks, particularly those resulting from cyberattacks, we integrate an opponent that acts iteratively against a given agent. We use this interplay between the RL agent and the opponent for N-k contingency screening, providing a novel alternative to traditional security assessment.
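For context on why traditional N-k screening is costly: in its simplest exhaustive form, it simulates every combination of up to k simultaneous element outages, so the number of cases grows combinatorially with k. The minimal sketch below counts and enumerates those outage sets using only the Python standard library. The function names and the 20-line grid size are hypothetical illustrations, not part of Grid2Op or of the method described here.

```python
from itertools import combinations
from math import comb


def count_contingencies(n_elements: int, k: int) -> int:
    """Number of distinct outage sets with 1..k simultaneous failures."""
    return sum(comb(n_elements, j) for j in range(1, k + 1))


def enumerate_contingencies(element_ids, k):
    """Yield every outage set of size 1..k (exhaustive N-k screening)."""
    for j in range(1, k + 1):
        yield from combinations(element_ids, j)


if __name__ == "__main__":
    n_lines = 20  # hypothetical number of transmission lines
    for k in (1, 2, 3):
        print(f"N-{k}: {count_contingencies(n_lines, k)} outage sets")
```

For 20 lines this gives 20, 210, and 1350 cases for k = 1, 2, 3; each case would require its own power-flow simulation, which is the cost an adversarial opponent that selects damaging outages aims to avoid.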