### Abstract

Energy markets can provide incentives for undesired behavior of market participants. Multi-agent reinforcement learning (MARL) is a promising new approach to predicting the expected behavior of energy market participants. However, reinforcement learning requires many interactions with the system to converge, and the power system environment often involves extensive computations, e.g., optimal power flow (OPF) calculation for market clearing. To tackle this complexity, we provide a model of the energy market to a basic MARL algorithm in the form of a learned OPF approximation and explicit market rules. The learned OPF surrogate model makes explicitly solving the OPF unnecessary. Our experiments demonstrate that the model additionally reduces training time by about one order of magnitude, but at the cost of a slightly worse approximation of the Nash equilibrium. Potential applications of our method are market design, more realistic modeling of market participants, and analysis of manipulative behavior.