Abstract
This paper addresses a critical challenge in the high-speed passenger railway industry: designing effective dynamic pricing strategies in the context of competing and cooperating operators. To this end, a multi-agent reinforcement learning (MARL) framework based on a non-zero-sum Markov game is proposed, incorporating random utility models to capture passenger decision-making. Unlike in areas such as energy, airlines, and mobile networks, dynamic pricing for railway systems using deep reinforcement learning has received limited attention. A key contribution of this paper is RailPricing-RL, a parametrisable and versatile reinforcement learning simulator designed to model a variety of railway network configurations and demand patterns while enabling realistic, microscopic modelling of user behaviour. This environment supports the proposed MARL framework, which models heterogeneous agents competing to maximise individual profits while fostering cooperative behaviour to synchronise connecting services. Experimental results validate the framework, demonstrating how user preferences affect MARL performance and how pricing policies influence passenger choices, utility, and overall system dynamics. This study provides a foundation for advancing dynamic pricing strategies in railway systems, aligning profitability with system-wide efficiency, and supporting future research on optimising pricing policies.