Abstract
Reinforcement learning has demonstrated immense success in modelling complex physics-driven systems, providing end-to-end trainable solutions by interacting with a simulated or real environment and maximizing a scalar reward signal. In this work, building upon previous work, we propose a multi-agent reinforcement learning approach with assignment constraints for reconstructing particle tracks in pixelated particle detectors. Our approach collaboratively optimizes a parametrized policy, which functions as a heuristic for a multidimensional assignment problem, by jointly minimizing the total amount of particle scattering over the reconstructed tracks in a readout frame. To satisfy the constraints that guarantee a unique assignment of particle hits, we propose a safety layer that solves a linear assignment problem for every joint action. Further, to enforce cost margins, which increase the distance of the local policies' predictions from the decision boundaries of the optimizer mappings, we recommend the use of an additional component in the blackbox gradient estimation that forces the policy towards solutions with lower total assignment costs. We empirically show the effectiveness of our approach on simulated data, generated for a particle detector developed for proton imaging, compared to multiple single- and multi-agent baselines. We further demonstrate the effectiveness of constraints with cost margins for both optimization and generalization, reflected in wider regions with high reconstruction performance as well as reduced predictive instabilities. Our results form the basis for further developments in RL-based tracking, offering both enhanced performance with constrained policies and greater flexibility in optimizing tracking algorithms through the option for individual and team rewards.
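
To illustrate the idea of a safety layer that maps a joint action onto a unique assignment of hits, the following minimal sketch may help. It is not the implementation used in this work: the function name `safety_layer`, the cost matrix, and the brute-force search are assumptions made purely for illustration, and a practical system would more likely use a polynomial-time solver such as SciPy's `linear_sum_assignment`.

```python
from itertools import permutations


def safety_layer(costs):
    """Project per-agent preferences onto a unique hit assignment.

    costs[i][j] is a hypothetical cost that agent i assigns to hit j
    (for example, a negated policy score). Returns (assignment, total),
    where assignment[i] is the hit given to agent i and no hit is reused.
    Brute force over permutations: only suitable for tiny examples.
    """
    n_agents = len(costs)
    n_hits = len(costs[0])
    best, best_cost = None, float("inf")
    # Enumerate every way to give each agent a distinct hit.
    for perm in permutations(range(n_hits), n_agents):
        total = sum(costs[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return list(best), best_cost


# Illustrative costs for three agents (tracks) and three candidate hits.
costs = [
    [0.1, 0.9, 0.8],
    [0.2, 0.7, 0.9],
    [0.9, 0.3, 0.4],
]

# Unconstrained joint action: every agent picks its cheapest hit independently.
greedy = [min(range(len(row)), key=row.__getitem__) for row in costs]

# Constrained joint action: the safety layer resolves conflicting choices.
assignment, total = safety_layer(costs)
```

In this toy example, the first two agents both prefer hit 0 when acting independently, so the unconstrained joint action is infeasible. The safety layer instead returns the cheapest assignment in which every hit is used at most once.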