Abstract
Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). In multi-agent systems, high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. On a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.