Abstract
Reinforcement Learning (RL) has proven highly effective at obtaining stable locomotion gaits for legged robots. However, designing control algorithms that can robustly navigate unseen environments with obstacles remains an open problem in quadruped locomotion. To tackle this, it is convenient to solve navigation tasks with a hierarchical approach that combines a low-level locomotion policy and a high-level navigation policy. Crucially, the high-level policy needs to be robust to dynamic obstacles along the agent's path. In this work, we propose a novel way to endow navigation policies with robustness through a training process that models obstacles as adversarial agents, following the adversarial RL paradigm. Importantly, to improve the reliability of the training process, we bound the rationality of the adversarial agent by resorting to quantal response equilibria, and we place a curriculum over its rationality. We call this method Hierarchical policies via Quantal response Adversarial Reinforcement Learning (Hi-QARL). We demonstrate the robustness of our method by benchmarking it in unseen randomized mazes with multiple obstacles. To show its applicability to real scenarios, we apply our method to a Unitree GO1 robot in simulation.
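The idea of bounding the adversary's rationality can be illustrated with a logit quantal response, a common parametrization of quantal response equilibria. Under it, the adversary picks action a with probability proportional to exp(λ·Q(a)), where λ is the rationality parameter. The following is a minimal sketch that assumes a discrete adversary action space with known action values Q and a hypothetical linear curriculum on λ. The function names, the linear schedule, and `lam_max` are illustrative assumptions, not the Hi-QARL implementation, whose exact formulation and schedule may differ.

```python
import numpy as np


def logit_quantal_response(q_values, rationality):
    """Logit quantal response: P(a) is proportional to exp(rationality * Q(a)).

    rationality = 0 yields a uniformly random (fully irrational) adversary;
    as rationality grows, the distribution concentrates on the best response.
    """
    q = np.asarray(q_values, dtype=float)
    logits = rationality * q
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def rationality_schedule(step, total_steps, lam_max):
    """Hypothetical linear curriculum: rationality rises from 0 to lam_max."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return frac * lam_max


# Usage: adversary action values (assumed), early vs. late in training.
q_adv = [1.0, 0.5, 0.0]
early = logit_quantal_response(q_adv, rationality_schedule(0, 1000, 10.0))
late = logit_quantal_response(q_adv, rationality_schedule(1000, 1000, 10.0))
print(early)  # uniform over the three actions
print(late)   # nearly all mass on the highest-value action
```

Starting from a near-random adversary and gradually increasing λ lets the navigation policy first face weak perturbations before confronting an almost best-responding obstacle. This is one plausible way a curriculum over rationality could stabilize adversarial training.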