Safe Reinforcement Learning on the Constraint Manifold: Theory and Applications

Abstract

Integrating learning-based techniques, especially reinforcement learning, into robotics is promising for solving complex problems in unstructured environments. However, most existing approaches are trained in well-tuned simulators and subsequently deployed on real robots without online fine-tuning. In this setting, extensive engineering is required to mitigate the sim-to-real gap, which can be challenging for complex systems. Instead, learning with real-world interaction data offers a promising alternative: it not only eliminates the need for a fine-tuned simulator but also applies to a broader range of tasks where accurate modeling is unfeasible. One major problem for on-robot reinforcement learning is ensuring safety, as uncontrolled exploration can cause catastrophic damage to the robot or the environment. Indeed, safety specifications, often represented as constraints, can be complex and non-linear, making safety challenging to guarantee in learning systems. In this paper, we show how we can impose complex safety constraints on learning-based robotics systems in a principled manner, both from theoretical and practical points of view. Our approach is based on the concept of the Constraint Manifold, representing the set of safe robot configurations. Exploiting differential geometry techniques, i.e., the tangent space, we can construct a safe action space, allowing learning agents to sample arbitrary actions while ensuring safety. We demonstrate the method's effectiveness in a real-world Robot Air Hockey task, showing that our method can handle high-dimensional tasks with complex constraints. Videos of the real robot experiments are available on the project website (https://puzeliu.github.io/TRO-ATACOM).
