Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Abstract

Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this gap is that the learned policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks. We demonstrate that RL policies can be deterministically chaotic, as small perturbations to the system state have a large impact on subsequent state and reward trajectories. This unstable non-linear behaviour has two consequences: first, inaccuracies in sensor readings, or adversarial attacks, can cause significant performance degradation; second, even policies that show robust performance in terms of rewards may have unpredictable behaviour in practice. These two facets of chaos in RL policies drastically restrict the application of deep RL to real-world problems. To address this issue, we propose an improvement on the successful Dreamer V3 architecture, implementing a Maximal Lyapunov Exponent regularisation. This new approach reduces the chaotic state dynamics, rendering the learnt policies more resilient to sensor noise or adversarial attacks and thereby improving the suitability of Deep Reinforcement Learning for real-world applications.
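The sensitivity to a single small state perturbation described above is conventionally quantified by the maximal Lyapunov exponent (MLE): a positive value means nearby trajectories diverge exponentially (chaos), a negative value means they converge. The following is a minimal sketch of a standard two-trajectory estimate with per-step renormalisation, not the method used in the paper. The logistic map is only a hypothetical stand-in for the deterministic closed loop of environment and policy, and the names `logistic_step` and `estimate_mle` are assumptions made for illustration.

```python
import math


def logistic_step(x, r):
    """One step of the logistic map, a toy stand-in for deterministic closed-loop dynamics."""
    return r * x * (1.0 - x)


def estimate_mle(step, x0, n_steps=20000, eps=1e-8, burn_in=100):
    """Estimate the maximal Lyapunov exponent of a scalar deterministic map.

    A reference trajectory and a copy perturbed by eps are advanced together.
    After every step the log of the separation growth is accumulated and the
    perturbed state is pulled back to distance eps from the reference.
    """
    x = x0
    for _ in range(burn_in):  # discard the initial transient
        x = step(x)

    x_pert = x + eps  # the single small state perturbation
    log_growth = 0.0
    counted = 0
    for _ in range(n_steps):
        x = step(x)
        x_pert = step(x_pert)
        dist = abs(x_pert - x)
        if dist == 0.0:
            # Separation fell below floating-point resolution: re-seed and skip this step.
            x_pert = x + eps
            continue
        log_growth += math.log(dist / eps)
        counted += 1
        # Renormalise: keep the direction of the separation, reset its size to eps.
        x_pert = x + eps * (x_pert - x) / dist

    return log_growth / counted


chaotic = estimate_mle(lambda x: logistic_step(x, 4.0), x0=0.3)
stable = estimate_mle(lambda x: logistic_step(x, 2.5), x0=0.3)
print(f"r=4.0: MLE estimate {chaotic:.3f}")
print(f"r=2.5: MLE estimate {stable:.3f}")
```

For the logistic map, the known exponent at r = 4 is ln 2 (about 0.693), and at r = 2.5 the map settles on a stable fixed point with exponent ln 0.5, so the two estimates should come out with opposite signs. For a control task, the same procedure would apply to state vectors, with the absolute value replaced by a vector norm.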
