Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability

Abstract

A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing to unseen scenarios during deployment. This study aims to overview these main perspectives of trustworthy reinforcement learning, considering its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section to spur promising future directions, with a brief discussion of extrinsic vulnerabilities considering human feedback. We hope this survey can bring separate threads of study together in a unified framework and promote the trustworthiness of reinforcement learning.
