Conditioning of Reinforcement Learning Agents and its Policy Regularization Application

Abstract

The effect of regularizing Jacobian singular values has been studied for supervised learning problems. It has also been shown that Jacobian conditioning regularization can help to avoid the ``mode-collapse'' problem in Generative Adversarial Networks. In this paper, we try to answer the following question: Can information about policy conditioning help to shape a more stable and general policy for reinforcement learning agents? To answer this question, we conduct a study of Jacobian conditioning behavior during policy optimization. To the best of our knowledge, this is the first work that studies the condition number in reinforcement learning agents. We propose a conditioning regularization algorithm and test its performance on a range of continuous control tasks. Finally, we compare algorithms on the CoinRun environment with separate train and test levels to analyze how conditioning regularization contributes to agents' generalization.
