Deep Reinforcement Learning with Smooth Policy

Abstract

Deep neural networks have been widely adopted in modern reinforcement learning (RL) algorithms with great empirical success in various domains. However, the large search space of training a neural network requires a significant amount of data, which makes current RL algorithms sample-inefficient. Motivated by the fact that many environments with continuous state spaces have smooth transitions, we propose to learn a smooth policy that behaves smoothly with respect to states. In contrast to policies parameterized by linear or reproducing kernel functions, where simple regularization techniques suffice to control smoothness, neural network based reinforcement learning algorithms have no readily available solution for learning a smooth policy. In this paper, we develop a new training framework, $\textbf{S}$mooth $\textbf{R}$egularized $\textbf{R}$einforcement $\textbf{L}$earning ($\textbf{SR}^2\textbf{L}$), in which the policy is trained with smoothness-inducing regularization. Such regularization effectively constrains the search space of the learning algorithm and enforces smoothness in the learned policy. We apply the proposed framework to both an on-policy algorithm (TRPO) and an off-policy algorithm (DDPG). Through extensive experiments, we demonstrate that our method achieves improved sample efficiency.
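The abstract does not spell out the form of the smoothness-inducing regularizer. As a minimal sketch, assuming a deterministic policy (as in DDPG) and a penalty on the largest squared change in action when the state is perturbed within an $\ell_\infty$ ball of radius `eps`, such a term could be computed as below. The names `policy`, `smoothness_penalty`, and `regularized_loss`, and the parameters `eps`, `n_samples`, and `lam`, are hypothetical. The inner maximization over perturbations is approximated by random search purely for brevity; this is a simplification, not necessarily how the authors solve it.

```python
import numpy as np


def policy(theta, states):
    """Hypothetical deterministic policy: one tanh layer mapping states to actions."""
    W, b = theta
    return np.tanh(states @ W + b)


def smoothness_penalty(theta, states, eps=0.05, n_samples=16, rng=None):
    """Approximate E_s[ max_{||s' - s||_inf <= eps} ||pi(s) - pi(s')||^2 ].

    The max over the eps-ball is estimated by sampling n_samples random
    perturbations per state and keeping the worst one (random search).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    base = policy(theta, states)                  # actions at the unperturbed states
    worst = np.zeros(states.shape[0])             # worst-case squared change per state
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=states.shape)  # perturbation inside the ball
        diff = policy(theta, states + delta) - base
        worst = np.maximum(worst, np.sum(diff ** 2, axis=1))
    return worst.mean()                           # average over the batch of states


def regularized_loss(task_loss, theta, states, lam=1.0, **kw):
    """The base RL algorithm's loss plus lam times the smoothness penalty."""
    return task_loss + lam * smoothness_penalty(theta, states, **kw)


# Usage: a random 4-dimensional-state, 2-dimensional-action policy on a batch of 32 states.
rng = np.random.default_rng(42)
theta = (rng.normal(size=(4, 2)), np.zeros(2))
states = rng.normal(size=(32, 4))
print(smoothness_penalty(theta, states, eps=0.1))
```

In this sketch, parameters that make the action change sharply inside the `eps`-ball are penalized even when they fit the training signal, which is one way to read "constrains the search space". A gradient-based inner maximization would typically find larger perturbations than random sampling does.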
