Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach

Abstract

In this paper we propose a framework for achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, so that it regulates the state and parameter uncertainties resulting from modeling mismatches and noisy sensors; and (ii) overcoming the computational intractability of stochastic optimal control. We approach both objectives by using reinforcement learning to compute the stochastic optimal control law. On one hand, we avoid the curse of dimensionality that prohibits the direct solution of the stochastic dynamic programming equation. On the other hand, the resulting stochastic optimal control reinforcement learning agent exhibits caution and probing, that is, optimal online exploration and exploitation. Unlike a fixed exploration and exploitation balance, caution and probing are employed automatically by the controller in real time, even after the learning process is terminated. We conclude the paper with a numerical simulation illustrating how a Linear Quadratic Regulator with the certainty equivalence assumption may lead to poor performance and filter divergence, while our proposed approach is stabilizing, achieves acceptable performance, and is computationally convenient.
