Improving reinforcement learning algorithms: towards optimal learning rate policies

Abstract

This paper investigates to what extent we can improve reinforcement learningalgorithms. Our study is split in three parts. First, our analysis shows thatthe classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic andcan be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\leq \beta \leq 1$and $N$ the number of iterations. Second, we propose a dynamic optimal policyfor the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochasticalgorithms. We decompose our policy into two interacting levels: the inner andthe outer level. In the inner level, we present the PASS algorithm (for "PAstSign Search") which, based on a predefined sequence $(\gamma^o_k)_{k\geq 0}$,constructs a new sequence $(\gamma^i_k)_{k\geq 0}$ whose error decreasesfaster. In the outer level, we propose an optimal methodology for the selectionof the predefined sequence $(\gamma^o_k)_{k\geq 0}$. Third, we show empiricallythat our selection methodology of the learning rate outperforms significantlystandard algorithms used in reinforcement learning (RL) in the three followingapplications: the estimation of a drift, the optimal placement of limit ordersand the optimal execution of large number of shares.

Quick Read (beta)

loading the full paper ...