Improving reinforcement learning algorithms: towards optimal learning rate policies

Abstract

This paper investigates to what extent one can improve reinforcement learningalgorithms. Our study is split in three parts. First, our analysis shows thatthe classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic andcan be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\leq \beta \leq 1$and $N$ the number of iterations. Second, we propose a dynamic optimal policyfor the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochasticapproximation (SA). We decompose our policy into two interacting levels: theinner and the outer level. In the inner level, we present the\nameref{Alg:v_4_s} algorithm (for "PAst Sign Search") which, based on apredefined sequence $(\gamma^o_k)_{k\geq 0}$, constructs a new sequence$(\gamma^i_k)_{k\geq 0}$ whose error decreases faster. In the outer level, wepropose an optimal methodology for the selection of the predefined sequence$(\gamma^o_k)_{k\geq 0}$. Third, we show empirically that our selectionmethodology of the learning rate outperforms significantly standard algorithmsused in reinforcement learning (RL) in the three following applications: theestimation of a drift, the optimal placement of limit orders and the optimalexecution of large number of shares.

Quick Read (beta)

loading the full paper ...