Improving reinforcement learning algorithms: towards optimal learning rate policies

  • 2021-10-17 00:44:56
  • Othmane Mounjid, Charles-Albert Lehalle
  • 0

Abstract

This paper investigates to what extent one can improve reinforcement learningalgorithms. Our study is split in three parts. First, our analysis shows thatthe classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic andcan be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\leq \beta \leq 1$and $N$ the number of iterations. Second, we propose a dynamic optimal policyfor the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochasticapproximation (SA). We decompose our policy into two interacting levels: theinner and the outer level. In the inner level, we present the\nameref{Alg:v_4_s} algorithm (for "PAst Sign Search") which, based on apredefined sequence $(\gamma^o_k)_{k\geq 0}$, constructs a new sequence$(\gamma^i_k)_{k\geq 0}$ whose error decreases faster. In the outer level, wepropose an optimal methodology for the selection of the predefined sequence$(\gamma^o_k)_{k\geq 0}$. Third, we show empirically that our selectionmethodology of the learning rate outperforms significantly standard algorithmsused in reinforcement learning (RL) in the three following applications: theestimation of a drift, the optimal placement of limit orders and the optimalexecution of large number of shares.

 

Quick Read (beta)

loading the full paper ...