### Abstract

This paper investigates to what extent we can improve reinforcement learningalgorithms. Our study is split in three parts. First, our analysis shows thatthe classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic andcan be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\leq \beta \leq 1$and $N$ the number of iterations. Second, we propose a dynamic optimal policyfor the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochasticalgorithms. We decompose our policy into two interacting levels: the inner andthe outer level. In the inner level, we present the PASS algorithm (for "PAstSign Search") which, based on a predefined sequence $(\gamma^o_k)_{k\geq 0}$,constructs a new sequence $(\gamma^i_k)_{k\geq 0}$ whose error decreasesfaster. In the outer level, we propose an optimal methodology for the selectionof the predefined sequence $(\gamma^o_k)_{k\geq 0}$. Third, we show empiricallythat our selection methodology of the learning rate outperforms significantlystandard algorithms used in reinforcement learning (RL) in the three followingapplications: the estimation of a drift, the optimal placement of limit ordersand the optimal execution of large number of shares.