Abstract
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process through repeated experience. In a given environment, the agent's policy yields running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when the agent picks an action, it cannot infer ex post the rewards that other action choices would have induced. In reinforcement learning, the agent's actions have consequences: they influence not only rewards but also future states of the world. The goal of reinforcement learning is to find an optimal policy, a mapping from the states of the world to the set of actions, that maximizes cumulative reward, which makes it a long-term strategy. Exploring might be sub-optimal over a short horizon but could lead to optimal long-term outcomes. Many optimal control problems, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, in particular deep learning algorithms, can be used by economists to solve complex behavioral problems. In this article, we survey the state of the art in reinforcement learning techniques and present applications in economics, game theory, operations research, and finance.