Large scale continuous-time mean-variance portfolio allocation via reinforcement learning

Abstract

We propose to solve the large-scale Markowitz mean-variance (MV) portfolio allocation problem using reinforcement learning (RL). By adopting the recently developed continuous-time exploratory control framework, we formulate the exploratory MV problem in high dimensions. We further show the optimality of a multivariate Gaussian feedback policy, with time-decaying variance, in trading off exploration and exploitation. Based on a provable policy improvement theorem, we devise a scalable and data-efficient RL algorithm and conduct large-scale empirical tests using data from the S&P 500 stocks. We find that our method consistently achieves over 10% annualized returns and outperforms econometric methods and the deep RL method by large margins, for both long and medium terms of investment with monthly and daily trading.
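
To make the idea of a Gaussian feedback policy with time-decaying variance more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a mean that is linear in the gap between current wealth and a target level, and a covariance that shrinks exponentially as time t approaches the horizon T. The names `mean_gain`, `base_cov`, `decay`, and `target` are hypothetical and do not come from the abstract.

```python
import numpy as np


def exploration_scale(t, T, decay):
    """Multiplier on the policy covariance at time t (assumed exponential form).

    It equals 1 at t = T and is larger earlier, so exploration decays over time.
    """
    return float(np.exp(decay * (T - t)))


def sample_allocation(wealth, t, T, target, mean_gain, base_cov, decay, rng):
    """Draw one allocation vector from a Gaussian feedback policy.

    wealth, target : scalars (current wealth and a hypothetical target level)
    mean_gain      : (d,) array, hypothetical feedback gain per asset
    base_cov       : (d, d) positive semi-definite covariance at t = T
    """
    # Assumed mean: linear feedback in the wealth gap.
    mean = -mean_gain * (wealth - target)
    # Assumed covariance: base covariance scaled by a time-decaying factor.
    cov = base_cov * exploration_scale(t, T, decay)
    return rng.multivariate_normal(mean, cov)


rng = np.random.default_rng(0)
d = 3
u = sample_allocation(
    wealth=1.0, t=0.0, T=1.0, target=1.4,
    mean_gain=np.full(d, 0.5), base_cov=0.01 * np.eye(d),
    decay=0.5, rng=rng,
)
```

Early in the horizon the sampled allocations scatter widely around the mean (exploration); near T they concentrate on it (exploitation).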
