Deep reinforcement learning for time series: playing idealized trading games

Abstract

Deep Q-learning is investigated as an end-to-end solution to estimate the optimal strategies for acting on time series input. Experiments are conducted on two idealized trading games: 1) Univariate, where the only input is a wave-like price time series, and 2) Bivariate, where the input includes a random stepwise price time series and a noisy signal time series that is positively correlated with future price changes. The Univariate game tests whether the agent can capture the underlying dynamics, and the Bivariate game tests whether the agent can utilize the hidden relation among the inputs. Stacked Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM) units, a Convolutional Neural Network (CNN), and a multi-layer perceptron (MLP) are used to model Q values. For both games, all agents successfully find a profitable strategy. The GRU-based agents show the best overall performance in the Univariate game, while the MLP-based agents outperform the others in the Bivariate game.
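
To make the two game inputs concrete, the following is a minimal sketch of how such series might be generated, together with a hand-written baseline that trades on the sign of the signal. It is not the paper's generator and does not implement deep Q-learning. The sine wave, the unit step size, the Gaussian noise level, the three-valued position, and all function names (`univariate_prices`, `bivariate_series`, `total_profit`, `follow_signal`) are assumptions made for illustration.

```python
import math
import random


def univariate_prices(n, period=20.0, base=2.0):
    """Wave-like price series. A plain sine wave is assumed here."""
    return [base + math.sin(2.0 * math.pi * t / period) for t in range(n)]


def bivariate_series(n, noise=0.5, seed=0):
    """Random stepwise prices plus a noisy signal about the next price change.

    Assumed form: each step is +1 or -1 with equal probability, and
    signal[t] is the upcoming change prices[t + 1] - prices[t] plus
    Gaussian noise, so it is positively correlated with future changes.
    """
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] + rng.choice([-1.0, 1.0]))
    signal = [
        prices[t + 1] - prices[t] + rng.gauss(0.0, noise)
        for t in range(n - 1)
    ]
    return prices, signal


def total_profit(prices, positions):
    """Sum of position * next price change; position is -1, 0, or +1 (assumed)."""
    return sum(
        pos * (prices[t + 1] - prices[t]) for t, pos in enumerate(positions)
    )


def follow_signal(signal):
    """Baseline policy: go long when the signal is positive, short otherwise."""
    return [1 if s > 0 else -1 for s in signal]


if __name__ == "__main__":
    prices, signal = bivariate_series(1000)
    print("signal-following profit:", total_profit(prices, follow_signal(signal)))
```

A learned agent in the Bivariate game would have to discover a rule like `follow_signal` from rewards alone, whereas in the Univariate game it would have to anticipate the turning points of the wave from past prices.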
