Sequential Knockoffs for Variable Selection in Reinforcement Learning

Abstract

In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state may slow learning and obfuscate the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP) as the subvector of the original state under which the process remains an MDP and shares the same reward function as the original process. We propose a novel SEquEntial Knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics. In large samples, the proposed method achieves selection consistency. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy learning. Empirical experiments verify theoretical results and show the proposed approach outperforms several competing methods regarding variable selection accuracy and regret.
