Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

  • 2020-09-15 07:09:43
  • Xiaoyu Chen, Jiachen Hu, Lihong Li, Liwei Wang
  • 0

Abstract

Reinforcement learning (RL) in episodic, factored Markov decision processes(FMDPs) is studied. We propose an algorithm called FMDP-BF, which leverages thefactorization structure of FMDP. The regret of FMDP-BF is shown to beexponentially smaller than that of optimal algorithms designed for non-factoredMDPs, and improves on the best previous result for FMDPs~\citep{osband2014near}by a factored of $\sqrt{H|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is thecardinality of the factored state subspace and $H$ is the planning horizon. Toshow the optimality of our bounds, we also provide a lower bound for FMDP,which indicates that our algorithm is near-optimal w.r.t. timestep $T$, horizon$H$ and factored state-action subspace cardinality. Finally, as an application,we study a new formulation of constrained RL, known as RL with knapsackconstraints (RLwK), and provides the first sample-efficient algorithm based onFMDP-BF.

 

Quick Read (beta)

loading the full paper ...