Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

Abstract

This paper introduces a simple efficient learning algorithms for generalsequential decision making. The algorithm combines Optimism for explorationwith Maximum Likelihood Estimation for model estimation, which is thus namedOMLE. We prove that OMLE learns the near-optimal policies of an enormously richclass of sequential decision making problems in a polynomial number of samples.This rich class includes not only a majority of known tractable model-basedReinforcement Learning (RL) problems (such as tabular MDPs, factored MDPs, lowwitness rank problems, tabular weakly-revealing/observable POMDPs andmulti-step decodable POMDPs), but also many new challenging RL problemsespecially in the partially observable setting that were not previously knownto be tractable. Notably, the new problems addressed by this paper include (1) observablePOMDPs with continuous observation and function approximation, where we achievethe first sample complexity that is completely independent of the size ofobservation space; (2) well-conditioned low-rank sequential decision makingproblems (also known as Predictive State Representations (PSRs)), which includeand generalize all known tractable POMDP examples under a more intrinsicrepresentation; (3) general sequential decision making problems under SAILcondition, which unifies our existing understandings of model-based RL in bothfully observable and partially observable settings. SAIL condition isidentified by this paper, which can be viewed as a natural generalization ofBellman/witness rank to address partial observability. This paper also presentsa reward-free variant of OMLE algorithm, which learns approximate dynamicmodels that enable the computation of near-optimal policies for all rewardfunctions simultaneously.

Quick Read (beta)

loading the full paper ...