On overfitting and asymptotic bias in batch reinforcement learning with partial observability

  • 2019-02-06 18:30:04
  • Vincent Francois-Lavet, Guillaume Rabusseau, Joelle Pineau, Damien Ernst, Raphael Fonteneau


This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally characterizes that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smartgrids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting in the partially observable context.
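
As a rough illustration of the "truncated history of observations" state representation mentioned above, the following minimal Python sketch keeps only the last h observations and uses them as a hashable state. The names `TruncatedHistory` and `max_representation_count` are hypothetical and not taken from the paper, and the sketch assumes an observation-only history for simplicity. The count |Ω|^h is only the elementary upper bound on the number of distinct full-length representations, shown to suggest why a smaller h leaves fewer states to estimate from limited data.

```python
from collections import deque
from typing import Deque, Hashable, Tuple


class TruncatedHistory:
    """Hypothetical helper: keeps the last h observations as a state representation."""

    def __init__(self, h: int) -> None:
        if h < 1:
            raise ValueError("h must be at least 1")
        # A deque with maxlen=h silently discards the oldest observation.
        self._buffer: Deque[Hashable] = deque(maxlen=h)

    def reset(self) -> None:
        """Clear the history at the start of a new episode."""
        self._buffer.clear()

    def update(self, observation: Hashable) -> Tuple[Hashable, ...]:
        """Append one observation and return the current (hashable) representation."""
        self._buffer.append(observation)
        return tuple(self._buffer)


def max_representation_count(num_observations: int, h: int) -> int:
    """Upper bound on distinct length-h representations: |Omega| ** h."""
    return num_observations ** h


# Example: with h = 2, only the two most recent observations are kept.
encoder = TruncatedHistory(h=2)
encoder.update("a")          # ("a",)
encoder.update("b")          # ("a", "b")
state = encoder.update("c")  # ("b", "c")
```

Increasing h makes the representation more informative (which can lower the asymptotic bias) but grows the number of possible states exponentially, which is the side of the tradeoff associated with overfitting.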

