Cross-Entropy Estimators for Sequential Experiment Design with Reinforcement Learning

Abstract

Reinforcement learning can effectively learn amortised design policies for designing sequences of experiments. However, current methods rely on contrastive estimators of expected information gain, which require an exponential number of contrastive samples to achieve an unbiased estimation. We propose an alternative lower bound estimator, based on the cross-entropy of the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our estimator requires no contrastive samples, can achieve more accurate estimates of high information gains, allows learning of superior design policies, and is compatible with implicit probabilistic models. We assess our algorithm's performance in various tasks, including continuous and discrete designs and explicit and implicit likelihoods.
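
To make the idea of a cross-entropy lower bound concrete, the sketch below uses the standard variational (Barber-Agakov) form, EIG >= H[p(theta)] + E[log q(theta | y)], on a toy one-step linear-Gaussian model where the true information gain is known in closed form. This is an illustrative assumption, not necessarily the exact estimator of the paper: the model, the fixed design, the Gaussian proposal, and the names `gaussian_log_pdf` and `cross_entropy_eig_bound` are all hypothetical. Note that no contrastive samples are drawn, and that only samples from the joint (not likelihood evaluations) are needed.

```python
import numpy as np


def gaussian_log_pdf(x, mean, var):
    """Log-density of N(mean, var) evaluated elementwise at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def cross_entropy_eig_bound(theta, y, post_gain, post_var, prior_var=1.0):
    """Monte Carlo estimate of H[p(theta)] + E[log q(theta | y)].

    Assumes a zero-mean Gaussian prior with variance `prior_var` and a
    Gaussian proposal q(theta | y) = N(post_gain * y, post_var).
    `theta` and `y` are paired samples from the joint p(theta, y).
    """
    prior_entropy = 0.5 * np.log(2.0 * np.pi * np.e * prior_var)
    log_q = gaussian_log_pdf(theta, post_gain * y, post_var)
    return prior_entropy + log_q.mean()


# Toy model (assumed for illustration): theta ~ N(0, 1), y = theta + noise.
rng = np.random.default_rng(0)
noise_var = 0.25
n = 200_000
theta = rng.normal(0.0, 1.0, size=n)
y = theta + rng.normal(0.0, np.sqrt(noise_var), size=n)

# Closed-form information gain for this linear-Gaussian model.
true_eig = 0.5 * np.log(1.0 + 1.0 / noise_var)

# Proposal equal to the exact posterior: the bound is tight.
exact_gain = 1.0 / (1.0 + noise_var)
exact_var = noise_var / (1.0 + noise_var)
tight = cross_entropy_eig_bound(theta, y, exact_gain, exact_var)

# Deliberately poor proposal: the estimate falls below the true value.
loose = cross_entropy_eig_bound(theta, y, 0.5, 1.0)

print(f"true EIG: {true_eig:.3f}, tight: {tight:.3f}, loose: {loose:.3f}")
```

In this toy setting the gap between the bound and the true information gain is the expected KL divergence from the true posterior to the proposal, which is why a more flexible proposal gives a more accurate estimate.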
