### Abstract

Multi-armed bandit algorithms like Thompson Sampling (TS) can be used to conduct adaptive experiments, in which maximizing reward means that data is used to progressively assign participants to more effective arms. Such assignment strategies increase the risk of statistical hypothesis tests identifying a difference between arms when there is not one, and failing to conclude there is a difference in arms when there truly is one. We tackle this by introducing a novel heuristic algorithm, called TS-PostDiff (Posterior Probability of Difference). TS-PostDiff takes a Bayesian approach to mixing TS and Uniform Random (UR): the probability a participant is assigned using UR allocation is the posterior probability that the difference between two arms is 'small' (below a certain threshold), allowing for more UR exploration when there is little or no reward to be gained. We evaluate TS-PostDiff against state-of-the-art strategies. The empirical and simulation results help characterize the trade-offs of these approaches between reward, False Positive Rate (FPR), and statistical power, as well as under which circumstances each is effective. We quantify the advantage of TS-PostDiff in performing well across multiple differences in arm means (effect sizes), showing the benefits of adaptively changing randomization/exploration in TS in a "Statistically Considerate" manner: reducing FPR and increasing statistical power when differences are small or zero and there is less reward to be gained, while exploiting more when differences may be large. This highlights important considerations for future algorithm development and analysis to better balance reward and statistical analysis.
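
To make the mixing rule concrete, the following is a minimal sketch of a single allocation step for two arms. It is an illustration under stated assumptions, not necessarily the exact procedure evaluated here. It assumes binary rewards with independent Beta(1, 1) priors. The function name `ts_postdiff_choose`, the default threshold `c=0.1`, and the use of fresh posterior draws for the TS step are all hypothetical choices. The sketch relies on the fact that a single joint posterior draw satisfies |p0 - p1| < c with probability equal to the posterior probability that the difference is 'small'.

```python
import random


def ts_postdiff_choose(successes, failures, c=0.1, rng=random):
    """Return the arm (0 or 1) to assign to the next participant.

    successes, failures: per-arm counts of observed binary rewards.
    c: threshold below which a difference in arm means counts as 'small'
       (hypothetical default, chosen only for illustration).
    rng: any object exposing betavariate() and randrange(), such as
         the random module or a random.Random instance.
    """
    # One posterior draw per arm under assumed Beta(1, 1) priors.
    p0 = rng.betavariate(1 + successes[0], 1 + failures[0])
    p1 = rng.betavariate(1 + successes[1], 1 + failures[1])

    # This branch is taken with probability equal to the posterior
    # probability that |p0 - p1| < c, so UR is used exactly that often.
    if abs(p0 - p1) < c:
        return rng.randrange(2)

    # Otherwise take a standard Thompson Sampling step. Fresh draws are
    # an assumption of this sketch; reusing p0 and p1 is another option.
    q0 = rng.betavariate(1 + successes[0], 1 + failures[0])
    q1 = rng.betavariate(1 + successes[1], 1 + failures[1])
    return 0 if q0 > q1 else 1
```

In this sketch, setting `c=0` recovers plain TS, since the 'small difference' branch is never taken. Any `c` greater than 1 recovers pure UR, since two success probabilities can never differ by more than 1.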