Abstract
The convergence speed of stochastic gradient descent (SGD) can be improved by actively selecting mini-batches. We explore sampling schemes where similar data points are less likely to be selected in the same mini-batch. In particular, we prove that such repulsive sampling schemes lower the variance of the gradient estimator. This generalizes recent work on using Determinantal Point Processes (DPPs) for mini-batch diversification (Zhang et al., 2017) to the broader class of repulsive point processes. We first show that the phenomenon of variance reduction by diversified sampling generalizes in particular to non-stationary point processes. We then show that other point processes may be computationally much more efficient than DPPs. In particular, we propose and investigate Poisson Disk sampling, frequently encountered in the computer graphics community, for this task. We show empirically that our approach improves over standard SGD in terms of both convergence speed and final model performance.
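To make the idea of repulsive mini-batch selection concrete, the following is a minimal sketch of Poisson Disk sampling by simple dart throwing. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the function name `poisson_disk_minibatch`, the parameters `radius` and `max_tries`, and the use of Euclidean distance in feature space are all hypothetical choices made here.

```python
import math
import random


def poisson_disk_minibatch(points, batch_size, radius, rng, max_tries=1000):
    """Dart-throwing sketch (hypothetical helper, not the paper's code).

    Draw candidate indices uniformly at random and reject any candidate that
    lies closer than `radius` (Euclidean distance, an assumption) to a point
    already accepted into the mini-batch.
    """
    chosen = []
    tries = 0
    while len(chosen) < batch_size and tries < max_tries:
        tries += 1
        i = rng.randrange(len(points))
        if i in chosen:
            continue  # never pick the same data point twice
        # Accept only if the candidate is far enough from every accepted point.
        if all(math.dist(points[i], points[j]) >= radius for j in chosen):
            chosen.append(i)
    # May return fewer than batch_size indices if the radius is too large.
    return chosen


# Usage: 200 synthetic 2-D points, one diversified mini-batch of up to 8.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
batch = poisson_disk_minibatch(data, batch_size=8, radius=0.5, rng=rng)
```

Setting `radius` to zero makes this uniform sampling without replacement, as in standard SGD mini-batching. Larger values enforce more diversity, at the cost of more rejected candidates.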