Generative Adversarial Networks (GANs) can successfully learn a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the non-stationary Multi-Armed Bandit (MAB) framework, where we evaluate the capability of a bandit algorithm to select discriminators for providing the generator with feedback during learning. To this end, we propose a reward function that reflects the amount of knowledge learned by the generator and is used to dynamically select the optimal discriminator network. Finally, we connect our algorithm to stochastic optimization methods and show that existing methods using multiple discriminators in the literature can be recovered from our parametric model. Experimental results based on the Fréchet Inception Distance (FID) demonstrate faster convergence than existing baselines and show that our method learns a curriculum.
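The abstract names neither the bandit algorithm nor the exact reward. The sketch below therefore only illustrates how a bandit can choose which of K discriminators provides the generator's feedback at each step. It uses Exp3, a standard adversarial-bandit algorithm that makes no stationarity assumption about rewards. This is a swapped-in technique and not necessarily the authors' method. `Exp3`, `toy_reward`, and the reward means are hypothetical, and rewards are assumed to lie in [0, 1].

```python
from __future__ import annotations

import math
import random


class Exp3:
    """Exp3 bandit over K arms, where each arm is one discriminator.

    Assumption: rewards lie in [0, 1]; gamma is the exploration rate in (0, 1].
    """

    def __init__(self, n_arms: int, gamma: float = 0.1, seed: int | None = None):
        if n_arms < 1 or not 0.0 < gamma <= 1.0:
            raise ValueError("need n_arms >= 1 and 0 < gamma <= 1")
        self.n_arms = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self) -> list[float]:
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self) -> tuple[int, float]:
        # Sample a discriminator index and return its selection probability.
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]
        return arm, probs[arm]

    def update(self, arm: int, reward: float, prob: float) -> None:
        # Importance-weighted estimate keeps the reward estimate unbiased
        # even though only the chosen arm's reward is observed.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale to avoid overflow; probabilities are scale-invariant.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def toy_reward(arm: int, rng: random.Random) -> float:
    # HYPOTHETICAL stand-in for the paper's reward (not the authors' function):
    # a Bernoulli reward whose mean differs per discriminator.
    means = [0.2, 0.3, 0.9]
    return 1.0 if rng.random() < means[arm] else 0.0


bandit = Exp3(n_arms=3, gamma=0.1, seed=0)
env_rng = random.Random(1)
for step in range(2000):
    arm, prob = bandit.select()
    # In a GAN loop, the generator would take a gradient step against
    # discriminator `arm` here, and the reward would then be measured.
    bandit.update(arm, toy_reward(arm, env_rng), prob)

probs = bandit.probabilities()
print([round(p, 3) for p in probs])
```

Because Exp3 keeps a floor of gamma / K on every arm, each discriminator continues to be sampled occasionally. If the most useful discriminator changes during training, the weights can shift toward the new one. In this toy setting, most of the probability mass should settle on the arm with the highest mean reward.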