Many training algorithms for deep neural networks can be interpreted as minimizing the cross-entropy loss between the prediction made by the network and a target distribution. In supervised learning, this target distribution is typically the ground-truth one-hot vector. In semi-supervised learning, it is typically generated by a pre-trained teacher model and used to train the main network. In this work, instead of using such predefined target distributions, we show that learning to adjust the target distribution based on the learning state of the main network can lead to better performance. In particular, we propose an efficient meta-learning algorithm that encourages the teacher to adjust the target distributions of training examples in a manner that improves the learning of the main network. The teacher is updated by policy gradients computed by evaluating the main network on a held-out validation set. Our experiments demonstrate substantial improvements over strong baselines and establish state-of-the-art performance on CIFAR-10, SVHN, and ImageNet. For instance, with ResNets on small datasets, we achieve 96.1% on CIFAR-10 with 4,000 labeled examples and 73.9% top-1 on ImageNet with 10% of the examples. Meanwhile, with EfficientNet on full datasets plus extra unlabeled data, we attain 98.6% accuracy on CIFAR-10 and 86.9% top-1 accuracy on ImageNet.
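
To make the shared view of training concrete, the following minimal sketch computes the cross-entropy loss between a network prediction and two kinds of target distribution: a one-hot ground-truth vector, as in supervised learning, and a soft distribution standing in for a teacher's output, as in semi-supervised learning. The logits, the three-class setup, and the helper names `softmax` and `cross_entropy` are illustrative assumptions rather than part of our method, and the policy-gradient update of the teacher is not shown.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, prediction, eps=1e-12):
    """Cross entropy H(target, prediction) = -sum_k target_k * log(prediction_k)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, prediction))

# Hypothetical logits of the main network for one example (3 classes).
student_logits = [2.0, 0.5, -1.0]
prediction = softmax(student_logits)

# Supervised case: the target is the ground-truth one-hot vector.
one_hot_target = [1.0, 0.0, 0.0]
supervised_loss = cross_entropy(one_hot_target, prediction)

# Semi-supervised case: the target is a soft distribution, here produced
# from made-up teacher logits (an assumption for illustration only).
teacher_target = softmax([1.5, 1.0, -0.5])
teacher_loss = cross_entropy(teacher_target, prediction)

print(f"loss against one-hot target: {supervised_loss:.4f}")
print(f"loss against teacher target: {teacher_loss:.4f}")
```

Both cases use the same loss and differ only in the target. This is the quantity that a teacher adjusts when the target distribution is learned rather than predefined.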