Self-supervised learning of convolutional neural networks can harness large amounts of cheap unlabeled data to train powerful feature representations. As a surrogate task, we jointly address the ordering of visual data in the spatial and temporal domains. The permutations of training samples, which are at the core of self-supervision by ordering, have so far been sampled randomly from a fixed, preselected set. Based on deep reinforcement learning, we propose a sampling policy that adapts to the state of the network being trained. New permutations are thus sampled according to their expected utility for updating the convolutional feature representation. Experimental evaluation on unsupervised and transfer learning tasks demonstrates competitive performance on standard benchmarks for image and video classification and nearest-neighbor retrieval.
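The abstract does not specify the agent's architecture or its reward, so the following is only a minimal sketch of the general idea of learning which permutations to sample instead of drawing them uniformly. It assumes a softmax policy over a fixed candidate set of permutations, trained with REINFORCE and a moving-average baseline. The class `PermutationPolicy` and the reward `toy_utility` are hypothetical stand-ins. In particular, `toy_utility` replaces the real, network-state-dependent utility with a fixed score (the fraction of displaced positions). The sketch is also state-less (a bandit), whereas the proposed policy conditions on the state of the network being trained.

```python
import itertools
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PermutationPolicy:
    """Hypothetical softmax policy over a fixed set of candidate permutations,
    trained with REINFORCE. Not the paper's agent: it ignores network state."""

    def __init__(self, candidates, lr=0.1, seed=0):
        self.candidates = list(candidates)
        self.logits = [0.0] * len(self.candidates)  # start from uniform sampling
        self.lr = lr
        self.baseline = 0.0  # moving-average reward baseline
        self.rng = random.Random(seed)

    def probs(self):
        return softmax(self.logits)

    def sample(self):
        # Draw one permutation index according to the current policy.
        p = self.probs()
        idx = self.rng.choices(range(len(self.candidates)), weights=p, k=1)[0]
        return idx, self.candidates[idx]

    def update(self, idx, reward, beta=0.9):
        # REINFORCE: d log pi(idx) / d logit_j = 1[j == idx] - p_j
        advantage = reward - self.baseline
        p = self.probs()
        for j in range(len(self.logits)):
            grad = (1.0 if j == idx else 0.0) - p[j]
            self.logits[j] += self.lr * advantage * grad
        self.baseline = beta * self.baseline + (1 - beta) * reward


def toy_utility(perm):
    # Hypothetical reward: fraction of positions moved away from identity.
    # Stands in for the expected utility of a permutation for the CNN update.
    return sum(i != p for i, p in enumerate(perm)) / len(perm)


# Toy run: 3 elements give 6 candidate permutations.
policy = PermutationPolicy(itertools.permutations(range(3)), lr=0.5, seed=0)
for _ in range(500):
    idx, perm = policy.sample()
    policy.update(idx, toy_utility(perm))
```

Under this toy reward, probability mass should shift toward the two derangements, (1, 2, 0) and (2, 0, 1), which move every element. In the setting the abstract describes, the reward would instead be derived from the feature network's training progress, and the policy would take the network's state as input.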