Lately, there has been a resurgence of interest in using supervised learning to solve reinforcement learning problems. Recent work in this area has largely focused on learning command-conditioned policies. We investigate the potential of one such method -- upside-down reinforcement learning -- to work with commands that specify a desired relationship between some scalar value and the observed return. We show that upside-down reinforcement learning can learn to carry out such commands online in a tabular bandit setting and in CartPole with non-linear function approximation. By doing so, we demonstrate the power of this family of methods and open the way for their practical use under more complicated command structures.