Using a model heat engine, we show that neural network-based reinforcement learning can identify thermodynamic trajectories of maximal efficiency. We consider both gradient and gradient-free reinforcement learning. We use an evolutionary learning algorithm to evolve a population of neural networks, subject to a directive to maximize the efficiency of a trajectory composed of a set of elementary thermodynamic processes; the resulting networks learn to carry out the maximally-efficient Carnot, Stirling, or Otto cycles. When given an additional irreversible process, this evolutionary scheme learns a previously unknown thermodynamic cycle. Gradient-based reinforcement learning is able to learn the Stirling cycle, whereas an evolutionary approach achieves the optimal Carnot cycle. Our results show how the reinforcement learning strategies developed for game playing can be applied to solve physical problems conditioned upon path-extensive order parameters.
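
As a point of reference for the efficiencies being compared, the following minimal sketch evaluates the textbook efficiencies of the three named cycles. It rests on assumptions that the abstract does not state: an ideal monatomic gas as the working substance, a Stirling cycle without a regenerator, and the hypothetical function and parameter names used here. It is not the model or the learning procedure of this work.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def carnot_efficiency(t_cold, t_hot):
    """Carnot efficiency between reservoirs at t_cold < t_hot (kelvin)."""
    return 1.0 - t_cold / t_hot


def stirling_efficiency(t_cold, t_hot, volume_ratio, cv=1.5 * R):
    """Ideal-gas Stirling cycle per mole, assuming no regenerator.

    Net work comes from the two isotherms; heat is absorbed during the hot
    isothermal expansion and during the isochoric heating step.
    """
    log_r = math.log(volume_ratio)
    work = R * (t_hot - t_cold) * log_r
    heat_in = R * t_hot * log_r + cv * (t_hot - t_cold)
    return work / heat_in


def otto_efficiency(compression_ratio, gamma=5.0 / 3.0):
    """Ideal-gas Otto cycle; gamma = 5/3 assumes a monatomic gas."""
    return 1.0 - compression_ratio ** (1.0 - gamma)


if __name__ == "__main__":
    t_cold, t_hot = 300.0, 600.0  # illustrative reservoir temperatures
    print("Carnot  :", carnot_efficiency(t_cold, t_hot))
    print("Stirling:", stirling_efficiency(t_cold, t_hot, volume_ratio=2.0))
    print("Otto    :", otto_efficiency(compression_ratio=8.0))
```

Under these assumptions the Stirling efficiency lies below the Carnot value for the same pair of reservoirs, which is consistent with describing the Carnot cycle as the optimal one.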