Taking gradients through experiments: LSTMs and memory proximal policy optimization for black-box quantum control

Abstract

In this work we introduce the application of black-box quantum control as an interesting reinforcement learning problem to the machine learning community. We analyze the structure of the reinforcement learning problems arising in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks trained via stochastic policy gradients yield a general method for solving them. In this context we introduce a variant of the proximal policy optimization (PPO) algorithm called memory proximal policy optimization (MPPO), which is based on this analysis. We then show how it can be applied to specific learning tasks and present results of numerical experiments showing that our method achieves state-of-the-art results for several learning tasks in quantum control with discrete and continuous control parameters.
