Quantum Optical Experiments Modeled by Long Short-Term Memory

  • 2019-10-30 12:35:46
  • Thomas Adler, Manuel Erhard, Mario Krenn, Johannes Brandstetter, Johannes Kofler, Sepp Hochreiter

Abstract

We demonstrate how machine learning is able to model experiments in quantum physics. Quantum entanglement is a cornerstone for upcoming quantum technologies such as quantum computation and quantum cryptography. Of particular interest are complex quantum states with more than two particles and a large number of entangled quantum levels. Given such a multiparticle high-dimensional quantum state, it is usually impossible to reconstruct an experimental setup that produces it. To search for interesting experiments, one thus has to randomly create millions of setups on a computer and calculate the respective output states. In this work, we show that machine learning models can provide significant improvement over random search. We demonstrate that a long short-term memory (LSTM) neural network can successfully learn to model quantum experiments by correctly predicting output state characteristics for given setups without the necessity of computing the states themselves. This approach not only allows for faster search but is also an essential step towards automated design of multiparticle high-dimensional quantum experiments using generative machine learning models.

 


Thomas Adler
Institute for Machine Learning &
LIT AI Lab
Johannes Kepler University Linz
Linz, Austria

Manuel Erhard
Institute for Quantum Optics and Quantum Information
Austrian Academy of Sciences &
Vienna Center for Quantum Science & Technology
University of Vienna
Vienna, Austria

Mario Krenn
Dept. of Chemistry & Dept. of Computer Science
University of Toronto &
Vector Institute for Artificial Intelligence
Toronto, Canada

Johannes Brandstetter
Institute for Machine Learning &
LIT AI Lab
Johannes Kepler University Linz
Linz, Austria

Johannes Kofler
Institute for Machine Learning &
LIT AI Lab
Johannes Kepler University Linz
Linz, Austria

Sepp Hochreiter
Institute for Machine Learning &
LIT AI Lab
Johannes Kepler University Linz
Linz, Austria

1 Introduction

In the past decade, artificial neural networks have been applied to a plethora of scientific disciplines, commercial applications, and everyday tasks with outstanding performance in, e.g., medical diagnosis, self-driving, and board games (Esteva et al., 2017; Silver et al., 2017). In contrast to standard feedforward neural networks, long short-term memory (LSTM) (Hochreiter, 1991; Hochreiter and Schmidhuber, 1997) architectures have recurrent connections, which allow them to process sequential data such as text and speech (Sutskever et al., 2014).

Such sequence-processing capabilities can be particularly useful for designing complex quantum experiments, since the final state of quantum particles depends on the sequence of elements, i.e. the experimental setup, these particles pass through. For instance, in quantum optical experiments, photons may traverse a sequence of wave plates, beam splitters, and holographic plates. High-dimensional quantum states are important for multiparticle and multisetting violations of local realist models as well as for applications in emerging quantum technologies such as quantum communication and error correction in quantum computers (Shor, 2000; Kaszlikowski et al., 2000).

Already for three photons and only a few quantum levels, it generally becomes infeasible for humans to determine the required setup for a desired final quantum state, which makes automated design procedures for this inverse problem necessary. One example of such an automated procedure is the algorithm MELVIN (Krenn et al., 2016), which uses a toolbox of optical elements, randomly generates sequences of these elements, calculates the resulting quantum state, and then checks whether the state is interesting, i.e. maximally entangled and involving many quantum levels. The setups proposed by MELVIN have been realized in laboratory experiments (Malik et al., 2016; Erhard et al., 2018b). Recently, a reinforcement learning approach has also been applied to design new experiments (Melnikov et al., 2018).

Inspired by these advances, we investigate how LSTM networks can learn quantum optical setups and predict the characteristics of the resulting quantum states, a task whose level of humanly perceived difficulty distinctly goes beyond that of other deep learning tasks like object recognition or text generation. We train the neural networks using millions of setups generated by MELVIN. The huge amount of data makes deep learning approaches the first choice. We use cluster cross validation (Mayr et al., 2016) to evaluate the models.

2 Methods

2.1 Target values

Let us consider a quantum optical experiment using three photons with orbital angular momentum (OAM) (Yao and Padgett, 2011; Erhard et al., 2018a). The OAM of a photon is characterized by an integer whose size and sign represent the shape and handedness of the photon wavefront, respectively. For instance, after a series of optical elements, a three-particle quantum state may have the following form:

$$|\Psi\rangle = \frac{1}{2}\left(|0,0,0\rangle + |1,0,1\rangle + |2,1,0\rangle + |3,1,1\rangle\right). \tag{1}$$

This state represents a physical situation in which there is a 1/4 chance (modulus square of the amplitude value 1/2) that all three photons have OAM value 0 (first term), a 1/4 chance that photons 1 and 3 have OAM value 1 while photon 2 has OAM value 0 (second term), and so on for the two remaining terms.

We are generally interested in two main characteristics of the quantum states: (1) Are they maximally entangled? (2) Are they high-dimensional? The dimensionality of a state is represented by its Schmidt rank vector (SRV) (Huber and de Vicente, 2013; Huber et al., 2013). State $|\Psi\rangle$ is indeed maximally entangled because all terms on the right-hand side have the same amplitude value. Its SRV is (4,2,2), as the first photon is four-dimensionally entangled with the other two photons, whereas photons two and three are both only two-dimensionally entangled with the rest.
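The SRV quoted for state (1) can be checked numerically. The sketch below is not taken from MELVIN; it assumes real, rational amplitudes and uses the standard fact that the Schmidt rank of one photon against the other two equals the rank of the coefficient tensor flattened along that photon's index. The names `matrix_rank` and `schmidt_rank_vector` are illustrative.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def schmidt_rank_vector(terms, dims):
    """terms maps OAM triples (a, b, c) to amplitudes; dims is the number of levels per photon."""
    srv = []
    for photon in range(3):
        others = [i for i in range(3) if i != photon]
        cols = list(product(*(range(dims[i]) for i in others)))
        rows = []
        for level in range(dims[photon]):
            row = []
            for col in cols:
                key = [0, 0, 0]
                key[photon] = level
                key[others[0]], key[others[1]] = col
                row.append(terms.get(tuple(key), 0))
            rows.append(row)
        srv.append(matrix_rank(rows))
    return tuple(srv)


# State (1): equal amplitudes 1/2 on four basis terms.
psi = {(0, 0, 0): Fraction(1, 2), (1, 0, 1): Fraction(1, 2),
       (2, 1, 0): Fraction(1, 2), (3, 1, 1): Fraction(1, 2)}
print(schmidt_rank_vector(psi, dims=(4, 2, 2)))  # (4, 2, 2)
```

The result is reported in photon order; sorting it in non-increasing order gives the label convention used for regression.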

A setup is labeled “positive” ($y_E=1$) if its output state is maximally entangled and if the setup obeys some further restrictions, e.g., behaves well under multi-pair emission, and is otherwise labeled “negative” ($y_E=0$). The target label capturing the state dimensionality is the SRV $y_{\mathrm{SRV}}=(n,m,k)$. We train LSTM networks to directly predict these state characteristics (entanglement and SRV) from a given experimental setup without actually predicting the quantum state itself.

2.2 Loss Function

For classification, we use binary cross entropy (BCE) in combination with a logistic sigmoid output activation for learning. For regression, it is always possible to reorder the photon labels such that the SRV has entries in non-increasing order. An SRV label is thus represented by a 3-tuple $y_{\mathrm{SRV}}=(n,m,k)$ which satisfies $n \geq m \geq k$. With slight abuse of notation, we model $n \sim \mathcal{P}(\lambda)$ as a Poisson-distributed random variable and $m \sim \mathcal{B}(n,p)$, $k \sim \mathcal{B}(m,q)$ as Binomials with ranges $m \in \{1,\dots,n\}$ and $k \in \{1,\dots,m\}$ and success probabilities $p$ and $q$, respectively. The resulting log-likelihood objective (omitting all terms not depending on $\lambda,p,q$) for a data point $x$ with label $(n,m,k)$ is

$$\ell(\hat\lambda,\hat p,\hat q \mid x) = n\log\hat\lambda - \hat\lambda + m\log\hat p + (n-m)\log(1-\hat p) + k\log\hat q + (m-k)\log(1-\hat q) \tag{2}$$

where $\hat\lambda,\hat p,\hat q$ are the network predictions (i.e. functions of $x$) for the distribution parameters of $n,m,k$, respectively. The Schmidt rank value predictions are $\hat n=\hat\lambda$, $\hat m=\hat p\hat\lambda$, $\hat k=\hat p\hat q\hat\lambda$. To see this, we need to consider the marginals of the joint probability mass function

$$f(n,m,k) = \frac{\lambda^n e^{-\lambda}}{n!}\binom{n}{m}p^m(1-p)^{n-m}\binom{m}{k}q^k(1-q)^{m-k}. \tag{3}$$

To obtain the marginal distribution of $m$, we can first sum over all possible $k$, which is easy. To sum out $n$ we first observe that $\binom{n}{m}=0$ for $n<m$, i.e. the first $m$ terms are zero and we may write

$$f(m) = \sum_{n=0}^{\infty} f(n,m) = \sum_{n=0}^{\infty} f(m+n,m) \tag{4}$$

capturing only non-zero terms. It follows that

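$$\begin{aligned}
f(m) &= \sum_{n=0}^{\infty} \frac{\lambda^{n+m}e^{-\lambda}}{(n+m)!}\binom{n+m}{m}p^m(1-p)^n \\
&= e^{-\lambda}p^m\lambda^m \sum_{n=0}^{\infty} \frac{\lambda^n(1-p)^n}{(n+m)!}\binom{n+m}{m} \\
&= \frac{e^{-\lambda}p^m\lambda^m}{m!} \sum_{n=0}^{\infty} \frac{\lambda^n(1-p)^n}{n!} = \frac{e^{-p\lambda}(p\lambda)^m}{m!},
\end{aligned} \tag{5}$$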

i.e. $m$ is $\mathcal{P}(p\lambda)$-distributed. Using the same argument for $k$, we find that the marginal of $k$ is $\mathcal{P}(pq\lambda)$-distributed. The estimates $\hat n,\hat m,\hat k$ are obtained by taking the means of their respective marginals.
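The objective in Equation (2) and the marginal means derived above can be illustrated with a short sketch. It is not the authors' training code: the function names are illustrative, the parameter values are arbitrary, and the sampler uses the textbook Poisson-then-Binomial construction (with Binomial counts starting at zero) purely to check the means $\lambda$, $p\lambda$, $pq\lambda$ empirically.

```python
import math
import random


def srv_log_likelihood(lam, p, q, n, m, k):
    """Equation (2): log-likelihood of label (n, m, k), parameter-independent terms dropped.

    Requires lam > 0 and 0 < p, q < 1 (as produced by positive and sigmoid outputs)."""
    return (n * math.log(lam) - lam
            + m * math.log(p) + (n - m) * math.log(1 - p)
            + k * math.log(q) + (m - k) * math.log(1 - q))


def srv_prediction(lam, p, q):
    """Means of the marginals of n, m and k."""
    return lam, p * lam, p * q * lam


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def sample_srv(lam, p, q, rng):
    """Draw n ~ Poisson(lam), m ~ Binomial(n, p), k ~ Binomial(m, q)."""
    n = sample_poisson(lam, rng)
    m = sum(rng.random() < p for _ in range(n))
    k = sum(rng.random() < q for _ in range(m))
    return n, m, k


rng = random.Random(0)
lam, p, q = 6.0, 0.7, 0.5  # arbitrary illustrative parameters
draws = [sample_srv(lam, p, q, rng) for _ in range(50_000)]
empirical = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
print("empirical means:", empirical)
print("predicted means:", srv_prediction(lam, p, q))
```

In training, the negative of `srv_log_likelihood` would serve as the loss, with `lam`, `p`, and `q` produced by the network for each setup.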

2.3 Network architecture

The sequence processing model we use is depicted in Figure 1. We train two networks, one for entanglement classification (target $y_E$) and one for SRV regression (target $y_{\mathrm{SRV}}$). We avoid multitask learning in this context because we do not want to incorporate correlations between entanglement and SRV into our models. For instance, the SRV (6,6,6) has so far only been observed in non-maximally entangled samples, which is a perfect correlation. This would cause a multitask network to automatically label such a sample as negative only because of its SRV. By training separate networks we lower the risk of incorporating such correlations.

Figure 1: Sequence processing model for a many-to-one mapping. The target value $\hat y$ can be either an estimate for $y_E$ (entanglement classification) or $y_{\mathrm{SRV}}$ (SRV regression).

A setup of N elements is fed into the network as a sequence of its individual optical components $x=(x_1,x_2,\dots,x_N)$, where in our data N ranges from 6 to 15. We use an LSTM with 2048 hidden units and a component embedding space with 64 dimensions. The component embedding technique is similar to word embeddings (Mikolov et al., 2013).
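The many-to-one mapping can be sketched as an embedding lookup followed by a single LSTM layer whose last hidden state feeds a sigmoid read-out. The block below is a minimal, untrained, pure-Python illustration with randomly initialized weights; the class name, the integer component indices, and the tiny sizes (4 embedding dimensions and 8 hidden units instead of 64 and 2048) are assumptions made for readability, and a real implementation would use a deep learning framework.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]


class TinyLSTMClassifier:
    """Embedding -> single-layer LSTM -> sigmoid read-out on the last hidden state."""

    GATES = ("input", "forget", "output", "cell")

    def __init__(self, n_components, embed_dim, hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.embedding = rand_matrix(n_components, embed_dim, rng)
        # One weight matrix per gate, acting on the concatenation [x_t, h_{t-1}].
        self.weights = {name: rand_matrix(hidden, embed_dim + hidden, rng)
                        for name in self.GATES}
        self.bias = {name: [0.0] * hidden for name in self.GATES}
        self.readout = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def forward(self, setup):
        """setup is a list of integer component indices; returns a value in (0, 1)."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for component in setup:
            z = self.embedding[component] + h  # list concatenation [x_t, h_{t-1}]
            pre = {name: [a + b for a, b in zip(matvec(self.weights[name], z), self.bias[name])]
                   for name in self.GATES}
            i_t = [sigmoid(v) for v in pre["input"]]
            f_t = [sigmoid(v) for v in pre["forget"]]
            o_t = [sigmoid(v) for v in pre["output"]]
            g_t = [math.tanh(v) for v in pre["cell"]]
            c = [f * c_prev + i * g for f, c_prev, i, g in zip(f_t, c, i_t, g_t)]
            h = [o * math.tanh(c_new) for o, c_new in zip(o_t, c)]
        return sigmoid(sum(w * h_j for w, h_j in zip(self.readout, h)))


model = TinyLSTMClassifier(n_components=10, embed_dim=4, hidden=8)
y_hat = model.forward([3, 1, 4, 1, 5, 9])  # a hypothetical setup of six components
print(y_hat)
```

For SRV regression, the same recurrent body could end in three outputs for $\hat\lambda$, $\hat p$, $\hat q$ instead of a single sigmoid unit.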

3 Experiments

3.1 Dataset

The dataset produced by MELVIN consists of 7,853,853 different setups of which 1,638,233 samples are labeled positive. Each setup consists of a sequence $x$ of optical elements and the two target values $y_E$ and $y_{\mathrm{SRV}}$. We are interested in whether the trained model is able to extrapolate to unseen SRVs. Therefore, we cluster the data by leading Schmidt rank n. Figure 2 shows the number of positive and negative samples in the dataset for each n.

Figure 2: Negative and positive samples in the data set as a function of the leading Schmidt rank n.

3.2 Workflow

All samples with n ≥ 9 are moved to a special extrapolation set consisting of only 1,754 setups (gray cell in Table 1). The remainder of the data, i.e. all samples with n < 9, is then divided into a training set and a conventional test set with 20 % of the data drawn at random (iid). This workflow is shown in Figure 3.

CCV folds (white cells): 0,1 | 0,1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Extrapolation set (gray cell): 9-12
Table 1: Cluster cross validation folds (0-8) and extrapolation set (9-12) characterized by leading Schmidt rank n. Samples with n=0 and samples with n=1 are combined and then split into two folds (0,1) at random.
Figure 3: Workflow. We split the entire data by their leading Schmidt rank n. All samples with n ≥ 9 constitute the extrapolation set, which we use to explore the out-of-distribution capabilities of our model. For the remaining samples (i.e. n < 9) we make a random split at a test-to-training ratio of 1/4, i.e. 20 % of these samples form the test set. The test set is used to estimate the conventional generalization error of our model. We use the training set to perform cluster cross validation.

The test set is used to estimate the conventional generalization error, while the extrapolation set is used to shed light on the ability of the learned model to perform on higher Schmidt rank numbers. If the model extrapolates successfully, we can hope to find experimental setups that lead to new interesting quantum states.

Cluster cross validation (CCV) is an evaluation method similar to standard cross validation. Instead of grouping the folds iid, CCV groups them according to a clustering method. Thus, CCV removes similarities between training and validation set and simulates situations in which the withheld folds have not been obtained yet, thereby allowing us to investigate the ability of the network to discover these withheld setups. We use CCV with nine folds (white cells in Table 1). Seven of these folds correspond to the leading Schmidt ranks 2,…,8. The samples with n=1 (not entangled) and n=0 (not even a valid three-photon state) are negative by definition. These samples represent special cases of $y_E=0$ setups and it is not necessary to generalize to these cases without training on them. Therefore, the 4,300,268 samples with n<2 are divided into two folds at random such that the model will always see some of these special samples while training.
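The split described in this section can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the sample representation (a dict with an `srv` key), the fold names, and the function names are assumptions, and each fold is assumed to be withheld once as in standard cross validation.

```python
import random


def split_by_leading_rank(samples, test_fraction=0.2, seed=0):
    """Return (extrapolation set, test set, CCV folds) for samples with key 'srv' = (n, m, k)."""
    rng = random.Random(seed)
    extrapolation = [s for s in samples if s["srv"][0] >= 9]
    rest = [s for s in samples if s["srv"][0] < 9]
    rng.shuffle(rest)
    n_test = int(round(test_fraction * len(rest)))
    test, train = rest[:n_test], rest[n_test:]

    # Seven folds for leading ranks 2..8, plus two random folds for n < 2.
    folds = {n: [] for n in range(2, 9)}
    folds["0,1 (a)"], folds["0,1 (b)"] = [], []
    for s in train:
        n = s["srv"][0]
        if n < 2:
            folds[rng.choice(["0,1 (a)", "0,1 (b)"])].append(s)
        else:
            folds[n].append(s)
    return extrapolation, test, folds


def ccv_rounds(folds):
    """Yield (withheld fold name, training samples, validation samples) for each fold."""
    for name, validation in folds.items():
        training = [s for other, fold in folds.items() if other != name for s in fold]
        yield name, training, validation


# Toy data: ten made-up samples per leading Schmidt rank 0..12.
toy = [{"srv": (n, min(n, 2), min(n, 1))} for n in range(13) for _ in range(10)]
extrapolation, test, folds = split_by_leading_rank(toy)
for name, training, validation in ccv_rounds(folds):
    print(name, len(training), len(validation))
```

Because the two `0,1` folds are never withheld together, every training round contains some samples with n < 2.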

3.3 Results

Let us examine if the LSTM network has learned something about quantum physics. A good model will identify positive setups correctly while discarding as many negative setups as possible. This behavior is reflected in the metrics true positive rate TPR=TP/(TP+FN) and true negative rate TNR=TN/(TN+FP), with TP, TN, FP, FN the true positives, true negatives, false positives, false negatives, respectively. A metric that quantifies the success rate within the positive predictions is the precision (or positive predictive value), defined as PPV=TP/(TP+FP).
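The three rates follow directly from the confusion counts. The helper below is a small illustrative sketch; the function name and the example counts are made up and are not results from the paper.

```python
def classification_rates(tp, tn, fp, fn):
    """Return (TPR, TNR, PPV); a rate is None when its denominator is zero."""
    def ratio(num, den):
        return num / den if den else None

    tpr = ratio(tp, tp + fn)  # share of positive setups that are found
    tnr = ratio(tn, tn + fp)  # share of negative setups that are discarded
    ppv = ratio(tp, tp + fp)  # success rate within the positive predictions
    return tpr, tnr, ppv


# Hypothetical counts, for illustration only.
tpr, tnr, ppv = classification_rates(tp=90, tn=950, fp=50, fn=10)
print(tpr, tnr, ppv)
```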

For each withheld CCV fold n, we characterize a setup to be “interesting” when it fulfills the following two criteria: (i) It is classified positive ($\hat{y}_E>\tau$) with τ the classification threshold of the sigmoid output activation. (ii) The SRV prediction $\hat{y}_{\mathrm{SRV}}=(\hat n,\hat m,\hat k)$ is such that there exists a $y_{\mathrm{SRV}}=(n,m,k)$ with $\|y_{\mathrm{SRV}}-\hat{y}_{\mathrm{SRV}}\|_2<r$. We call r the SRV radius. Samples that are classified as interesting (uninteresting) and are indeed positive (negative) are true positives (negatives). Samples that are classified as interesting (uninteresting) but are in fact negative (positive) are false positives (false negatives).
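The two criteria and the resulting confusion categories can be written down compactly. In this sketch the set of admissible SRVs for the withheld fold is passed in as `candidate_srvs`; how that set is built is not specified here, so the argument and both function names are assumptions.

```python
import math


def is_interesting(y_e_hat, srv_hat, candidate_srvs, tau=0.5, r=3.0):
    """Criterion (i): y_e_hat > tau. Criterion (ii): some candidate SRV lies within radius r."""
    if y_e_hat <= tau:
        return False
    return any(math.dist(srv_hat, srv) < r for srv in candidate_srvs)


def outcome(interesting, positive):
    """Map the (prediction, label) pair to TP, TN, FP or FN as defined in the text."""
    if interesting:
        return "TP" if positive else "FP"
    return "FN" if positive else "TN"


# Made-up predictions for a fold with a single candidate SRV (9, 4, 2).
candidates = [(9, 4, 2)]
print(is_interesting(0.9, (8.1, 4.0, 2.2), candidates))  # True: distance is about 0.92
print(is_interesting(0.4, (8.1, 4.0, 2.2), candidates))  # False: below the threshold
print(is_interesting(0.9, (2.0, 2.0, 2.0), candidates))  # False: distance is about 7.28
```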

We employ stochastic gradient descent for training the LSTM network with momentum 0.5 and batch size 128. We sample mini-batches in such a way that positive and negative samples appear equally often in training. For balanced SRV regression, the leading Schmidt rank n is used as a surrogate class label. The models were trained using early stopping after 40,000 weight update steps for the entanglement classification network and 14,000 update steps for the SRV regression network. Hyperparameter search was performed in advance on a data set similar to the training set.
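One simple way to obtain class-balanced mini-batches is to pick a class uniformly at random and then a sample from that class, with replacement. The exact sampling scheme used for training is not described, so the following is only a sketch under that assumption; `balanced_batches` and `label_of` are illustrative names.

```python
import random


def balanced_batches(samples, label_of, batch_size, steps, seed=0):
    """Yield mini-batches in which every class is drawn with equal probability."""
    rng = random.Random(seed)
    by_class = {}
    for s in samples:
        by_class.setdefault(label_of(s), []).append(s)
    classes = sorted(by_class)
    for _ in range(steps):
        # Draw the class first, then a sample from it (sampling with replacement).
        yield [rng.choice(by_class[rng.choice(classes)]) for _ in range(batch_size)]


# Toy data with a strong class imbalance: 1000 negatives and 10 positives.
data = [("neg", i) for i in range(1000)] + [("pos", i) for i in range(10)]
batches = list(balanced_batches(data, label_of=lambda s: s[0], batch_size=128, steps=50))
share = sum(s[0] == "pos" for batch in batches for s in batch) / (128 * 50)
print(f"share of positive samples: {share:.3f}")
```

For SRV regression, `label_of` would return the leading Schmidt rank n of a sample instead of its entanglement label.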

Figure 4 shows the TNR, TPR, and rediscovery ratio for sigmoid threshold τ=0.5 and SRV radius r=3. The rediscovery ratio is defined as the number of distinct SRVs, for which at least 20% of the samples are rediscovered by our method, i.e. identified as interesting, divided by the number of distinct SRVs in the respective cluster. The TNR for fold 0,1 is 0.9996, and the precision on the extrapolation set 9-12 is 0.659. Error bars in Figure 4 and later in the text are 95% binomial proportion confidence intervals. Model performance depends heavily on parameters τ and r. Figure 5 shows the “beyond distribution” results for a variety of sigmoid thresholds and SRV radii.
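The rediscovery ratio and an error bar for a proportion can be computed as sketched below. Two assumptions are made: the caller decides which samples of a cluster are passed in, and the confidence interval uses the normal approximation (Wald interval), since the text does not state which binomial interval was used.

```python
import math
from collections import defaultdict


def rediscovery_ratio(samples, min_fraction=0.2):
    """samples: non-empty iterable of (srv, interesting) pairs from one cluster."""
    found, total = defaultdict(int), defaultdict(int)
    for srv, interesting in samples:
        total[srv] += 1
        found[srv] += bool(interesting)
    rediscovered = sum(found[srv] / total[srv] >= min_fraction for srv in total)
    return rediscovered / len(total)


def binomial_ci_halfwidth(successes, trials, z=1.96):
    """Half-width of the 95% normal-approximation interval for a proportion."""
    p_hat = successes / trials
    return z * math.sqrt(p_hat * (1 - p_hat) / trials)


# Made-up cluster: SRV (9,4,2) is found in 1 of 5 samples, SRV (10,5,2) in 0 of 4.
cluster = [((9, 4, 2), True)] + [((9, 4, 2), False)] * 4 + [((10, 5, 2), False)] * 4
print(rediscovery_ratio(cluster))      # 0.5
print(binomial_ci_halfwidth(50, 100))  # about 0.098
```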

Figure 4: True negative rate (TNR), true positive rate (TPR), and rediscovery ratio of the LSTM network using cluster cross validation for different folds 0-8. True negative rates are high for all validation folds. All metrics are good for the extrapolation set 9-12, demonstrating that the models perform well on data beyond the training set distribution, which covers only Schmidt rank numbers 0-8. Error bars represent 95% binomial proportion confidence intervals.
Figure 5: True negative rate (scale starts at 0.6), true positive rate, rediscovery ratio, and precision for the extrapolation set 9-12 for varying sigmoid threshold τ and SRV radius r. For too restrictive parameter choices (τ → 1 and r → 0.5) the TNR and precision approach the value 1, while TPR and rediscovery ratio approach 0, such that no interesting new setups would be identified. For too loose choices (small τ, large r), too few negative samples would be rejected, such that the advantage over random search becomes negligible, reflected in smaller precision values. Hence, there is a trade-off between rediscovery ratio (diversity of discoveries) and precision (speed of discoveries). For a large variety of τ and r the models perform satisfactorily, allowing a decent compromise between TNR and TPR. This is also reflected by a value of 0.64 for the mean average precision, where the mean is taken over r=0.5 to r=7 with a step size of 0.1 and the average precision is over τ=1 to τ=0 with a step size of 0.01 for each value of r.

Finally, we investigate the conventional in-distribution generalization error using the test set (20 % of the data). Entanglement classification: The training BCE loss value is 10.2, with TNR and TPR of 0.9271±0.00024 and 0.9469±0.00041, respectively. The corresponding test loss is 10.4, with TNR and TPR of 0.9261±0.00038 and 0.9427±0.00065, respectively. SRV regression: The training loss value according to Equation (2) is 2.247, the accuracy with r=3 is 93.82 %, and the mean distance between label and prediction is 1.3943. The test loss is 2.24, the accuracy with r=3 is 93.8 %, and the mean distance between label and prediction is 1.40. The close agreement between training and test figures is consistent with a clean training procedure.

4 Outlook

Our experiments demonstrate that an LSTM-based neural network can be trained to model certain properties of complex quantum systems. Our approach is not limited to entanglement and Schmidt rank but may be generalized to employ other objective functions such as multiparticle transformations, interference and fidelity qualities, and so on.

Another possible next step to expand our approach towards the goal of automated design of multiparticle high-dimensional quantum experiments is the exploitation of generative models. Here, we consider Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) and beam search (Lowerre, 1976) as possible approaches.

Generating sequences such as text in adversarial settings has been done using 1D CNNs (Gulrajani et al., 2017) and LSTMs (Yu et al., 2016; Fedus et al., 2018). The LSTM-based approaches employ ideas from reinforcement learning to alleviate the problem of propagating gradients through the softmax outputs of the network. Since our data is in structure similar to text, these approaches are directly applicable to our setting.

For beam search, there exist two different ideas, namely a discriminative approach and a generative approach. The discriminative approach incorporates the entire data set (positive and negative samples). The models trained for this work can be used for the discriminative approach in that one constructs new sequences by maximizing the belief of the network that the outcome will be a positive setup. For the generative approach, the idea is to train a model on the positive samples only to learn their distribution via next element prediction. At inference time, beam search can be used to approximate the most probable sequence given some initial condition (Bengio et al., 2015). Another option to generate new sequences is to sample from the softmax distribution of the network output at each sequence position, as has been done for text generation models (Graves, 2013; Karpathy and Fei-Fei, 2015).
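For the generative approach, beam search over next-element predictions could look like the following sketch. The trained next-element model is replaced here by a hypothetical bigram table over three made-up component tokens (`BS`, `Holo`, `WP`); only the search procedure itself is the point of the example.

```python
import math


def beam_search(next_log_probs, start, length, beam_width=3):
    """Approximate the most probable sequence of `length` elements that begins with `start`."""
    beams = [(0.0, list(start))]
    while len(beams[0][1]) < length:
        candidates = []
        for score, seq in beams:
            for element, logp in next_log_probs(seq).items():
                candidates.append((score + logp, seq + [element]))
        # Keep only the highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams


# Hypothetical stand-in for a trained model: fixed bigram probabilities.
BIGRAM = {
    "BS": {"BS": 0.1, "Holo": 0.6, "WP": 0.3},
    "Holo": {"BS": 0.5, "Holo": 0.1, "WP": 0.4},
    "WP": {"BS": 0.7, "Holo": 0.2, "WP": 0.1},
}


def toy_next_log_probs(seq):
    """Log-probabilities of the next element given the last element of seq."""
    return {element: math.log(p) for element, p in BIGRAM[seq[-1]].items()}


best_score, best_seq = beam_search(toy_next_log_probs, ["BS"], length=4)[0]
print(best_seq, math.exp(best_score))  # ['BS', 'Holo', 'BS', 'Holo'] with probability 0.18
```

Replacing the sorted selection with a random draw from the softmax distribution at each step would give the sampling variant mentioned in the text.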

In general, automated design procedures for experiments have much broader applications beyond quantum optical setups and can be of importance for many scientific disciplines other than physics.

5 Conclusion

We have shown that an LSTM-based neural network can be trained to successfully predict certain characteristics of high-dimensional multiparticle quantum states from the experimental setup without any explicit knowledge of quantum mechanics. For humans, the difficulty of analyzing quantum optical experiments goes far beyond that of other deep learning problems such as image classification. The network performs well even on unseen data beyond the training distribution, demonstrating its extrapolation capabilities. This paves the way to automated design of complex quantum experiments using generative machine learning models.

Acknowledgements

This work was supported by NVIDIA Corporation, Merck KGaA, Audi.JKU Deep Learning Center, Audi Electronic Venture GmbH, Janssen Pharmaceutica (madeSMART), TGW Logistics Group, ZF Friedrichshafen AG, UCB S.A., FFG grant 871302, LIT grant DeepToxGen and AI-SNN, and FWF grant P 28660-N31. MK acknowledges support from the Austrian Science Fund (FWF) via the Erwin Schrödinger fellowship No. J4309. ME acknowledges FWF project CoQuS no. W1210-N16. The authors thank Anton Zeilinger for useful discussions.

References

  • S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer (2015) Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems 28, pp. 1171–1179. Cited by: §4.
  • M. Erhard, R. Fickler, M. Krenn, and A. Zeilinger (2018a) Twisted photons: new quantum perspectives in high dimensions. Light: Science & Applications 7 (3), pp. 17146. Cited by: §2.1.
  • M. Erhard, M. Malik, M. Krenn, and A. Zeilinger (2018b) Experimental GHZ entanglement beyond qubits. Nature Photonics 12 (759). Cited by: §1.
  • A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (115). Cited by: §1.
  • W. Fedus, I. Goodfellow, and A. M. Dai (2018) MaskGAN: better text generation via filling in the ______. In International Conference on Learning Representations. Cited by: §4.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680. Cited by: §4.
  • A. Graves (2013) Generating sequences with recurrent neural networks. arXiv:1308.0850. Cited by: §4.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein gans. In Advances in Neural Information Processing Systems 30, pp. 5767–5777. Cited by: §4.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (1735). Cited by: §1.
  • S. Hochreiter (1991) Untersuchungen zu dynamischen neuronalen Netzen. Diploma Thesis, TU München. Cited by: §1.
  • M. Huber and J. I. de Vicente (2013) Structure of multidimensional entanglement in multipartite systems. Physical Review Letters 110 (030501). Cited by: §2.1.
  • M. Huber, M. Perarnau-Llobet, and J. I. de Vicente (2013) Entropy vector formalism and the structure of multidimensional entanglement in multipartite systems. Physical Review A 88 (4), pp. 042328. Cited by: §2.1.
  • A. Karpathy and L. Fei-Fei (2015) Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3128–3137. Cited by: §4.
  • D. Kaszlikowski, P. Gnaciński, M. Żukowski, W. Miklaszewski, and A. Zeilinger (2000) Violations of local realism by two entangled N-dimensional systems are stronger than for two qubits. Phys. Rev. Lett. 86 (4418). Cited by: §1.
  • M. Krenn, M. Malik, R. Fickler, R. Lapkiewicz, and A. Zeilinger (2016) Automated Search for new Quantum Experiments. Phys. Rev. Lett. 116 (090405). Cited by: §1.
  • B. T. Lowerre (1976) The Harpy speech recognition system. PhD Thesis, Carnegie Mellon University, Pittsburgh. Cited by: §4.
  • M. Malik, M. Erhard, M. Huber, M. Krenn, R. Fickler, and A. Zeilinger (2016) Multi-photon entanglement in high dimensions. Nature Photonics 10 (248). Cited by: §1.
  • A. Mayr, G. Klambauer, T. Unterthiner, and S. Hochreiter (2016) DeepTox: Toxicity Prediction using Deep Learning. Frontiers in Environmental Science 3 (80). Cited by: §1.
  • A. A. Melnikov, H. Poulsen Nautrup, M. Krenn, V. Dunjko, M. Tiersch, A. Zeilinger, and H. J. Briegel (2018) Active learning machine learns to create new quantum experiments. PNAS 115 (1221). Cited by: §1.
  • T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013) Efficient estimation of word representations in vector space. ICLR Workshop, arXiv:1301.3781. Cited by: §2.3.
  • P. W. Shor (2000) Scheme for reducing decoherence in quantum computer memory. Phys. Rev. A 52 (R2493). Cited by: §1.
  • D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis (2017) Mastering the game of Go without human knowledge. Nature 550 (354). Cited by: §1.
  • I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pp. 3104–3112. Cited by: §1.
  • A. M. Yao and M. J. Padgett (2011) Orbital angular momentum: origins, behavior and applications. Adv. Opt. Photon. 3 (161). Cited by: §2.1.
  • L. Yu, W. Zhang, J. Wang, and Y. Yu (2016) SeqGAN: sequence generative adversarial nets with policy gradient. arXiv:1609.05473. Cited by: §4.