Abstract
Conservation laws are considered to be fundamental laws of nature. It hasbroad applications in many fields, including physics, chemistry, biology,geology, and engineering. Solving the differential equations associated withconservation laws is a major branch in computational mathematics. The recentsuccess of machine learning, especially deep learning in areas such as computervision and natural language processing, has attracted a lot of attention fromthe community of computational mathematics and inspired many intriguing worksin combining machine learning with traditional methods. In this paper, we arethe first to view numerical PDE solvers as an MDP and to use (deep) RL to learnnew solvers. As proof of concept, we focus on 1dimensional scalar conservationlaws. We deploy the machinery of deep reinforcement learning to train a policynetwork that can decide on how the numerical solutions should be approximatedin a sequential and spatialtemporal adaptive manner. We will show that theproblem of solving conservation laws can be naturally viewed as a sequentialdecisionmaking process, and the numerical schemes learned in such a way caneasily enforce longterm accuracy. Furthermore, the learned policy network iscarefully designed to determine a good local discrete approximation based onthe current state of the solution, which essentially makes the proposed methoda metalearning approach. In other words, the proposed method is capable oflearning how to discretize for a given situation mimicking human experts.Finally, we will provide details on how the policy network is trained, how wellit performs compared with some stateoftheart numerical solvers such as WENOschemes, and supervised learning based approach L3D and PINN, and how well itgeneralizes.
Quick Read (beta)
Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning
Abstract
Conservation laws are considered to be fundamental laws of nature. It has broad applications in many fields, including physics, chemistry, biology, geology, and engineering. Solving the differential equations associated with conservation laws is a major branch in computational mathematics. The recent success of machine learning, especially deep learning in areas such as computer vision and natural language processing, has attracted a lot of attention from the community of computational mathematics and inspired many intriguing works in combining machine learning with traditional methods. In this paper, we are the first to view numerical PDE solvers as an MDP and to use (deep) RL to learn new solvers. As proof of concept, we focus on 1dimensional scalar conservation laws. We deploy the machinery of deep reinforcement learning to train a policy network that can decide on how the numerical solutions should be approximated in a sequential and spatialtemporal adaptive manner. We will show that the problem of solving conservation laws can be naturally viewed as a sequential decisionmaking process, and the numerical schemes learned in such a way can easily enforce longterm accuracy. Furthermore, the learned policy network is carefully designed to determine a good local discrete approximation based on the current state of the solution, which essentially makes the proposed method a metalearning approach. In other words, the proposed method is capable of learning how to discretize for a given situation mimicking human experts. Finally, we will provide details on how the policy network is trained, how well it performs compared with some stateoftheart numerical solvers such as WENO schemes, and supervised learning based approach L3D and PINN, and how well it generalizes.
Wang Y F et. al.]Yufei Wang${}^{1}$ ^{2}^{2}footnotemark: 2 ^{†}^{†}footnotetext: ${}^{\u2020}$ The first two authors contributed equally., Ziju Shen${}^{2}$ ${}^{\u2020}$, Zichao Long${}^{3}$ and Bin Dong${}^{4}$${}^{\text{,}}$^{1}^{1} 1 Corresponding author.

Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning

[

${}^{1}$ Computer Science Department, Carnegie Mellon University, USA.
${}^{\mathrm{2}}$ Academy for Advanced Interdisciplinary Studies, Peking University, P.R. China.
${}^{\mathrm{3}}$ School of Mathematical Sciences, Peking University, P.R. China.
${}^{\mathrm{4}}$ Beijing International Center for Mathematical Research, and Institute for Artificial Intelligence, Peking University, P.R. China.

AMS subject classifications: 65M06, 68T05, 49N90

Key words: conservation laws, deep reinforcement learning, finite difference approximation, WENO
1 Introduction
Conservation laws are considered to be one of the fundamental laws of nature, and has broad applications in multiple fields such as physics, chemistry, biology, geology, and engineering. For example, Burgers equation, a very classic partial differential equation (PDE) in conservation laws, has important applications in fluid mechanics, nonlinear acoustics, gas dynamics, and traffic flow.
Solving the differential equations associated with conservation laws has been a major branch of computational mathematics [18, 19], and a lot of effective methods have been proposed, from classic methods such as the upwind scheme, the LaxFriedrichs scheme, to the advanced ones such as the ENO/WENO schemes [22, 30], the fluxlimiter methods [14], and etc. In the past few decades, these traditional methods have been proven successful in solving conservation laws. Nonetheless, the design of some of the highend methods heavily relies on expert knowledge and the coding of these methods can be a laborious process. To ease the usage and potentially improve these traditional algorithms, machine learning, especially deep learning, has been recently incorporated into this field. For example, the ENO scheme requires lots of ‘if/else’ logical judgments when used to solve complicated system of equations or highdimensional equations. This very much resembles the oldfashioned expert systems. The recent trend in artificial intelligence (AI) is to replace the expert systems by the socalled ‘connectionism’, e.g., deep neural networks, which leads to the recent bloom of AI. Therefore, it is natural and potentially beneficial to introduce deep learning in traditional numerical solvers of conservation laws.
1.1 Related works
In the last few years, neural networks (NNs) have been applied to solving ODEs/PDEs or the associated inverse problems. These works can be roughly classified into two categories according to the way that the NN is used.
The first type of works propose to harness the representation power of NNs, and are irrelevant to the numerical discretization based methods. For example, in the pioneering works [34, 31, 27, 26, 2], NNs are used as new ansatz to approximate solutions of PDEs. It was later generalized by [33] to allow randomness in the solution which is trained using policy gradient. More recent works along this line include [23, 25, 6]. Besides, several works have focused on using NNs to establish direct mappings between the parameters of the PDEs (e.g. the coefficient field or the ground state energy) and their associated solutions [16, 17, 21, 9]. Furthermore, [13, 4] proposed a method to solve very highdimensional PDEs by converting the PDE to a stochastic control problem and use NNs to approximate the gradient of the solution.
The second type of works, which target at using NNs to learn new numerical schemes, are closely related to our work. However, we note that these works mainly fall in the setting of supervised learning (SL). For example, [7] proposed to integrate NNs into highorder numerical solvers to predict artificial viscosity; [28] trained a multilayer perceptron to replace traditional indicators for identifying troubledcells in highresolution schemes for conservation laws. These works greatly advanced the development in machine learning based design of numerical schemes for conservation laws. Note that in [7], the authors only utilized the onestep error to train the artificial viscosity networks without taking into account the longterm accuracy of the learned numerical scheme. [28] first constructed several functions with known regularities and then used them to train a neural network to predict the location of discontinuity, which was later used to choose a proper slope limiter. Therefore, the training of the NNs is separated from the numerical scheme. Then, a natural question is whether we can learn discretization of differential equations in an endtoend fashion and the learned discrete scheme also takes longterm accuracy into account. This motivates us to employ reinforcement learning to learn good solvers for conservation laws.
1.2 Our Approach
The main objective of this paper is to design new numerical schemes in an autonomous way. We propose to view iterative numerical schemes solving the conservation laws as a Markov Decision Process (MDP) and use reinforcement learning (RL) to find new and potentially better datadriven solvers. We carefully design an RLbased method so that the learned policy can generate high accuracy numerical schemes and can well generalize in varied situations. Details will be given in section 3.
Here, we first summarize the benefits and potentials of using RL to solve conservation laws (the arguments may also well apply to other evolution PDEs):

•
Most of the numerical solvers of conservation law can be interpreted naturally as a sequential decision making process (e.g., the approximated grid values at the current time instance definitely affects all the future approximations). Thus, it can be easily formulated as a MDP and solved by RL.

•
In almost all the RL algorithms, the policy $\pi $ (which is the AI agent who decides on how the solution should be approximated locally) is optimized with regards to the values ${Q}^{\pi}({s}_{0},{a}_{0})=r({s}_{0},{a}_{0})+{\sum}_{t=1}^{\mathrm{\infty}}{\gamma}^{t}r({s}_{t},{a}_{t})$, which by definition considers the longterm accumulated reward (or, error of the learned numerical scheme), thus could naturally guarantee the longterm accuracy of the learned schemes, instead of greedily deciding the local approximation which is the case for most numerical PDEs solvers. Furthermore, it can gracefully handle the cases when the action space is discrete, which is in fact one of the major strength of RL.

•
By optimizing towards longterm accuracy and effective exploration, we believe that RL has a good potential in improving traditional numerical schemes, especially in parts where no clear design principles exist. For example, although the WENO5 scheme achieves optimal order of accuracy at smooth regions of the solution [30], the best way of choosing templates near singularities remains unknown. Our belief that RL could shed lights on such parts is later verified in the experiments: the trained RL policy demonstrated new behaviours and is able to select better templates than WENO and hence approximate the solution better than WENO near singularities.

•
Nonsmooth norms such as the infinity norm of the error is often used to evaluate the performance of the learned numerical schemes. As the norm of the error serves as the loss function for the learning algorithms, computing the gradient of the infinity norm can be problematic for supervised learning, while RL does not have such problem since it does not explicitly take gradients of the loss function (i.e. the reward function for RL).

•
Learning the policy $\pi $ within the RL framework makes the algorithm metalearninglike [29, 5, 1, 20, 10]. The learned policy $\pi $ can decide on which local numerical approximation to use by judging from the current state of the solution (e.g. local smoothness, oscillatory patterns, dissipation, etc). This is vastly different from regular (nonmeta) learning where the algorithms directly make inference on the numerical schemes without the aid of an additional network such as $\pi $. As subtle the difference as it may seem, metalearninglike methods have been proven effective in various applications such as in image restoration [15, 8, 35]. See [32] for a comprehensive survey on metalearning. In our experiment section, we will present extensive empirical studies on the generalization ability of the RLbased solver and demonstrate its advantage over some supervised learning based methods.

•
Another purpose of this paper is to raise an awareness of the connection between MDP and numerical PDE solvers, and the general idea of how to use RL to improve PDE solvers or even finding brand new ones. Furthermore, in computational mathematics, a lot of numerical algorithms are sequential, and the computation at each step is expertdesigned and usually greedy, e.g., the conjugate gradient method, the fast sweeping method [36], matching pursuit [24], etc. We hope our work could motivate more researches in combining RL and computational mathematics, and stimulate more exploration on using RL as a tool to tackle the bottleneck problems in computational mathematics.
Our paper is organized as follows. In section 2 we briefly review 1dimensional conservation laws and the WENO schemes. In section 3, we discuss how to formulate the process of numerically solving conservation laws into a Markov Decision Process. Then, we present details on how to train a policy network to mimic human expert in choosing discrete schemes in a spatialtemporary adaptive manner by learning upon WENO. In section 4, we conduct numerical experiments on 1D conservation laws to demonstrate the performance of our trained policy network. Our experimental results show that the trained policy network indeed learned to adaptively choose good discrete schemes that offer better results than the stateoftheart WENO scheme which is 5th order accurate in space and 4th order accurate in time. This serves as an evidence that the proposed RL framework has the potential to design highperformance numerical schemes for conservation laws in a datadriven fashion. Furthermore, the learned policy network generalizes well to other situations such as different initial conditions, forcing terms, mesh sizes, temporal discrete schemes, etc. To further demonstrate the advantage of using RL to design numerical solvers, we we provide additional comparisons between the WENO learned using RL and two supervised learning based methods, L3D [3] and PINN [27, 26]. The paper ends with a conclusion in section 5, where possible future research directions are also discussed.
2 Preliminaries
2.1 Notations
In this paper, we consider solving the following 1D conservation laws:
$$\begin{array}{cc}& {u}_{t}(x,t)+{\left(f(u(x,t))\right)}_{x}=0\hfill \\ & a\le x\le b,t\in [0,T],u(x,0)={u}_{0}(x)\hfill \end{array}$$  (2.1) 
For example, $f=\frac{{u}^{2}}{2}$ is the famous Burgers Equation. We discretize the $(x,t)$plane by choosing a mesh with spatial size $\mathrm{\Delta}x$ and temporal step size $\mathrm{\Delta}t$, and define the discrete mesh points $({x}_{j},{t}_{n})$ by
$$\begin{array}{cc}& {x}_{j}=a+j\mathrm{\Delta}x,{t}_{n}=n\mathrm{\Delta}t\hfill \\ & \text{with}j=0,1,\mathrm{\dots},J=\frac{ba}{\mathrm{\Delta}x},n=0,1,\mathrm{\dots},N=\frac{T}{\mathrm{\Delta}t}\hfill \end{array}$$  (2.2) 
We denote ${x}_{j+\frac{1}{2}}={x}_{j}+\mathrm{\Delta}x/2=a+(j+\frac{1}{2})\mathrm{\Delta}x.$ The finite difference methods will produce approximations ${U}_{j}^{n}$ to the solution $u({x}_{j},{t}_{n})$ on the given discrete mesh points. We denote pointwise values of the true solution to be ${u}_{j}^{n}=u({x}_{j},{t}_{n})$, and the true pointwise flux values to be ${f}_{j}^{n}=f(u({x}_{j},{t}_{n}))$.
2.2 WENO – Weighted Essentially NonOscillatory Schemes
WENO (Weighted Essentially NonOscillatory) [22] is a family of high order accurate finite difference schemes for solving hyperbolic conservation laws, and has been successful for many practical problems. The key idea of WENO is a nonlinear adaptive procedure that automatically chooses the smoothest local stencil to reconstruct the numerical flux. Generally, a finite difference method solves Eq.2.1 by using a conservative approximation to the spatial derivative of the flux:
$$\frac{d{u}_{j}(t)}{dt}=\frac{1}{\mathrm{\Delta}x}\left({\widehat{f}}_{j+\frac{1}{2}}{\widehat{f}}_{j\frac{1}{2}}\right),$$  (2.3) 
where ${u}_{j}(t)$ is the numerical approximation to the point value $u({x}_{j},t)$ and ${\widehat{f}}_{j+\frac{1}{2}}$ is the numerical flux generated by a numerical flux policy
$${\widehat{f}}_{j+\frac{1}{2}}={\pi}^{f}({u}_{jr},\mathrm{\dots},{u}_{j+s}),$$ 
which is manually designed. Note that the term “numerical flux policy" is a new terminology that we introduce in this paper, which is exactly the policy we shall learn using RL. In WENO, ${\pi}^{f}$ works as follows. Using the physical flux values $\{{f}_{j2},{f}_{j1},{f}_{j}\}$, we could obtain a ${3}^{th}$ order accurate polynomial interpolation ${\widehat{f}}_{j+\frac{1}{2}}^{2}$, where the indices $\{j2,j1,j\}$ is called a ‘stencil’. We could also use the stencil $\{j1,j,j+1\}$, $\{j,j+1,j+2\}$ or $\{j+1,j+2,j+3\}$ to obtain another three interpolants ${\widehat{f}}_{j+\frac{1}{2}}^{1}$, ${\widehat{f}}_{j+\frac{1}{2}}^{0}$ and ${\widehat{f}}_{j+\frac{1}{2}}^{1}$. The key idea of WENO is to average (with properly designed weights) all these interpolants to obtain the final reconstruction:
$${\widehat{f}}_{j+\frac{1}{2}}=\sum _{r=2}^{1}{w}_{r}{\widehat{f}}_{j+1/2}^{r},\sum _{r=2}^{1}{w}_{r}=1.$$ 
The weight ${w}_{i}$ depends on the smoothness of the stencil. A general principal is: the smoother is the stencil, the more accurate is the interpolant and hence the larger is the weight. To ensure convergence, we need the numerical scheme to be consistent and stable [18]. It is known that WENO schemes as described above are consistent. For stability, upwinding is required in constructing the flux. The most easy way is to use the sign of the Roe speed
$${\overline{a}}_{j+\frac{1}{2}}=\frac{{f}_{j+\frac{1}{2}}{f}_{j\frac{1}{2}}}{{u}_{j+\frac{1}{2}}{u}_{j\frac{1}{2}}}$$ 
to determine the upwind direction: if ${\overline{a}}_{j+\frac{1}{2}}\ge 0$, we only average among the three interpolants ${\widehat{f}}_{j+\frac{1}{2}}^{2}$, ${\widehat{f}}_{j+\frac{1}{2}}^{1}$ and ${\widehat{f}}_{j+\frac{1}{2}}^{0}$; if $$, we use ${\widehat{f}}_{j+\frac{1}{2}}^{1}$, ${\widehat{f}}_{j+\frac{1}{2}}^{0}$ and ${\widehat{f}}_{j+\frac{1}{2}}^{1}$.
Some further thoughts.
WENO achieves optimal order of accuracy (up to 5) at the smooth region of the solutions [30], while lower order of accuracy at singularities. The key of the WENO method lies in how to compute the weight vector $({w}_{1},{w}_{2},{w}_{3},{w}_{4})$, which primarily depends on the smoothness of the solution at local stencils. In WENO, such smoothness is characterized by handcrafted formula, and was proven to be successful in many practical problems when coupled with highorder temporal discretization. However, it remains unknown whether there are better ways to combine the stencils so that optimal order of accuracy in smooth regions can be reserved while, at the same time, higher accuracy can be achieved near singularities. Furthermore, estimating the upwind directions is another key component of WENO, which can get quite complicated in highdimensional situations and requires lots of logical judgments (i.e. “if/else"). Can we ease the (some time painful) coding and improve the estimation at the aid of machine learning?
3 Methods
In this section we present how to employ reinforcement learning to solve the conservation laws given by (2.1). To better illustrate our idea, we first show in general how to formulate the process of numerically solving a conservation law into an MDP. We then discuss how to incorporate a policy network with the WENO scheme. Our policy network targets at the following two key aspects of WENO: (1) Can we learn to choose better weights to combine the constructed fluxes? (2) Can we learn to automatically judge the upwind direction, without complicated logical judgments?
3.1 MDP Formulation
As shown in Algorithm 1, the procedure of numerically solving a conservation law is naturally a sequential decision making problem. The key of the procedure is the numerical flux policy ${\pi}^{f}$ and the temporal scheme ${\pi}^{t}$ as shown in line 6 and 8 in Algorithm 1. Both policies could be learned using RL. However, in this paper, we mainly focus on using RL to learn the numerical flux policy ${\pi}^{f}$, while leaving the temporal scheme ${\pi}^{t}$ with traditional numerical schemes such as the Euler scheme or the Runge–Kutta methods. A quick review of RL is given in the appendix.
Now, we show how to formulate the above procedure as an MDP and the construction of the state $S$, action $A$, reward $r$ and transition dynamics $P$. Algorithm 2 shows in general how RL is incorporated into the procedure. In Algorithm 2, we use a single RL agent. Specifically, when computing ${U}_{j}^{n}$:

•
The state for the RL agent is
$${s}_{j}^{n}={g}_{s}({U}_{jr1}^{n1},\mathrm{\dots},{U}_{j+s}^{n1}),$$ where ${g}_{s}$ is the state function. The choice of ${g}_{s}$ depends on the problem and the choice we made for WENO will be given in Section 4.

•
In general, the action of the agent is used to determine how the numerical fluxes ${\widehat{f}}_{j+\frac{1}{2}}^{n}$ and ${\widehat{f}}_{j\frac{1}{2}}^{n}$ is computed. In the next subsection, we detail how we incorporate ${a}_{j}^{n}$ to be the linear weights of the fluxes computed using different stencils in the WENO scheme.

•
The reward should encourage the agent to generate a scheme that minimizes the error between its approximated value and the true value. Therefore, we define the reward function as
$${r}_{j}^{n}={g}_{r}({U}_{jr1}^{n}{u}_{jr1}^{n},\mathrm{\cdots},{U}_{j+s}^{n}{u}_{j+s}^{n}),$$ e.g., a simplest choice is ${g}_{r}=\cdot {}_{2}$. Same as ${g}_{s}$, the choice of ${g}_{r}$ is problem dependent, and the choice we made for WENO will be given in Section 4.

•
The transition dynamics $P$ is fully deterministic, and depends on the choice of the temporal scheme at line 10 in Algorithm 2. Note that the next state can only be constructed when we have obtained all the point values in the next time step, i.e., ${s}_{j}^{n+1}={g}_{s}({U}_{jr1}^{n},\mathrm{\dots},{U}_{j+s}^{n})$ does not only depends on action ${a}_{j}^{n}$, but also on actions ${a}_{jr1}^{n},\mathrm{\dots},{a}_{j+s}^{n}$ (action ${a}_{j}^{n}$ can only determine the value ${U}_{j}^{n}$). This subtlety can be resolved by viewing the process under the framework of multiagent RL, in which at each mesh point $j$ we use a distinct agent ${A}_{j}^{RL}$, and the next state ${s}_{j}^{n+1}={g}_{s}({U}_{jr1}^{n},\mathrm{\dots},{U}_{j+s}^{n})$ depends on these agents’ joint action ${\mathbf{a}}_{\mathbf{j}}^{\mathbf{n}}=({a}_{jr1}^{n},\mathrm{\dots},{a}_{j+s}^{n})$. However, it is impractical to train $J$ different agents as $J$ is usually very large, therefore we enforce the agents at different mesh point $j$ to share the same weight, which reduces to case of using just a single agent. The single agent can be viewed as a counterpart of a human designer who decides on the choice of a local scheme based on the current state in traditional numerical methods.
3.2 Improving WENO with RL
We now present how to transfer the actions of the RL policy to the weights of WENO fluxes. Instead of directly using ${\pi}^{RL}$ to generate the numerical flux, we use it to produce the weights of numerical fluxes computed using different stencils in WENO. Since the weights are part of the configurations of the WENO scheme, our design of action essentially makes the RL policy a metalearner, and enables more stable learning and better generalization power than directly generating the fluxes.
Specifically, at point ${x}_{j}$ (here we drop the time superscript $n$ for simplicity), to compute the numerical flux ${\widehat{f}}_{j\frac{1}{2}}$ and ${\widehat{f}}_{j+\frac{1}{2}}$, we first construct four fluxes ${\{{\widehat{f}}_{j\frac{1}{2}}^{i}\}}_{i=2}^{1}$ and ${\{{\widehat{f}}_{j+\frac{1}{2}}^{i}\}}_{i=2}^{1}$ using four different stencils just as in WENO, and then use the RL policy ${\pi}^{RL}$ to generate the weights of these fluxes:
$${\pi}^{RL}({s}_{j})=({w}_{j\frac{1}{2}}^{2},\mathrm{\dots},{w}_{j\frac{1}{2}}^{1},{w}_{j+\frac{1}{2}}^{2},\mathrm{\dots},{w}_{j+\frac{1}{2}}^{1}),\text{with}\sum _{i=2}^{1}{w}_{j\pm \frac{1}{2}}^{i}=1.$$ 
The numerical flux is then constructed by averaging these fluxes:
$${\widehat{f}}_{j\frac{1}{2}}=\sum _{i=2}^{1}{w}_{j\frac{1}{2}}^{i}{\widehat{f}}_{j\frac{1}{2}}^{i}\mathit{\hspace{1em}}\text{and}\mathit{\hspace{1em}}{\widehat{f}}_{j+\frac{1}{2}}=\sum _{i=2}^{1}{w}_{j+\frac{1}{2}}^{i}{\widehat{f}}_{j+\frac{1}{2}}^{i}.$$ 
Note that the determination of upwind direction is automatically embedded in the RL policy since it generates four weights at once. For instance, when the roe speed ${\overline{a}}_{j+\frac{1}{2}}\ge 0$, we expect the ${4}^{th}$ weight ${w}_{j+\frac{1}{2}}^{1}\approx 0$ and when $$, we expect ${w}_{j+\frac{1}{2}}^{2}\approx 0$. Note that the upwind direction can be very complicated in a system of equations or in the highdimensional situations, and using the policy network to automatically embed such a process could save lots of efforts in algorithm design and implementation. Our numerical experiments show that ${\pi}^{RL}$ can indeed automatically determine upwind directions for 1D scalar cases. Although this does not mean that it works for systems and/or in highdimensions, it shows the potential of the proposed framework and value for further studies.
4 Experiments
\diagbox$\mathrm{\Delta}t$$\mathrm{\Delta}x$  0.02  0.04  0.05  
RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  
0.002  10.81 (3.96)  11.13 (3.83)  18.79 (12.44)  19.45 (9.32)  28.61 (33.37)  35.95 (27.7) 
0.003  10.83 (3.96)  11.14 (3.82)  18.82 (12.48)  19.47 (9.31)  28.62 (33.34)  35.96 (27.67) 
0.004  10.83 (3.96)  11.14 (3.84)  18.89 (12.69)  19.48 (9.33)  28.61 (33.27)  35.93 (27.63) 
0.005  10.85 (3.96)  11.15 (3.84)  18.96 (12.84)  19.52 (9.35)  28.48 (33.04)  35.93 (27.61) 
0.006  10.89 (3.93)  11.16 (3.83)  18.95 (12.79)  19.51 (9.3)  28.58 (33.08)  35.89 (27.5) 
\diagbox$\mathrm{\Delta}t$$\mathrm{\Delta}x$  0.02  0.04  0.05  
RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  
0.002  10.05 (2.74)  10.25 (2.65)  16.89 (3.59)  17.09 (3.56)  17.07 (4.2)  17.4 (4.37) 
0.003    10.26 (2.65)  16.9 (3.59)  17.1 (3.56)  17.09 (4.2)  17.42 (4.37) 
0.004      16.9 (3.6)  17.1 (3.56)  17.1 (4.2)  17.43 (4.37) 
0.005        17.12 (3.56)  17.11 (4.22)  17.43 (4.38) 
In this section, we describe the training and testing of the proposed RL conservation law solver, and compare it with WENO and some recently proposed supervise learningbased methods for solving conservation laws: L3D [3] and PINNs [27, 26].
4.1 Training and Testing Setup
In this subsection, we explain the general training and testing setup. We train the RL policy network on the Burgers equation, whose flux is computed as $f(u)=\frac{1}{2}{u}^{2}$. In all the experiments, we set the leftshift $r=2$ and the right shift $s=3$. The state function ${g}_{s}({s}_{j})={g}_{s}({U}_{jr1},\mathrm{\dots},{U}_{j+s})$ will generate two vectors:
$${s}^{l}=({f}_{jr1},\mathrm{\dots},{f}_{j+s1},{\overline{a}}_{j\frac{1}{2}})\mathit{\hspace{1em}}\text{and}\mathit{\hspace{1em}}{s}^{r}=({f}_{jr},\mathrm{\dots},{f}_{j+s},{\overline{a}}_{j+\frac{1}{2}})$$ 
for computing ${\widehat{f}}_{j\frac{1}{2}}$ and ${\widehat{f}}_{j+\frac{1}{2}}$ respectively. ${s}_{l}$ and ${s}_{r}$ will be passed into the same policy neural network ${\pi}_{\theta}^{RL}$ to produce the desired actions, as described in Section 3.2. The reward function ${g}_{r}$ simply computes the infinity norm, i.e.,
$${g}_{r}({U}_{jr1}{u}_{jr1},\mathrm{\dots},{U}_{j+s}{u}_{j+s})={({U}_{jr1}{u}_{jr1},\mathrm{\dots},{U}_{j+s}{u}_{j+s})}_{\mathrm{\infty}}.$$ 
The policy network ${\pi}_{\theta}^{RL}$ is a feedforward Multilayer Perceptron with $6$ hidden layers, each has 64 neurons and use ReLU [12] as the activation function. We use the Twin Delayed Deep Deterministic (TD3) policy gradient algorithm [11] to train the RL policy.
We consider the following three settings of the Burgers Equation:
$$u{(x,t)}_{t}+{(\frac{1}{2}u{(x,t)}^{2})}_{x}=\eta u{(x,t)}_{xx}+F(x,t),u(x,0)={u}_{0}(x),x\in D,t\in [0,T]$$  (4.1) 

•
Inviscid: inviscid Burgers Equation with different random initial conditions:
$$\begin{array}{cc}& \eta =0,F(x,t)=0\hfill \\ & {u}_{0}(x)=a+b\cdot \mathrm{sin}(c\pi x)+d\cdot \mathrm{cos}(e\pi x)\hfill \\ & a+b+d=4,a\le 1.2,b\le 3a,c\in \{4,6,8\}.\hfill \end{array}$$ (4.2) The reason for designing such kind of initial conditions is to ensure they have similar degree of smoothness and thus similar level of difficulty in learning. For training, we randomly generate $25$ such initial conditions, and we always set $c=8$ to ensure enough shocks in the solutions. For testing, we randomly generate another $25$ initial conditions in the same form with $c$ chosen in $\{4,6\}$. During training, the computation domain is $D=[1,1]$, and the terminal time $T$ is randomly sampled from $[0.25,1]$ with equal probability; during test, the terminal time is fixed at $T=0.9$. We train on two different grid sizes : $(\mathrm{\Delta}x,\mathrm{\Delta}t)\in \{(0.02,0.002),(0.04,0.004)\}$.

•
Forcing^{‡}^{‡} ‡ This setting is exactly the same as the training/testing setting used in L3D [3].: Viscous Burgers Equation with forcing, and a fixed initial value.
$$\begin{array}{cc}& \eta \in \{0.01,0.02,0.04\},{u}_{0}(x)=0\hfill \\ & F(x,t)=\sum _{i=1}^{N}{A}_{i}\mathrm{sin}({\omega}_{i}t+2\pi {l}_{i}x/L+{\psi}_{i})\hfill \\ & N=20,{A}_{i}\in [0.5,0.5],{\omega}_{i}\in [0.4,0.4],{\varphi}_{i}\in [0,2\pi ],{l}_{i}\in \{3,4,5,6\}\hfill \end{array}$$ (4.3) During training, $\eta $ is fixed at $0.01$, and we test with $\eta $ in $\{0.01,0.02,0.04\}$. 50 random forcings are generated, the first half is used for training, and the second half is used for testing. During training, the computation domain is $D=[0,2\pi ]$, and the terminal time $T$ is randomly sampled from $[0.25\pi ,\pi ]$ with equal probability. The test terminal time is fixed at $T=0.9\pi $. We train on two different grid sizes: $(\mathrm{\Delta}x,\mathrm{\Delta}t)\in \{(0.02\pi ,0.002\pi ),(0.04\pi ,0.004\pi )\}$.

•
Viscous^{§}^{§} § Note that the viscous Burgers Equation is easier to solve compared with the inviscid version, as the viscous term smooths out the solution and helps remove shocks in the solution.: Viscous Burgers Equation with different random initial conditions: $\eta \in \{0.01,0.02,0.04\}$, and $F(x,t)=0$. The viscous term is handled using central difference schemes. The initial conditions are generated in the same form as in equation 4.2. For both training and testing, we sample 25 initial conditions with $c$ in $\{4,6\}$. For training, we fix $\eta =0.01$, while for testing, we test with different values of $\eta $ in $\{0.01,0.02,0.04\}$. The computation domain, and training grid sizes are the same as the inviscid version.
When training the RL agent, we use the Euler scheme for temporal discretization. During test, we always use the 4th order RungeKutta (RK4) for temporal discretization. The true solution needed for computing the reward and for evaluation is generated using WENO on the same computation domain with a much finer grid ($\mathrm{\Delta}x=0.002$, $\mathrm{\Delta}t=0.0002$) and use the RK4 as the temporal discretization. In the following, we denote the policy network that generates the weights of the WENO fluxes (as described in Section 3.2) as RLWENO. In all the settings, we compare RLWENO with WENO. As L3D is (only) demonstrated to work in the Forcing setting, we also compare RLWENO with L3D in this setting. We also compare RLWENO with PINNs in both the inviscid and viscous setting. To demonstrate the ability to perform outofdistribution generalization of RLWENO, we train RLWENO on one of the three settings and test it on the other two settings without finetuning or retraining. For all experiments, the evaluation metric is the relative L2 error compared with the true solution computed using WENO on a finer grid as described above. See Table 3 for a summary and roadmap for all the experiments.
We also give a summary of the inference time for RLWENO, WENO, L3D, and PINNs in Table 4. RLWENO, WENO and L3D are tested in the forcing setting, averaged over 25 random forcings, and PINNs is tested in the viscous setting, averaged over 25 different random initial conditions. For PINNs, as it needs to be trained on each different initial condition, the inference time is the same as the training time.
\diagboxTrainTest 



















$(\mathrm{\Delta}x,\mathrm{\Delta}t)$  RLWENO  WENO  L3D  PINN 

$(0.02,0.002)$  2.23  4.14  0.59  2148.34 
$(0.04,0.004)$  0.87  1.09  0.51  2024.14 
$(0.05,0.005)$  0.6  0.71  0.5  2058.66 
(a) Smooth regions, $f(u)=\frac{1}{2}{u}^{2}$  (b) Near singularities, $f(u)=\frac{1}{2}{u}^{2}$ 
4.2 inviscid Burgers Equation with random initial conditions
Generalization to new mesh sizes, flux function, and temporal discretization.
We first test whether the RL policy trained in this setting can generalize to different temporal discretization schemes, mesh sizes and flux functions that are not included in training. Table 1 and Table 2 present the comparison results, where the number shows the relative error (computed as $\frac{{Uu}_{2}}{{u}_{2}}$ with the 2norm taking over all $x$) between the approximated solution $U$ and the true solution $u$, averaged over all evolving steps ($T=0.9$) and 25 random test initial values. Numbers in the bracket shows the standard deviation over the 25 initial values. Several entries in the table are marked as ‘’ because the corresponding CFL number is not small enough to guarantee convergence.
Our experimental results show that, compared with the high order accurate WENO (5th order accurate in space and 4th order accurate in time), the linear weights learned by RL not only achieves smaller errors, but also generalizes well to:

•
New time discretization scheme: trained with Euler, tested with RK4;
 •
 •
Figure 1 shows some examples of the solutions. As one can see, the solutions generated by RLWENO not only achieve the same accuracy as WENO at smooth regions, but also have clear advantage over WENO near singularities which is particularly challenging for numerical PDE solvers and important in applications.
More analysis on how RLWENO chooses the weights.
First, as shown in Figure 2, RLWENO can indeed correctly determine upwind directions and generate local numerical schemes in an adaptive fashion. More interestingly, Figure 2 further shows that comparing to WENO, RLWENO seems to be able to select stencils in a different way from it, and eventually leads to a more accurate solution.
As mentioned in Section 2.2, WENO itself already achieves an optimal order of accuracy in the smooth regions. Since RLWENO can further improve upon WENO, it should have achieved higher accuracy especially near singularities. Therefore, we provide additional demonstrations on how RLWENO performs in the smooth/singular regions. We run RLWENO and WENO on a set of initial conditions, and record the approximation errors at every spatialtemporal location. We then separate the errors in the smooth and singular regions, and compute the distribution of the errors on the entire spatialtemporal grids with multiple initial conditions. The results are shown in figure 3. In figure 3, the $x$axis is the logarithmic (base 10) value of the error and the yaxis is the number of grid points whose error is less than the corresponding value on the $x$axis, i.e., the accumulated distribution of the errors. The results illustrate that RLWENO indeed performs better than WENO near singularities.
Given that RLWENO performs better than WENO at singularities, we further investigate how it inferences the weights differently from WENO. Figure 4 shows the distribution of weights computed by both methods near singularities. Again, it can be seen that RLWENO generates the weights in a different way from WENO. Interestingly, the smaller standard deviation of RLWENO also seems to show that RLWENO is more confident on its choice of the weights. These results suggest that the proposed RL framework has the potential to surpass human experts in designing numerical schemes for conservation laws.
Comparison with PINNs
$\eta $  $\mathrm{\Delta}x$  RLWENO  WENO  PINNs 

0  0.02  10.81 (3.96)  11.13 (3.83)  40.06(18.82) 
0.04  18.89 (12.69)  19.48 (9.33)  51.61(23.68)  
0.05  28.48 (33.04)  35.93 (27.61)  51.18(18.31)  
0.01  0.02  1.64 (0.57)  1.94 (0.55)  34.09(26.92) 
0.04  4.29 (0.93)  4.93 (0.73)  48.28(29.71)  
0.05  7.6 (3.94)  7.66 (1.71)  47.95(19.86)  
0.02  0.02  0.74 (0.35)  0.87 (0.37)  31.13(34.35) 
0.04  1.91 (0.58)  2.33 (0.51)  47.41(35.84)  
0.05  2.97 (1.07)  3.64 (0.77)  43.05(20.1))  
0.04  0.02  0.42 (0.21)  0.41 (0.21)  34.98(37.83) 
0.04  0.87 (0.38)  0.96 (0.36)  52.27(44.74)  
0.05  1.38 (0.61)  1.52 (0.5)  47.0(24.47) 
PINNs [27] directly uses a deep neural network to approximate the solution $u$ to the conservation law (2.1). Therefore, it needs to be trained individually for each different initial condition, as the underlying solution is different. The implementation of PINNs is based on the released codes by the authors ^{¶}^{¶} ¶ https://github.com/maziarraissi/PINNs. Note that we assume periodic boundary condition of (2.1). Therefore, we modified the released codes of PINNs according to another related online codes ^{∥}^{∥} ∥ https://github.com/MinglangYin/PyTorchTutorial/tree/master/PINNs. The comparison result is shown in Table 5. In fact, PINNs performs poorly with relatively complicated initial conditions such as the ones we chose in Eq. (4.2). Indeed, when we simplify the initial condition, e.g., ${u}_{0}(x)=\mathrm{sin}(2\pi x)$, the performance of PINNs is significantly improved (see a comparison in Figure 5). To resolve this issue, we may choose more expressive neural networks to approximate the solutions. However, this may also increase the difficulty of training.
Outofdistribution generalization to the other two test settings.
We now test if RLWENO trained in the inviscid setting can generalize to the other two settings: viscous and forcing. Table 6 and 7 show the results. Surprisingly, RLWENO generalizes relatively well to these two test settings without any retraining or finetuning. We attribute the good generalization ability of RLWENO to our careful action design, which essentially makes RLWENO a metalearner under the WENO framework and thus have strong outofdistribution generalization.
\diagbox$\mathrm{\Delta}x$$\eta $  0.01  0.02  0.04  
RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  
0.02  4.75 (0.73)  4.74 (0.7)  2.38 (0.46)  2.59 (0.45)  1.07 (0.2)  1.02 (0.27) 
0.04  10.78 (1.4)  10.3 (1.35)  6.95 (1.15)  6.6 (0.95)  3.84 (0.7)  3.68 (0.62) 
0.05  13.97 (1.93)  13.55 (2.07)  9.76 (1.44)  9.33 (1.4)  5.67 (0.89)  5.42 (0.81) 
\diagbox$\mathrm{\Delta}x$$\eta $  0.01  0.02  0.04  
RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  
0.02  1.85 (0.58)  1.94 (0.55)  0.85 (0.38)  0.87 (0.37)  0.45 (0.22)  0.41 (0.21) 
0.04  4.91 (1.64)  4.93 (0.73)  2.34 (0.82)  2.33 (0.51)  1.14 (0.58)  0.96 (0.36) 
0.05  8.31 (5.49)  7.66 (1.71)  3.59 (1.05)  3.64 (0.77)  1.84 (0.88)  1.52 (0.5) 
4.3 Viscous Burgers Equation with forcing and fixed initial value
$\eta $  $\mathrm{\Delta}x$  RLWENO  WENO  L3D 

0.01  0.02  4.64 (0.7)  4.74 (0.7)  3.09 (0.9) 
0.04  10.36 (1.45)  10.3 (1.35)  nan (nan)  
0.05  13.33 (2.05)  13.55 (2.07)  nan (nan)  
0.02  0.02  2.41 (0.43)  2.59 (0.45)  6.89 (0.74) 
0.04  6.53 (1.13)  6.6 (0.95)  4.08 (1.12)  
0.05  9.19 (1.54)  9.33 (1.4)  7.58 (1.11)  
0.04  0.02  1.09 (0.21)  1.02 (0.27)  11.01 (1.3) 
0.04  3.42 (0.65)  3.68 (0.62)  11.22 (1.22)  
0.05  4.97 (0.91)  5.42 (0.81)  9.37 (1.21) 
Comparision between WENO and L3D.
In this section, we train RLWENO in the forcing setting, and compare it with WENO and a recently proposed supervise learningbased method: L3D [3]. Note that this setting we use (Eq. (4.3)) is exactly the same as the training/testing setting used in L3D.
For L3D, we use the code released by the authors^{**}^{**} ** https://github.com/google/datadrivendiscretization1d, and train a model using their provided dataset. After training, we test the trained model on different grid sizes and viscosity.
The test results are shown in Table 8. We observe that in three cases, L3D reports lower error than WENO and RLWENO, and in all other test cases, L3D reports higher error than RLWENO and WENO. Notably, the trained model by L3D diverges in two of the test cases, where the solution blows up during the evolution of the equation. This shows that L3D has worse generalization compared with RLWENO.
Outofdistribution generalization to the other two settings.
\diagbox$\mathrm{\Delta}x$$\eta $  0  0.01  0.02  0.04  
RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  
0.02  13.06 (3.22)  11.13 (3.83)  1.9 (0.58)  1.94 (0.55)  0.92 (0.37)  0.87 (0.37)  0.49 (0.22)  0.41 (0.21) 
0.04  18.7 (7.65)  19.48 (9.33)  4.76 (1.19)  4.93 (0.73)  2.3 (0.78)  2.33 (0.51)  1.13 (0.62)  0.96 (0.36) 
0.05  28.57 (22.3)  35.88 (27.48)  7.75 (2.76)  7.66 (1.71)  3.51 (1.24)  3.64 (0.77)  1.72 (0.95)  1.52 (0.5) 
The generalization results of RLWENO to the inviscid and viscous settings are shown in Table 9. We again see that RLWENO generalizes relatively well. We tried to train L3D in the inviscid/viscous settings, but could not get any decent models. Besides, the L3D model trained under the forcing setting cannot generalize to other settings (the model blows up during the evolution of the equation). In contrast, RLWENO has much stronger flexibility and generalization: decent policies can be obtained when trained under various settings, and the model trained under one setting can generalize relatively well to other settings. Table 3 gives a thorough discussion of the generalization ability of L3D and RLWENO.
4.4 Viscous Burgers Equation with random initial conditions
\diagbox$\mathrm{\Delta}x$$\eta $  0  0.01  0.02  0.04  
RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  
0.02  11.17 (3.8)  11.13 (3.83)  1.64 (0.57)  1.94 (0.55)  0.74 (0.35)  0.87 (0.37)  0.42 (0.21)  0.41 (0.21) 
0.04  19.37 (11.67)  19.48 (9.33)  4.29 (0.93)  4.93 (0.73)  1.91 (0.58)  2.33 (0.51)  0.87 (0.38)  0.96 (0.36) 
0.05  41.44 (45.05)  35.88 (27.48)  7.6 (3.94)  7.66 (1.71)  2.97 (1.07)  3.64 (0.77)  1.38 (0.61)  1.52 (0.5) 
Comparison between WENO and PINNs.
The test results of RLWENO compared with WENO are shown in the last 3 columns of Table 10. We again see that in most cases, RLWENO outperforms WENO.
The comparison results of RLWENO versus PINNs are shown in Table 5. Again, we see that PINNs performs poorly, due to the complexity of our chosen initial values.
Outofdistribution generalization to the other two test settings.
\diagbox$\mathrm{\Delta}x$$\eta $  0.01  0.02  0.04  
RLWENO  WENO  RLWENO  WENO  RLWENO  WENO  
0.01  4.66 (0.72)  4.74 (0.7)  2.33 (0.47)  2.59 (0.45)  1.01 (0.19)  1.02 (0.27) 
0.02  10.28 (1.33)  10.3 (1.35)  6.43 (0.88)  6.6 (0.95)  3.43 (0.61)  3.68 (0.62) 
0.04  13.38 (2.3)  13.55 (2.07)  9.08 (1.57)  9.33 (1.4)  5.19 (0.92)  5.42 (0.81) 
5 Conclusion and Future Work
In this paper, we proposed a general framework to learn how to solve 1dimensional conservation laws via deep reinforcement learning. We first discussed how the procedure of numerically solving conservation laws can be naturally cast in the form of Markov Decision Process. We then elaborated how to relate notions in numerical schemes of PDEs with those of reinforcement learning. In particular, we introduced a numerical flux policy which was able to decide on how numerical flux should be designed locally based on the current state of the solution. We carefully design the action of our RL policy to make it a metalearner. Our numerical experiments showed that the proposed RL based solver was able to outperform high order WENO, which is one of the most delicate and successful numerical solvers for conservation laws. Furthermore, RL based solver has notably better generalization than the supervised learning based approach L3D and PINNs, especially for outof distribution generalization, i.e. when the test settings are vastly different from the training settings.
As part of the future works, we would like to consider using the numerical flux policy to inference more complicated numerical fluxes with guaranteed consistency and stability. Furthermore, we can use the proposed framework to learn a policy that can generate adaptive grids and the associated numerical schemes. Lastly, we would like consider system of conservation laws in 2nd and 3rd dimensional space.
Acknowledgments
Bin Dong is supported in part by National Natural Science Foundation of China (NSFC) grant No. 11831002, Beijing Natural Science Foundation (No. 180001) and Beijing Academy of Artificial Intelligence (BAAI).
References
 [1] (2016) Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pp. 3981–3989. Cited by: 5th item.
 [2] (2018) Datadriven discretization: a method for systematic coarse graining of partial differential equations. arXiv preprint arXiv:1808.04930. Cited by: §1.1.
 [3] (2019) Learning datadriven discretizations for partial differential equations. Proceedings of the National Academy of Sciences 116 (31), pp. 15344–15349. Cited by: §1.2, §4.3, §4, footnote ‡.
 [4] (2017) Machine learning approximation algorithms for highdimensional fully nonlinear partial differential equations and secondorder backward stochastic differential equations. Journal of Nonlinear Science, pp. 1–57. Cited by: §1.1.
 [5] (1992) On the optimization of a synaptic learning rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, pp. 6–8. Cited by: 5th item.
 [6] (2019) DeepMoD: deep learning for model discovery in noisy data. arXiv preprint arXiv:1904.09406. Cited by: §1.1.
 [7] (2020) Controlling oscillations in highorder discontinuous galerkin schemes using artificial viscosity tuned by neural networks. Journal of Computational Physics, pp. 109304. Cited by: §1.1.
 [8] (2018) Decouple learning for parameterized image operators. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 442–458. Cited by: 5th item.
 [9] (2018) A multiscale neural network based on hierarchical matrices. arXiv preprint arXiv:1807.01883. Cited by: §1.1.
 [10] (2017) Modelagnostic metalearning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine LearningVolume 70, pp. 1126–1135. Cited by: 5th item.
 [11] (2018) Addressing function approximation error in actorcritic methods. arXiv preprint arXiv:1802.09477. Cited by: §4.1.
 [12] (2016) Deep learning. MIT press. Cited by: §4.1.
 [13] (2018) Solving highdimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115 (34), pp. 8505–8510. Cited by: §1.1.
 [14] (2010) A new tvd fluxlimiter method for solving nonlinear hyperbolic equations. Journal of Computational and Applied Mathematics 234 (5), pp. 1395–1403. Cited by: §1.
 [15] (2017) Noiseblind image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3510–3518. Cited by: 5th item.
 [16] (2017) Solving parametric pde problems with artificial neural networks. arXiv preprint arXiv:1707.03351. Cited by: §1.1.
 [17] (2018) SwitchNet: a neural network model for forward and inverse scattering problems. arXiv preprint arXiv:1810.09675. Cited by: §1.1.
 [18] (1992) Numerical methods for conservation laws. Vol. 132, Springer. Cited by: §1, §2.2.
 [19] (2002) Finite volume methods for hyperbolic problems. Vol. 31, Cambridge university press. Cited by: §1.
 [20] (2016) Learning to optimize. arXiv preprint arXiv:1606.01885. Cited by: 5th item.
 [21] (2019) Variational training of neural network approximations of solution maps for physical models. arXiv preprint arXiv:1905.02789. Cited by: §1.1.
 [22] (1994) Weighted essentially nonoscillatory schemes. Journal of computational physics 115 (1), pp. 200–212. Cited by: §1, §2.2.
 [23] (2019) Constraintaware neural networks for riemann problems. arXiv preprint arXiv:1904.12794. Cited by: §1.1.
 [24] (1993) Matching pursuits with timefrequency dictionaries. IEEE Transactions on signal processing 41 (12), pp. 3397–3415. Cited by: 6th item.
 [25] (2019) Solving irregular and dataenriched differential equations using deep neural networks. arXiv preprint arXiv:1905.04351. Cited by: §1.1.
 [26] (2019) Physicsinformed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.1, §1.2, §4.
 [27] (2017) Physics informed deep learning (part i): datadriven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561. Cited by: §1.1, §1.2, §4.2, §4.
 [28] (2018) An artificial neural network as a troubledcell indicator. Journal of Computational Physics 367, pp. 166–191. Cited by: §1.1.
 [29] (1987) Evolutionary principles in selfreferential learning. On learning how to learn: The metameta… hook.) Diploma thesis, Institut f. Informatik, Tech. Univ. Munich. Cited by: 5th item.
 [30] (1998) Essentially nonoscillatory and weighted essentially nonoscillatory schemes for hyperbolic conservation laws. In Advanced numerical approximation of nonlinear hyperbolic equations, pp. 325–432. Cited by: 3rd item, §1, §2.2.
 [31] (2018) DGM: a deep learning algorithm for solving partial differential equations. Journal of computational physics 375, pp. 1339–1364. Cited by: §1.1.
 [32] (2018) Metalearning: a survey. arXiv preprint arXiv:1810.03548. Cited by: 5th item.
 [33] (2019) General solutions for nonlinear differential equations: a rulebased selflearning approach using deep reinforcement learning. Computational Mechanics, pp. 1–14. Cited by: §1.1.
 [34] (2018) The deep ritz method: a deep learningbased numerical algorithm for solving variational problems. Communications in Mathematics and Statistics 6 (1), pp. 1–12. Cited by: §1.1.
 [35] (2019) Dynamically unfolding recurrent restorer: a moving endpoint control method for image restoration. ICLR. Cited by: 5th item.
 [36] (2005) A fast sweeping method for eikonal equations. Mathematics of computation 74 (250), pp. 603–627. Cited by: 6th item.