Implicit Neural Solver for Time-dependent Linear PDEs with Convergence Guarantee

  • 2019-10-08 15:20:01
  • Suprosanna Shit, Abinav Ravi, Ivan Ezhov, Jana Lipkova, Marie Piraud, Bjoern Menze
  • 0

Abstract

Fast and accurate solution of time-dependent partial differential equations(PDEs) is of key interest in many research fields including physics,engineering, and biology. Generally, implicit schemes are preferred over theexplicit ones for better stability and correctness. The existing implicitschemes are usually iterative and employ a general-purpose solver which may besub-optimal for a specific class of PDEs. In this paper, we propose a neuralsolver to learn an optimal iterative scheme for a class of PDEs, in adata-driven fashion. We achieve this goal by modifying an iteration of anexisting semi-implicit solver using a deep neural network. Further, we providetheoretical proof that our approach preserves the correctness and convergenceguarantees provided by the existing iterative-solvers. We also demonstrate thatour model generalizes to a different parameter setting than the one seen duringtraining and achieves faster convergence compared to the semi-implicit schemes.

 

Quick Read (beta)

Implicit Neural Solver for Time-dependent Linear PDEs with Convergence Guarantee

Suprosanna Shit
Technical University Munich
[email protected]
&
Abinav Ravi
Technical University Munich
[email protected]
&
Ivan Ezhov
Technical University Munich
[email protected]
&
Jana Lipkova
Technical University Munich
[email protected]
&
Marie Piraud
Konica Minolta Laboratory Europe
[email protected]
&
Bjoern Menze
Technical University Munich
[email protected]

Abstract

Fast and accurate solution of time-dependent partial differential equations (PDEs) is of key interest in many research fields including physics, engineering, and biology. Generally, implicit schemes are preferred over the explicit ones for better stability and correctness. The existing implicit schemes are usually iterative and employ a general-purpose solver which may be sub-optimal for a specific class of PDEs. In this paper, we propose a neural solver to learn an optimal iterative scheme for a class of PDEs, in a data-driven fashion. We achieve this goal by modifying an iteration of an existing semi-implicit solver using a deep neural network. Further, we provide theoretical proof that our approach preserves the correctness and convergence guarantees provided by the existing iterative-solvers. We also demonstrate that our model generalizes to a different parameter setting than the one seen during training and achieves faster convergence compared to the semi-implicit schemes.

 

Implicit Neural Solver for Time-dependent Linear PDEs with Convergence Guarantee


  Suprosanna Shit Technical University Munich [email protected] Abinav Ravi Technical University Munich [email protected] Ivan Ezhov Technical University Munich [email protected] Jana Lipkova Technical University Munich [email protected] Marie Piraud Konica Minolta Laboratory Europe [email protected] Bjoern Menze Technical University Munich [email protected]

\@float

noticebox[b]33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\[email protected]

1 Introduction

Time-dependent partial differential equations (PDEs) are an essential mathematical tool to describe numerous physical processes such as heat transfer, wave propagation, quantum transport, and tumor growth. Solving the initial-value problem (IVP) and boundary-value problem (BVP) accurately and efficiently is of primary research interest in computational science. Numerical solution of time-dependant PDEs requires appropriate spatio-temporal discretization. Spatial discretization can be cast as either finite difference method (FDM) or finite element method (FEM) or finite volume method (FVM), whereas temporal discretization relies on either explicit, implicit or semi-implicit methods. Explicit temporal update rules are generally a single or few forward computation steps, while implicit or semi-implicit update rules, such as Crank-Nicolson’s scheme, resort to a fixed-point iterative scheme. Small time and spatial resolution facilitate a more accurate solution, however, increases computational burden at the same time. Moreover, maximally allowed spatio-temporal resolution is not only constrained by the desired accuracy but also limited to numerical stability criteria. Note that implicit and semi-implicit methods offer a relaxed stability constraints (sometimes unconditionally stable) at the expense of an increased computational cost caused by the iterative schemes.

In recent times, deep neural networks [10] have gained significant attention by numerical computation community due to its superior performance in solving forward simulations [9] and inverse-problems [3]. Recent work by Tompson et al. [11] shows a data-driven convolutional neural network (CNN) can accelerate fluid simulation compared to traditional numerical schemes. Long et al. [8] shows that learned differential operator can outperform hand-designed discrete schemes. However, on the contrary to the well understood and theoretically grounded classical methods, the deep learning-based approaches rely mainly on empirical validity. Recently, Hsieh et al. [5] develop a promising way to learn numerical solver while providing a theoretical convergence guarantee. They demonstrate that a feed-forward CNN, which is trained to mimic a single iteration of a linear solver, can deliver faster solution than the handcrafted solver. Astonishingly for time-dependent PDEs, the temporal update step of the previously proposed neural schemes relies on an explicit forward Euler method and none of them is capable of making use of the advanced implicit and semi-implicit methods. This limitation restricts the general use of neural architectures in solving time-dependent PDEs.

In this paper, we introduce a novel neural solver for time-dependant linear PDEs. Motivated by [5] we construct a neural iterator from a semi-implicit update rule. We replace a single iteration of the semi-implicit scheme with a learnable parameterized function such that the fixed point of the algorithm is preserved. To leverage the theoretical guarantees we perform data-driven learning to enhance the convergence speed. As a result, our approach provides: (i) theoretical guarantees of convergence to the correct stationary solution, (ii) faster convergence than existing solvers, and (iii) generalizes to a resolutions and parameter settings very different from the ones seen at training time.

2 Methodology

In this section, we first introduce the necessary background on the semi-implicit method for time-dependant linear PDEs and subsequently describe our proposed neural solver.

2.1 Background

In the following, we consider the general IVP form of time-dependant linear PDE for the variable of interest u within the computation domain Ω, w.r.t. the spatial variable x and temporal variable t, subject to Dirichlet boundary condition b(x,t) at the boundary Γ

ut=(u,x,t;Θ),xΩ, s.t. u(x,t)=b(x,t),xΓ and ut0=u0; (1)

(u,x,t;Θ) is a linear operator and can be discretized as i=1NΘiiδxpiu, with uniform spatial discretization step δx and PDE parameter set Θ={Θi}i=1:N, where Θi is a diagonal matrix comprising the coefficients of the differential operator i of order pi. Let’s call u(x,t) as ut for simplicity. A first order semi-implicit update rule to get ut+δt from ut (with time step δt) is given by

ut+δt-utδt=ϵ(u,x,t+δt;Θ)+(1-ϵ)(u,x,t;Θ);[0<ϵ1] (2)

To obtain ut+δt one needs to solve the following linear system of equations

(I-δtϵi=1NΘidiδxpi)ut+δt=δtϵi=1NΘi(i-diI)δxpiut+δt+c(ut,Θ,δx,δt,ϵ;) (3)

where c is independent of ut+δt and di is the central element of the central difference discretization of operator i. Note that for central difference scheme, i-diI is real, zero-diagonal, and either circulant or skew-circulant matrix. One can use an iterative scheme to compute ut+δt from an arbitrary initialization ut+δt0 on the right-hand-side of Eq. 3. For simplicity of notation, we refer to (I-δtϵj=1NΘjdjδxpj)-1δtϵΘiδxpi as Λi, and, we drop the subscript of u and use a superscript to denote a single iteration at a time t+δt. We enforce Dirichlet boundary condition using a projection step with a binary boundary mask G.

um+1=G(i=1NΛi(i-diI)um+c)+(I-G)b (4)

Eq. 4 can be seen as a linear operator um+1=Ψ(um)=Tum+k. We can guarantee the spectral radius of the linear transformer T, i.e. ρ(T)<1, by appropriately selecting δx,δt, and ϵ [see Appendix A], leading to a fixed-point algorithm.

Figure 1: Qualitative comparison of u (c.f. Eq 7) from the neural scheme (10 iterations) and a semi-implicit scheme (25 iterations) against the FEniCS solution for a test sequence of 10 time points. All methods use same initial- and boundary condition. The neural update shows consistently faster convergence than semi-implicit one.

2.2 Neural Solver

We propose the following iterator

ΦH(u)=Ψ(u)+G(i=1NΛiHiw) (5)

where w=Ψ(u)-u, and Hi is a learned linear operator which satisfies Hi0=0 for i=1:N. Substituting w in Eq. 5 we get ΦH(u)=Tu+k, where k denotes the additive part which is independent of u, and T=T+Gi=1NΛiHi(T-I).

Lemma 1.

For a linear PDE problem ({Θi,i}i=1:N,G,u0,b,δx,δt,ϵ) and choice of {Hi}i=1:N if u* is a fixed point of Ψ, it is a fixed point of ΦH in Eq. 5.

Proof.

Based on the iterative rule in Eq.5 , if u* satisfies Ψ(u*)=u* then w=Ψ(u*)-u*=0 . Therefore, ΦH(u*)=Ψ(u*)+Gi=1NΛiHi0=u*.

Moreover, the space of ΦH subsumes the standard solver Ψ. If Hi=0, then ΦH=Ψ. Furthermore, if Hi=i , then since GT=T

ΦH(u)=Ψ(u)+GT(Ψ(u)-u)=TΨ(u)+c=Ψ2(u) (6)

which is equal to two iterations of Ψ. Because computing Φ requires two convolutions: Hi and T one iteration with ΦH has the same number of convolution operations as two iterations of Ψ. This shows that we can learn an Hi such that our iterator ΦH is at least as the standard solver Ψ. In the following theorem, we show that there is a convex open set of Hi that the learning algorithm can explore.

Theorem 1.

For fixed G,{Θi}i=1:N,u0,b,δx,δt, and ϵ the spectral norm of ΦH(u;G,{Θi}i=1:N,u0,b,δx,δt,ϵ) is a convex function of {Hi}i=1:N and the set of {Hi}i=1:N such that the spectral norm of ΦH(u)<1 is a convex open set.

Proof.

See Appendix B. ∎

On a stark contrast with previous work [5], we have several sets of parameters {Θi}i=1:N,δx,δt, and ϵ attached to the PDEs governing equation. Although we train on a single domain, the model posses a generalization properties, which we show in the following.

Proposition 1.

For fixed {i}i=1:N,G, and {Hi}i=1:N, and some u0,b,{Θi}i=1:N,δx,δt, and ϵ, if ΦH(u) is valid iterator for the PDE problem ({Θi,i}i=1:N,G,u0,b,δx,δt,ϵ), then for all u0 and b, the iterator ΦH(u) is valid iterator for the PDE problem ({Θi,i}i=1:N,G,u0,b,δx,δt,ϵ) if Λi<1j=1Nj-djI,i=1:N.

Proof.

See Appendix B. ∎

3 Experiments

We consider a 2-D advection-diffusion equation of the following form

ut=𝐯[xuyu]+𝐃[xxuyyu]; subject to ut0=u0(x,y) (7)

where 𝐯=[vx,vy] and 𝐃=[Dxx,Dyy] are advection velocity and diffusivity respectively. We minimize the following loss function

=1Ln=1Nl=1LΦHn,k(utl)-ut+nδtl;k𝒰[1,20]

where n is the number of time-step and k iteration for a single time step is denoted as ΦH(ΦH(ΦH))=ΦHk.

Data Generation: We consider a rectangular domain of Ω=[0,2π]×[0,2π]. Elements of 𝐯 and 𝐃 are drawn from a uniform distribution of 𝒰[-2.0,2.0] and 𝒰[0.2,0.8] respectively. The computational domain is discretized into 64 x 64 regular mesh. We assume zero Dirichlet boundary condition and the initial value is generated according to [8] as u0=λcos(kx+ly)+γsin(kx+ly) where γ and λ are drawn from a normal distribution of 𝒩(0,0.02), and, k and l are values drawn in random from a uniform distribution of 𝒰[1,9]. We generate 200 simulations each having 50 time steps, using FEniCS [7][1] for δt=0.2. Further, we take the train, test, and validation split of the simulated time series as 80%:10%:10%. A time series of a test data is shown in Fig 1.

Implementation Details: Following [5], we use a three-layer convolutional neural network to model each of the Hi. We use zero-padding to enforce zero Dirichlet condition at the boundary and use a kernel size of 3x3. During training, we fixed the following parameters as follows δx=0.098,δt=0.2,ϵ=0.9. We use PyTorch framework and train our network with Adam Optimizer [6].

3.1 Results and Discussion

Figure 2: (a), (b), and (c) shows the mean-squared error (between FEniCS solution and Semi-implicit scheme and neural scheme) vs time plot for different parameters during test time as specified. The banded curves indicate the 25% and 75% percentile of the normalized errors among 20 test samples.

Generalization for Different Parameters: We investigate the effect of different parameter settings than those used during, to validate the generalizability of the neural scheme. To study the effect of different ϵ we use the original test set. We generate two additional test cases; varying one parameter at a time : a) δt=0.12, and b) δx=0.049. The elements of 𝐯 and 𝐃 are drawn from the same distribution as before. The average error propagation over ten time step between semi-implicit finite difference method and the proposed neural implicit solver is being compared in Figure 2.

We observe that error from the neural scheme (10 iterations per time step) is less compared to the error from the semi-implicit FDM (25 iterations per time step) scheme for all three different test sets. This affirms our hypothesis that the neural solver is more accurate compared to the semi-implicit FDM and generalizable to other parameter settings at the same time.

Run-time Comparison: We compare the run time for the neural solver (10 iterations per time step) and semi-implicit scheme (25 iterations per time step), for one time-steps. The experiments are conducted on an Intel Xeon W-2123 CPU @ 3.60GHz, with code running on one of the four cores. We report that the trained neural solver takes circa 0.0148s compared to 0.0141s for the semi-implicit scheme, whereas the FEniCS solution takes 3.19s for machine precision convergence.

4 Conclusion

This abstract introduces a novel implicit neural scheme to solve linear time-dependent PDEs. We leverage an existing semi-implicit update rule to design a learnable iterator which provides theoretical guarantees. The learned iterator achieves faster convergence compared to the existing semi-implicit solver and produces a more accurate solution for a fixed computation budget. More importantly, we empirically demonstrate that training on a single parameter setting is enough to generalize over other parameter settings which confirms our theoretical results. The learned neural solver can be a faster alternative for simulation of various physical processes.

Acknowledgments

Suprosanna Shit and Ivan Ezhov are supported by the Translational Brain Imaging Training Network (TRABIT) under the European Union’s ‘Horizon 2020’ research & innovation programme (Grant agreement ID: 765148).

References

  • [1] M. Alnæs et al. (2015) The FEniCS project version 1.5. Archive of Numerical Software 3 (100). Cited by: §3.
  • [2] J. Cai et al. (2012) Image restoration: total variation, wavelet frames, and beyond. Journal of the American Mathematical Society 25 (4), pp. 1033–1089. Cited by: Appendix C.
  • [3] I. Ezhov et al. (2019) Neural parameters estimation for brain tumor growth modeling. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, Cited by: §1.
  • [4] R. Horn and C. Johnson (1991) Topics in matrix analysis. Cambridge University Press. Cited by: Appendix A, Appendix A.
  • [5] J. Hsieh et al. (2019) Learning neural PDE solvers with convergence guarantees. In Proceedings of the International Conference on Learning Representations, Cited by: Appendix B, §1, §1, §2.2, §3.
  • [6] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, Cited by: §3.
  • [7] A. Logg et al. (2012) Automated solution of differential equations by the finite element method: the FEniCS book. Vol. 84, Springer Science & Business Media. Cited by: §3.
  • [8] Z. Long et al. (2018) PDE-net: learning PDEs from data. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 3208–3216. Cited by: §1, §3.
  • [9] M. Magill et al. (2018) Neural networks trained to solve differential equations learn general representations. In Advances in Neural Information Processing Systems, pp. 4071–4081. Cited by: §1.
  • [10] M. Raissi et al. (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686 – 707. Cited by: §1.
  • [11] J. Tompson et al. (2017) Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 3424–3433. Cited by: §1.

Appendix A Convergence Criteria for Semi-implicit Update

The spectral radius of T can be expresses as following

ρ(T) T =Gi=1NΛi(i-diI)
Gi=1NΛi(i-diI);[ Invoking norm inequalities[4]]
=i=1NΛi(i-diI);[G=1]

Given Λi<1j=1N(j-djI);i=1:N, we have ρ(T)<1.

Appendix B Proofs

See 1

Proof.

The spectral norm is convex from the sub-additive property, and T is linear in {Hi}i=1:N. To prove that it is open, observe that is a continuous function, so (T+Gi=1NΛiHi(T-I)) is continuous in {Hi}i=1:N. Given ρ(T), the set of {Hi}i=1:N is the preimage under this continuous function of (0,1-ζ) for some ζ>0, and the inverse image of open set (0,1-ζ) must be open. ∎

Lemma 2.

The upper bound of the spectral norm of ΦH is independent of {Θi}i=1:N,δx,δt, and ϵ Given Λi<1j=1Nj-djI,i=1:N.

Proof.

Considering the spectral norm of T and invoking product and triangular inequality of norm, we obtain the following tight bound

T =Gi=1NΛi(i-diI-Hi)+Gi=1N(ΛiHi)T
Gi=1NΛii-diI-Hi+Gi=1NΛiHiT
<i=1NΛi(i-diI-Hi+Hi)[G=1,T<1]

Given Λi<1j=1Nj-djI,i=1:N, we have

T<1j=1Nj-djIi=1N(i-diI-Hi+Hi)

See 1

Proof.

From Theorem 1 of [5] and Lemma 1, our iterator is valid if and only if ρ(T)<1. From Lemma 2 the upper bound of spectral norm of iterator only depends on {i}i=1:N and {Hi}i=1:N given Λi<1j=1Nj-djI,i=1:N. Nonetheless, for any matrix spectral radius is upper bounded by its spectral norm. Thus, if the iterator is valid for some u0,b,{Θi}i=1:N,δx,δt, and ϵ, then it is valid for any feasible choice of u0,b,{Θi}i=1:N,δx,δt, and ϵ satisfying the constraints. ∎

Appendix C Geometric Interpretation

Surprisingly, we find that the form of T has a special structure. As the denominator is constant the objective is to minimize i-diI-Hi+Hi w.r.t. Hi;i=1:N. Invoking triangular inequality of norm we have the lower-bound

i-diI-Hi+Hii-diI

Taking square of both side, we have

Find Hi such that (i-diI)HiHi

When the equality holds the local optima is the surface of the hyper-sphere with center at 12(i-diI) with radius 12i-diI. We interpret that each Hi explores an optimal discretization scheme near the manifold of i by learning the sum of order rules as described in [2].