### Abstract

Fast and accurate solution of time-dependent partial differential equations(PDEs) is of key interest in many research fields including physics,engineering, and biology. Generally, implicit schemes are preferred over theexplicit ones for better stability and correctness. The existing implicitschemes are usually iterative and employ a general-purpose solver which may besub-optimal for a specific class of PDEs. In this paper, we propose a neuralsolver to learn an optimal iterative scheme for a class of PDEs, in adata-driven fashion. We achieve this goal by modifying an iteration of anexisting semi-implicit solver using a deep neural network. Further, we providetheoretical proof that our approach preserves the correctness and convergenceguarantees provided by the existing iterative-solvers. We also demonstrate thatour model generalizes to a different parameter setting than the one seen duringtraining and achieves faster convergence compared to the semi-implicit schemes.

### Quick Read (beta)

# Implicit Neural Solver for Time-dependent Linear PDEs with Convergence Guarantee

###### Abstract

Fast and accurate solution of time-dependent partial differential equations (PDEs) is of key interest in many research fields including physics, engineering, and biology. Generally, implicit schemes are preferred over the explicit ones for better stability and correctness. The existing implicit schemes are usually iterative and employ a general-purpose solver which may be sub-optimal for a specific class of PDEs. In this paper, we propose a neural solver to learn an optimal iterative scheme for a class of PDEs, in a data-driven fashion. We achieve this goal by modifying an iteration of an existing semi-implicit solver using a deep neural network. Further, we provide theoretical proof that our approach preserves the correctness and convergence guarantees provided by the existing iterative-solvers. We also demonstrate that our model generalizes to a different parameter setting than the one seen during training and achieves faster convergence compared to the semi-implicit schemes.

Implicit Neural Solver for Time-dependent Linear PDEs with Convergence Guarantee

Suprosanna Shit Technical University Munich [email protected] Abinav Ravi Technical University Munich [email protected] Ivan Ezhov Technical University Munich [email protected] Jana Lipkova Technical University Munich [email protected] Marie Piraud Konica Minolta Laboratory Europe [email protected] Bjoern Menze Technical University Munich [email protected]

noticebox[b]33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\[email protected]

## 1 Introduction

Time-dependent partial differential equations (PDEs) are an essential mathematical tool to describe numerous physical processes such as heat transfer, wave propagation, quantum transport, and tumor growth. Solving the initial-value problem (IVP) and boundary-value problem (BVP) accurately and efficiently is of primary research interest in computational science. Numerical solution of time-dependant PDEs requires appropriate spatio-temporal discretization. Spatial discretization can be cast as either finite difference method (FDM) or finite element method (FEM) or finite volume method (FVM), whereas temporal discretization relies on either explicit, implicit or semi-implicit methods. Explicit temporal update rules are generally a single or few forward computation steps, while implicit or semi-implicit update rules, such as Crank-Nicolson’s scheme, resort to a fixed-point iterative scheme. Small time and spatial resolution facilitate a more accurate solution, however, increases computational burden at the same time. Moreover, maximally allowed spatio-temporal resolution is not only constrained by the desired accuracy but also limited to numerical stability criteria. Note that implicit and semi-implicit methods offer a relaxed stability constraints (sometimes unconditionally stable) at the expense of an increased computational cost caused by the iterative schemes.

In recent times, deep neural networks [10] have gained significant attention by numerical computation community due to its superior performance in solving forward simulations [9] and inverse-problems [3]. Recent work by Tompson et al. [11] shows a data-driven convolutional neural network (CNN) can accelerate fluid simulation compared to traditional numerical schemes. Long et al. [8] shows that learned differential operator can outperform hand-designed discrete schemes. However, on the contrary to the well understood and theoretically grounded classical methods, the deep learning-based approaches rely mainly on empirical validity. Recently, Hsieh et al. [5] develop a promising way to learn numerical solver while providing a theoretical convergence guarantee. They demonstrate that a feed-forward CNN, which is trained to mimic a single iteration of a linear solver, can deliver faster solution than the handcrafted solver. Astonishingly for time-dependent PDEs, the temporal update step of the previously proposed neural schemes relies on an explicit forward Euler method and none of them is capable of making use of the advanced implicit and semi-implicit methods. This limitation restricts the general use of neural architectures in solving time-dependent PDEs.

In this paper, we introduce a novel neural solver for time-dependant linear PDEs. Motivated by [5] we construct a neural iterator from a semi-implicit update rule. We replace a single iteration of the semi-implicit scheme with a learnable parameterized function such that the fixed point of the algorithm is preserved. To leverage the theoretical guarantees we perform data-driven learning to enhance the convergence speed. As a result, our approach provides: (i) theoretical guarantees of convergence to the correct stationary solution, (ii) faster convergence than existing solvers, and (iii) generalizes to a resolutions and parameter settings very different from the ones seen at training time.

## 2 Methodology

In this section, we first introduce the necessary background on the semi-implicit method for time-dependant linear PDEs and subsequently describe our proposed neural solver.

### 2.1 Background

In the following, we consider the general IVP form of time-dependant linear PDE for the variable of interest $u$ within the computation domain $\mathrm{\Omega}$, w.r.t. the spatial variable $x$ and temporal variable $t$, subject to Dirichlet boundary condition $b(x,t)$ at the boundary $\mathrm{\Gamma}$

$$\frac{\partial u}{\partial t}=\mathcal{F}(u,x,t;\mathrm{\Theta}),\forall x\in \mathrm{\Omega},\text{s.t.}u(x,t)=b(x,t),\forall x\in \mathrm{\Gamma}\text{and}{u}_{{t}_{0}}={u}_{0};$$ | (1) |

$\mathcal{F}(u,x,t;\mathrm{\Theta})$ is a linear operator and can be discretized as ${\sum}_{i=1}^{N}\frac{{\mathrm{\Theta}}_{i}{\partial}_{i}}{\delta {x}^{{p}_{i}}}u$, with uniform spatial discretization step $\delta x$ and PDE parameter set $\mathrm{\Theta}={\{{\mathrm{\Theta}}_{i}\}}_{i=1:N}$, where ${\mathrm{\Theta}}_{i}$ is a diagonal matrix comprising the coefficients of the differential operator ${\partial}_{i}$ of order ${p}_{i}$. Let’s call $u(x,t)$ as ${u}_{t}$ for simplicity. A first order semi-implicit update rule to get ${u}_{t+\delta t}$ from ${u}_{t}$ (with time step $\delta t$) is given by

$$ | (2) |

To obtain ${u}_{t+\delta t}$ one needs to solve the following linear system of equations

$$\left(I-\delta t\u03f5\sum _{i=1}^{N}\frac{{\mathrm{\Theta}}_{i}{d}_{i}}{\delta {x}^{{p}_{i}}}\right){u}_{t+\delta t}=\delta t\u03f5\sum _{i=1}^{N}\frac{{\mathrm{\Theta}}_{i}({\partial}_{i}-{d}_{i}I)}{\delta {x}^{{p}_{i}}}{u}_{t+\delta t}+c({u}_{t},\mathrm{\Theta},\delta x,\delta t,\u03f5;\partial )$$ | (3) |

where $c$ is independent of ${u}_{t+\delta t}$ and ${d}_{i}$ is the central element of the central difference discretization of operator ${\partial}_{i}$. Note that for central difference scheme, ${\partial}_{i}-{d}_{i}I$ is real, zero-diagonal, and either circulant or skew-circulant matrix. One can use an iterative scheme to compute ${u}_{t+\delta t}$ from an arbitrary initialization ${u}_{t+\delta t}^{0}$ on the right-hand-side of Eq. 3. For simplicity of notation, we refer to ${\left(I-\delta t\u03f5{\sum}_{j=1}^{N}\frac{{\mathrm{\Theta}}_{j}{d}_{j}}{\delta {x}^{{p}_{j}}}\right)}^{-1}\frac{\delta t\u03f5{\mathrm{\Theta}}_{i}}{\delta {x}^{{p}_{i}}}$ as ${\mathrm{\Lambda}}_{i}$, and, we drop the subscript of $u$ and use a superscript to denote a single iteration at a time $t+\delta t$. We enforce Dirichlet boundary condition using a projection step with a binary boundary mask $G$.

$${u}^{m+1}=G\left(\sum _{i=1}^{N}{\mathrm{\Lambda}}_{i}({\partial}_{i}-{d}_{i}I){u}^{m}+c\right)+(I-G)b$$ | (4) |

Eq. 4 can be seen as a linear operator ${u}^{m+1}=\mathrm{\Psi}({u}^{m})=T{u}^{m}+k$. We can guarantee the spectral radius of the linear transformer $T$, i.e. $$, by appropriately selecting $\delta x,\delta t,\text{and}\u03f5$ [see Appendix A], leading to a fixed-point algorithm.

### 2.2 Neural Solver

We propose the following iterator

$${\mathrm{\Phi}}_{H}(u)=\mathrm{\Psi}(u)+G\left(\sum _{i=1}^{N}{\mathrm{\Lambda}}_{i}{H}_{i}w\right)$$ | (5) |

where $w=\mathrm{\Psi}(u)-u$, and ${H}_{i}$ is a learned linear operator which satisfies ${H}_{i}0=0$ for $i=1:N$. Substituting $w$ in Eq. 5 we get ${\mathrm{\Phi}}_{H}(u)={T}^{\prime}u+{k}^{\prime}$, where ${k}^{\prime}$ denotes the additive part which is independent of $u$, and ${T}^{\prime}=T+G{\sum}_{i=1}^{N}{\mathrm{\Lambda}}_{i}{H}_{i}(T-I)$.

###### Lemma 1.

For a linear PDE problem $\mathrm{(}{\mathrm{\{}{\mathrm{\Theta}}_{i}\mathrm{,}{\mathrm{\partial}}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}\mathrm{,}G\mathrm{,}{u}_{\mathrm{0}}\mathrm{,}b\mathrm{,}\delta \mathit{}x\mathrm{,}\delta \mathit{}t\mathrm{,}\u03f5\mathrm{)}$ and choice of ${\mathrm{\{}{H}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}$ if ${u}^{\mathrm{*}}$ is a fixed point of $\mathrm{\Psi}\mathrm{,}$ it is a fixed point of ${\mathrm{\Phi}}_{H}$ in Eq. 5.

###### Proof.

Based on the iterative rule in Eq.5 $,$ if ${u}^{*}$ satisfies $\mathrm{\Psi}\left({u}^{*}\right)={u}^{*}$ then $w=\mathrm{\Psi}\left({u}^{*}\right)-{u}^{*}=0$ . Therefore, ${\mathrm{\Phi}}_{H}\left({u}^{*}\right)=\mathrm{\Psi}\left({u}^{*}\right)+G{\sum}_{i=1}^{N}{\mathrm{\Lambda}}_{i}{H}_{i}0={u}^{*}.$ ∎

Moreover, the space of ${\mathrm{\Phi}}_{H}$ subsumes the standard solver $\mathrm{\Psi}.$ If ${H}_{i}=0,$ then ${\mathrm{\Phi}}_{H}=\mathrm{\Psi}.$ Furthermore, if ${H}_{i}={\partial}_{i}$ , then since $GT=T$

$${\mathrm{\Phi}}_{H}(u)=\mathrm{\Psi}(u)+GT(\mathrm{\Psi}(u)-u)=T\mathrm{\Psi}(u)+c={\mathrm{\Psi}}^{2}(u)$$ | (6) |

which is equal to two iterations of $\mathrm{\Psi}.$ Because computing $\mathrm{\Phi}$ requires two convolutions: ${H}_{i}$ and $T$ one iteration with ${\mathrm{\Phi}}_{H}$ has the same number of convolution operations as two iterations of $\mathrm{\Psi}.$ This shows that we can learn an ${H}_{i}$ such that our iterator ${\mathrm{\Phi}}_{H}$ is at least as the standard solver $\mathrm{\Psi}.$ In the following theorem, we show that there is a convex open set of ${H}_{i}$ that the learning algorithm can explore.

###### Theorem 1.

For fixed $G\mathrm{,}{\mathrm{\{}{\mathrm{\Theta}}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}\mathrm{,}{u}_{\mathrm{0}}\mathrm{,}b\mathrm{,}\delta \mathit{}x\mathrm{,}\delta \mathit{}t$, and $\u03f5$ the spectral norm of ${\mathrm{\Phi}}_{H}\mathit{}\mathrm{(}u\mathrm{;}G\mathrm{,}{\mathrm{\{}{\mathrm{\Theta}}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}\mathrm{,}{u}_{\mathrm{0}}\mathrm{,}b\mathrm{,}\delta \mathit{}x\mathrm{,}\delta \mathit{}t\mathrm{,}\u03f5\mathrm{)}$ is a convex function of ${\mathrm{\{}{H}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}$ and the set of ${\mathrm{\{}{H}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}$ such that the spectral norm of $$ is a convex open set.

###### Proof.

See Appendix B. ∎

On a stark contrast with previous work [5], we have several sets of parameters ${\{{\mathrm{\Theta}}_{i}\}}_{i=1:N},\delta x,\delta t$, and $\u03f5$ attached to the PDEs governing equation. Although we train on a single domain, the model posses a generalization properties, which we show in the following.

###### Proposition 1.

For fixed {${\partial}_{i}\}{}_{i=1:N},G$, and ${\mathrm{\{}{H}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}$, and some ${u}_{\mathrm{0}}^{\mathrm{\prime}}\mathrm{,}{b}^{\mathrm{\prime}}\mathrm{,}{\mathrm{\{}{\mathrm{\Theta}}_{i}^{\mathrm{\prime}}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}\mathrm{,}\delta \mathit{}{x}^{\mathrm{\prime}}\mathrm{,}\delta \mathit{}{t}^{\mathrm{\prime}}$, and ${\u03f5}^{\mathrm{\prime}}$, if ${\mathrm{\Phi}}_{H}\mathit{}\mathrm{(}u\mathrm{)}$ is valid iterator for the PDE problem $\mathrm{(}{\mathrm{\{}{\mathrm{\Theta}}_{i}^{\mathrm{\prime}}\mathrm{,}{\mathrm{\partial}}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}\mathrm{,}G\mathrm{,}{u}_{\mathrm{0}}^{\mathrm{\prime}}\mathrm{,}{b}^{\mathrm{\prime}}\mathrm{,}\delta \mathit{}{x}^{\mathrm{\prime}}\mathrm{,}\delta \mathit{}{t}^{\mathrm{\prime}}\mathrm{,}{\u03f5}^{\mathrm{\prime}}\mathrm{)}$, then for all ${u}_{\mathrm{0}}$ and $b\mathrm{,}$ the iterator ${\mathrm{\Phi}}_{H}\mathit{}\mathrm{(}u\mathrm{)}$ is valid iterator for the PDE problem $\mathrm{(}{\mathrm{\{}{\mathrm{\Theta}}_{i}\mathrm{,}{\mathrm{\partial}}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}\mathrm{,}G\mathrm{,}{u}_{\mathrm{0}}\mathrm{,}b\mathrm{,}\delta \mathit{}x\mathrm{,}\delta \mathit{}t\mathrm{,}\u03f5\mathrm{)}$ if $$.

###### Proof.

See Appendix B. ∎

## 3 Experiments

We consider a 2-D advection-diffusion equation of the following form

$\frac{\partial u}{\partial t}}={\mathbf{v}}^{\top}\cdot \left[\begin{array}{c}\hfill {\partial}_{x}u\hfill \\ \hfill {\partial}_{y}u\hfill \end{array}\right]+{\mathbf{D}}^{\top}\cdot \left[\begin{array}{c}\hfill {\partial}_{xx}u\hfill \\ \hfill {\partial}_{yy}u\hfill \end{array}\right]\text{; subject to}{u}_{{t}_{0}}={u}_{0}(x,y)$ | (7) |

where $\mathbf{v}={[{v}_{x},{v}_{y}]}^{\top}$ and $\mathbf{D}={[{D}_{xx},{D}_{yy}]}^{\top}$ are advection velocity and diffusivity respectively. We minimize the following loss function

$$\mathcal{L}=\frac{1}{L}\sum _{n=1}^{N}\sum _{l=1}^{L}\parallel {\mathrm{\Phi}}_{H}^{n,k}({u}_{t}^{l})-{u}_{t+n\delta t}^{l}\parallel ;k\sim \mathcal{U}[1,20]$$ |

where $n$ is the number of time-step and $k$ iteration for a single time step is denoted as ${\mathrm{\Phi}}_{H}({\mathrm{\Phi}}_{H}\mathrm{\dots}({\mathrm{\Phi}}_{H}))={\mathrm{\Phi}}_{H}^{k}$.

Data Generation: We consider a rectangular domain of $\mathrm{\Omega}=[0,2\pi ]\times [0,2\pi ]$. Elements of $\mathbf{v}$ and $\mathbf{D}$ are drawn from a uniform distribution of $\mathcal{U}[-2.0,2.0]$ and $\mathcal{U}[0.2,0.8]$ respectively. The computational domain is discretized into 64 x 64 regular mesh. We assume zero Dirichlet boundary condition and the initial value is generated according to [8] as ${u}_{0}=\lambda cos(kx+ly)+\gamma sin(kx+ly)$ where $\gamma $ and $\lambda $ are drawn from a normal distribution of $\mathcal{N}(0,0.02)$, and, $k$ and $l$ are values drawn in random from a uniform distribution of $\mathcal{U}[1,9]$. We generate 200 simulations each having 50 time steps, using FEniCS [7][1] for $\delta t=0.2$. Further, we take the train, test, and validation split of the simulated time series as $80\%:10\%:10\%$. A time series of a test data is shown in Fig 1.

Implementation Details: Following [5], we use a three-layer convolutional neural network to model each of the ${H}_{i}$. We use zero-padding to enforce zero Dirichlet condition at the boundary and use a kernel size of 3x3. During training, we fixed the following parameters as follows $\delta x=0.098,\delta t=0.2,\u03f5=0.9$. We use PyTorch framework and train our network with Adam Optimizer [6].

### 3.1 Results and Discussion

Generalization for Different Parameters: We investigate the effect of different parameter settings than those used during, to validate the generalizability of the neural scheme. To study the effect of different $\u03f5$ we use the original test set. We generate two additional test cases; varying one parameter at a time : a) $\delta t=0.12$, and b) $\delta x=0.049$. The elements of $\mathbf{v}$ and $\mathbf{D}$ are drawn from the same distribution as before. The average error propagation over ten time step between semi-implicit finite difference method and the proposed neural implicit solver is being compared in Figure 2.

We observe that error from the neural scheme (10 iterations per time step) is less compared to the error from the semi-implicit FDM (25 iterations per time step) scheme for all three different test sets. This affirms our hypothesis that the neural solver is more accurate compared to the semi-implicit FDM and generalizable to other parameter settings at the same time.

Run-time Comparison: We compare the run time for the neural solver (10 iterations per time step) and semi-implicit scheme (25 iterations per time step), for one time-steps. The experiments are conducted on an Intel Xeon W-2123 CPU @ 3.60GHz, with code running on one of the four cores. We report that the trained neural solver takes circa 0.0148s compared to 0.0141s for the semi-implicit scheme, whereas the FEniCS solution takes 3.19s for machine precision convergence.

## 4 Conclusion

This abstract introduces a novel implicit neural scheme to solve linear time-dependent PDEs. We leverage an existing semi-implicit update rule to design a learnable iterator which provides theoretical guarantees. The learned iterator achieves faster convergence compared to the existing semi-implicit solver and produces a more accurate solution for a fixed computation budget. More importantly, we empirically demonstrate that training on a single parameter setting is enough to generalize over other parameter settings which confirms our theoretical results. The learned neural solver can be a faster alternative for simulation of various physical processes.

#### Acknowledgments

Suprosanna Shit and Ivan Ezhov are supported by the Translational Brain Imaging Training Network (TRABIT) under the European Union’s ‘Horizon 2020’ research & innovation programme (Grant agreement ID: 765148).

## References

- [1] (2015) The FEniCS project version 1.5. Archive of Numerical Software 3 (100). Cited by: §3.
- [2] (2012) Image restoration: total variation, wavelet frames, and beyond. Journal of the American Mathematical Society 25 (4), pp. 1033–1089. Cited by: Appendix C.
- [3] (2019) Neural parameters estimation for brain tumor growth modeling. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, Cited by: §1.
- [4] (1991) Topics in matrix analysis. Cambridge University Press. Cited by: Appendix A, Appendix A.
- [5] (2019) Learning neural PDE solvers with convergence guarantees. In Proceedings of the International Conference on Learning Representations, Cited by: Appendix B, §1, §1, §2.2, §3.
- [6] (2014) Adam: a method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, Cited by: §3.
- [7] (2012) Automated solution of differential equations by the finite element method: the FEniCS book. Vol. 84, Springer Science & Business Media. Cited by: §3.
- [8] (2018) PDE-net: learning PDEs from data. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 3208–3216. Cited by: §1, §3.
- [9] (2018) Neural networks trained to solve differential equations learn general representations. In Advances in Neural Information Processing Systems, pp. 4071–4081. Cited by: §1.
- [10] (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686 – 707. Cited by: §1.
- [11] (2017) Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 3424–3433. Cited by: §1.

## Appendix A Convergence Criteria for Semi-implicit Update

The spectral radius of $T$ can be expresses as following

$\rho (T)$ | $\le \parallel T\parallel $ | $=\parallel G{\displaystyle \sum _{i=1}^{N}}{\mathrm{\Lambda}}_{i}({\partial}_{i}-{d}_{i}I)\parallel $ | ||

$\le \parallel G\parallel {\displaystyle \sum _{i=1}^{N}}\parallel {\mathrm{\Lambda}}_{i}\parallel \parallel ({\partial}_{i}-{d}_{i}I)\parallel ;[\text{Invoking norm inequalities}\text{[4]}]$ | ||||

$={\displaystyle \sum _{i=1}^{N}}\parallel {\mathrm{\Lambda}}_{i}\parallel \parallel ({\partial}_{i}-{d}_{i}I)\parallel ;[\parallel G\parallel =1]$ |

Given $$, we have $$.

## Appendix B Proofs

See 1

###### Proof.

The spectral norm $\parallel \cdot \parallel $ is convex from the sub-additive property, and ${T}^{\prime}$ is linear in ${\{{H}_{i}\}}_{i=1:N}$. To prove that it is open, observe that $\parallel \cdot \parallel $ is a continuous function, so $(T+G{\sum}_{i=1}^{N}{\mathrm{\Lambda}}_{i}{H}_{i}(T-I))$ is continuous in ${\{{H}_{i}\}}_{i=1:N}$. Given $\rho ({T}^{\prime})$, the set of ${\{{H}_{i}\}}_{i=1:N}$ is the preimage under this continuous function of $(0,1-\zeta )$ for some $\zeta >0$, and the inverse image of open set $(0,1-\zeta )$ must be open. ∎

###### Lemma 2.

The upper bound of the spectral norm of ${\mathrm{\Phi}}_{H}$ is independent of ${\mathrm{\{}{\mathrm{\Theta}}_{i}\mathrm{\}}}_{i\mathrm{=}\mathrm{1}\mathrm{:}N}\mathrm{,}\delta \mathit{}x\mathrm{,}\delta \mathit{}t$, and $\u03f5$ Given $$.

###### Proof.

Considering the spectral norm of ${T}^{\prime}$ and invoking product and triangular inequality of norm, we obtain the following tight bound

$\parallel {T}^{\prime}\parallel $ | $=\parallel G{\displaystyle \sum _{i=1}^{N}}{\mathrm{\Lambda}}_{i}({\partial}_{i}-{d}_{i}I-{H}_{i})+G{\displaystyle \sum _{i=1}^{N}}({\mathrm{\Lambda}}_{i}{H}_{i})T\parallel $ | ||

$\le \parallel G\parallel {\displaystyle \sum _{i=1}^{N}}\parallel {\mathrm{\Lambda}}_{i}\parallel \parallel {\partial}_{i}-{d}_{i}I-{H}_{i}\parallel +\parallel G\parallel {\displaystyle \sum _{i=1}^{N}}\parallel {\mathrm{\Lambda}}_{i}\parallel \parallel {H}_{i}\parallel \parallel T\parallel $ | |||

$$ |

Given $$, we have

$$ |

∎

See 1

###### Proof.

From Theorem 1 of [5] and Lemma 1, our iterator is valid if and only if $$. From Lemma 2 the upper bound of spectral norm of iterator only depends on {${\partial}_{i}\}{}_{i=1:N}$ and ${\{{H}_{i}\}}_{i=1:N}$ given $$. Nonetheless, for any matrix spectral radius is upper bounded by its spectral norm. Thus, if the iterator is valid for some ${u}_{0}^{\prime},{b}^{\prime},{\{{\mathrm{\Theta}}_{i}^{\prime}\}}_{i=1:N},\delta {x}^{\prime},\delta {t}^{\prime}$, and ${\u03f5}^{\prime}$, then it is valid for any feasible choice of ${u}_{0},b,{\{{\mathrm{\Theta}}_{i}\}}_{i=1:N},\delta x,\delta t$, and $\u03f5$ satisfying the constraints. ∎

## Appendix C Geometric Interpretation

Surprisingly, we find that the form of $\parallel {T}^{\prime}\parallel $ has a special structure. As the denominator is constant the objective is to minimize $\parallel {\partial}_{i}-{d}_{i}I-{H}_{i}\parallel +\parallel {H}_{i}\parallel $ w.r.t. $\parallel {H}_{i}\parallel ;\forall i=1:N$. Invoking triangular inequality of norm we have the lower-bound

$$\parallel {\partial}_{i}-{d}_{i}I-{H}_{i}\parallel +\parallel {H}_{i}\parallel \ge \parallel {\partial}_{i}-{d}_{i}I\parallel $$ |

Taking square of both side, we have

$$\text{Find}{H}_{i}\text{such that}{({\partial}_{i}-{d}_{i}I)}^{\top}{H}_{i}\le \parallel {H}_{i}\parallel $$ |

When the equality holds the local optima is the surface of the hyper-sphere with center at $\frac{1}{2}({\partial}_{i}-{d}_{i}I)$ with radius $\frac{1}{2}\parallel {\partial}_{i}-{d}_{i}I\parallel $. We interpret that each ${H}_{i}$ explores an optimal discretization scheme near the manifold of ${\partial}_{i}$ by learning the sum of order rules as described in [2].