
Citation: Siddhartha Mishra. A machine learning framework for data driven acceleration of computations of differential equations[J]. Mathematics in Engineering, 2019, 1(1): 118-146. doi: 10.3934/Mine.2018.1.118
Differential equations, both ordinary and partial, are ubiquitous in science and engineering. Explicit solution formulas are available only in the simplest cases. Hence, numerical approximation of differential equations constitutes a key tool in their study. A wide variety of numerical methods have been developed to approximate differential equations robustly and efficiently. For the initial value problem for ordinary differential equations, popular methods include Runge-Kutta and multi-step methods; see [12,16] and references therein. Widely used numerical methods for approximating PDEs include finite difference [16], finite volume [10], finite element [5] and spectral [27] methods.
The exponential increase in computational power in the last decades provides the opportunity to solve very challenging, large scale computational problems for differential equations, such as uncertainty quantification (UQ) [3,9], (Bayesian) inverse problems [24] and (real time) optimal control, design and constrained optimization [4,28]. Solving such problems requires a very large number of fast approximations of ODEs and PDEs, for instance when evaluating Monte Carlo samples in a UQ or Bayesian inverse problem framework. Currently available numerical methods, particularly for nonlinear PDEs, tend to be too slow to allow such realistic computations.
Machine learning, in the form of artificial neural networks (ANNs), has become extremely popular in computer science in recent years. The term is applied to a plethora of methods that aim to approximate functions with layers of units (neurons), connected by linear operations between units and nonlinear activations within units; see [11] and references therein. Deep learning, i.e., an artificial neural network with a large number of intermediate (hidden) layers, has proven extremely successful at diverse tasks, for instance in image processing, signal processing and natural language processing [15]. A key element in deep learning is the training of the tuning parameters of the underlying neural network by (approximately) minimizing suitable loss functions. The resulting (non-convex) optimization problem, posed on a very high dimensional parameter space, can be solved efficiently with variants of the stochastic gradient descent method [14,22].
Machine learning methods, particularly deep learning, are being increasingly used in the context of numerical computation of differential equations. As this is a rapidly evolving field, we will only attempt a skeletal literature survey here. One class of methods attempts to replace numerical schemes for differential equations by deep networks, see [2,7,20,18] and references therein. These methods have been used successfully in different contexts, for instance in approximating very high dimensional problems arising in mathematical finance [2], by exploiting integral representation formulas for the underlying solutions. However, it is as yet unclear whether an end-to-end deep neural network can learn the physics of the underlying PDE in the absence of such formulas. This could constitute a stumbling block in approximating solutions of complicated nonlinear PDEs with deep learning.
Another school of thought aims to augment existing numerical methods by embedding deep learning modules within them. As examples, one can think of solving the pressure Poisson equation within an incompressible flow solver by a convolutional neural network as in [26], or learning troubled-cell indicators in an RKDG (Runge-Kutta discontinuous Galerkin) code by a deep network as in [21].
In this paper, we propose a variant of the machine learning framework for approximating (time-dependent) differential equations. Our starting point is the observation that evaluating approximate solutions of ODEs and PDEs on coarse space-time grids is computationally very cheap. However, the accuracy of such coarse grid representations is rather poor. Consequently, we will use a machine learning framework to train explicit or implicit parameters in (generalizations of) standard numerical methods in order to minimize a loss (error) function that measures the difference between the (trained) solution on the coarse grid and projections of fine grid solutions. The resulting scheme should be significantly more accurate than the underlying standard method on the coarse grid, while remaining just as computationally cheap. We motivate our general strategy with the following simple example.
We consider the following autonomous ODE,
$$u'(t) = F(u(t)), \quad t \in (0,T), \qquad u(0) = u_0. \tag{1.1}$$
Here, $u:[0,T]\to\mathbb{R}^d$ is the vector of unknowns and $F\in \mathrm{Lip}_{loc}(\mathbb{R}^d)$ is the right hand side.
A very popular class of numerical methods to solve (1.1), particularly for stiff right hand sides, is the class of implicit multi-step methods or backward difference formulas (BDFs) [12]. Assuming a uniform time step Δt, denoting the time levels by $t^n = n\Delta t$ and the approximate solution by $U^n \approx u(t^n)$, a general form of a three-point BDF is given by
$$(1+g_{n+2})U^{n+2} - (1+2g_{n+2})U^{n+1} + g_{n+2}U^n = \Delta t\, F(U^{n+2}), \tag{1.2}$$
for $n\ge 0$ and $g_{n+2}\in\mathbb{R}$ for each $n$. We need initial values $U^0, U^1$ to march in time in (1.2). It is straightforward to check, using Taylor expansions, that for any $g_{n+2}\in\mathbb{R}$ the method (1.2) is consistent with (1.1) and at least first-order accurate. Setting $g_{n+2}=0$ for all $n$ recovers the standard backward Euler method, while $g_{n+2}=\frac{1}{2}$ for all $n$ yields the second-order accurate BDF2 method. It is customary to assume that a second-order accurate method is preferable; hence, one always sets $g_{n+2}\equiv 0.5$.
However, this point of view does not take into account the data of the specific problem that we are solving, namely the nonlinearity $F$, the dimension $d$, the initial data $u_0$ and the grid size Δt. It is not a priori obvious that the choice $g_{n+2}=0.5$ will provide the best solution (the least error) for a given data set. In fact, this is not true in general. As an example, we consider the simplest linear scalar ODE by setting $d=1$ and $F(u)=-cu$, for some $c\in\mathbb{R}_+$, in (1.1). In this case, the explicit solution is given by
$$u(t) = u_0 e^{-ct}. \tag{1.3}$$
Now setting $U^0 = u_0$ and $U^1 = e^{-c\Delta t}u_0$, the solution computed at the second time level $2\Delta t$ by the generalized form of the three-point BDF scheme (1.2) is
$$U^2 = \frac{1+2g_2}{1+g_2+c\Delta t}\,U^1 - \frac{g_2}{1+g_2+c\Delta t}\,U^0. \tag{1.4}$$
Hence, the local error at the second time level is
$$E_2(g_2) = \left|U^2 - e^{-2c\Delta t}u_0\right|^2. \tag{1.5}$$
It is straightforward to observe that the minimizer,
$$g_2^* = \underset{g_2\in\mathbb{R}}{\operatorname{arg\,min}}\; E_2(g_2),$$
depends explicitly on the parameters Δt and $c$ and is not universally equal to $0.5$. In Figure 1, we plot the function $E_2(g_2)$ for $\Delta t=0.5$ and two different values of $c$, namely $c=1$ and $c=5$. We see from this figure that the loss (error) function is convex in the parameter $g_2$ and that the error is vastly reduced by the minimization, i.e., by a factor of 39.97 for $c=1$ and by a factor of 677.95 for $c=5$. Hence, by focusing on a specific data set, we can potentially obtain gains of two to three orders of magnitude even for this simple ODE.
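For readers who want to reproduce the flavour of this computation, the following minimal Python sketch (not part of the original paper) evaluates the local error (1.5) over a range of $g_2$ and locates its minimizer; the scan range and resolution are our own choices, and the precise reduction factors depend on how accurately the minimization is carried out.

```python
import numpy as np

def bdf_error(g2, c, dt=0.5, u0=1.0):
    """Local error (1.5) of one generalized BDF step (1.4) for u' = -c u,
    initialized with the exact values U^0 = u0 and U^1 = exp(-c*dt)*u0."""
    U1 = np.exp(-c * dt) * u0
    U2 = ((1.0 + 2.0 * g2) * U1 - g2 * u0) / (1.0 + g2 + c * dt)
    return (U2 - np.exp(-2.0 * c * dt) * u0) ** 2

# Trace the (convex) loss over a range of g_2, as in Figure 1, and locate its
# minimizer; compare with the error of the standard BDF2 choice g_2 = 0.5.
g_grid = np.linspace(-1.0, 2.0, 3001)
for c in (1.0, 5.0):
    errs = np.array([bdf_error(g, c) for g in g_grid])
    print(f"c={c}: argmin g2 ~ {g_grid[np.argmin(errs)]:.3f}, "
          f"E2(0.5) = {bdf_error(0.5, c):.3e}, min E2 = {errs.min():.3e}")
```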
Our objective is to generalize the strategy presented in the above motivating example. We will recast (generalizations of) standard numerical methods for time-dependent ODEs and PDEs as multi-layer artificial neural networks with a set of trainable parameters. The resulting network will always be designed to be consistent with the underlying differential equation by constraining the parameter set. During an offline training phase, we train these parameters by (approximately) minimizing a suitable loss function over the parameter space by a (stochastic) gradient descent method. The efficiency of the resulting trained scheme is then verified on a test set. Thus, our methods are (rather restricted) types of (deep) neural networks for approximating time-dependent differential equations.
We organize the rest of the paper as follows: in section 2, we present our abstract machine learning framework. In section 3, we apply the proposed algorithm to ordinary differential equations. The machine learning algorithm is applied to the heat equation, linear transport equation, scalar conservation laws and the Euler equations of gas dynamics in sections 4, 5, 6 and 7. We summarize the contents of the paper and provide a perspective in section 8.
For definiteness, we consider a one-dimensional nonlinear time-dependent PDE of the form,
$$\begin{aligned}
u_t &= L(u, u_x, u_{xx}), && (x,t) \in [X_l, X_r]\times[0,T],\\
u(x,0) &= u_0(x,\omega), &&\\
L_b u(X_l,t) &= u_l(t,\gamma_l), \qquad L_b u(X_r,t) = u_r(t,\gamma_r). &&
\end{aligned} \tag{2.1}$$
Here, $u:[X_l,X_r]\times[0,T]\to\mathbb{R}^m$ is the vector of unknowns. $L$ is a possibly nonlinear differential operator, involving both the first and second spatial derivatives of $u$ (interpreted as vectors), that will be specified in subsequent examples but is kept deliberately generic here. $L_b$ refers to a boundary operator, and the initial and boundary data depend on parameters $\omega,\gamma_{l,r}\in\mathbb{R}^D$, possibly with $D\gg 1$.
For simplicity, we discretize $[X_l,X_r]$ on a uniform grid with mesh size Δx and denote the discrete points as $x_j = X_l + j\Delta x$, for $0\le j\le J+1$. Similarly, we choose a uniform time step Δt, denote the $n$-th time level as $t^n = n\Delta t$, with $T = N\Delta t$, and denote the approximate solution (computed, for instance, by a finite difference scheme) as $U^n_j \approx u(x_j,t^n)$. Moreover, one can readily consider a finite volume or finite element discretization by letting $U^n_j$ approximate cell averages or nodal point values, respectively. We denote the vector $U^n = \{U^n_j\}_{1\le j\le J}$.
On this grid, we discretize the abstract PDE (2.1) with the following numerical method,
$$U^{n+1} = L^n\big(\Delta t, \Delta x, U^n, A^n U^n, F(U^n), R(U^n)\big). \tag{2.2}$$
Several remarks are in order about the form of the abstract scheme (2.2). First, $A^n$ is a linear operator that subsumes potentially multiple applications of matrices. Second, the function $F$ denotes a composite of the nonlinearities that occur within the nonlinear operator $L$ in (2.1). Furthermore, we write (2.2) as a one-step explicit scheme. However, we implicitly assume that even if the underlying time discretization is (semi-)implicit, the resulting system of nonlinear equations is solved by an iterative procedure, such as a Newton method, and the results of the Newton steps are subsumed in the general form of the right hand side term $L^n$ in (2.2). In particular, such iterations might involve multiple applications of the nonlinearity $F$ in (2.2) and could involve the function $R$ in (2.2), which might include (multiple compositions of) the standard ReLU function,
$$\sigma(w) = \max(w,0). \tag{2.3}$$
The whole method can be represented as a neural network, as shown in Figure 2, but with nonlinearities $F$ based on the underlying PDE instead of scalar activation functions as in a traditional deep network [11]. However, given the universal approximation property of standard artificial neural networks [1,13], one can approximate the underlying nonlinearities $F$ in (2.2) by artificial neural networks. In particular, networks with a few hidden layers can be trained to approximate smooth nonlinear functions [30]. Hence, one can think of the module $F$ in (2.2) and Figure 2 as an additional neural network, realizing the whole scheme (2.2) as a neural network in the sense of [11]. However, for the sake of consistency and computational efficiency, we perform a direct evaluation of the nonlinearity $F$ in (2.2) in this paper. A very concrete realization of the neural network representation of numerical schemes is provided in section 6 for a Rusanov type scheme approximating scalar conservation laws, see Figure 7.
We constrain the numerical method (or alternatively the artificial neural network) (2.2) to be consistent with the PDE (2.1) by imposing constraints on the linear operator $A^n$ (and on the structure of the neural network approximating $F$). Further stability conditions on the scheme can also be imposed. Consequently, we can rewrite the numerical method (2.2) in the following parametric form,
$$U^{n+1} = L^n(\Delta t, \Delta x, U^n, \theta^n), \tag{2.4}$$
in terms of a parameter vector $\theta^n\in\mathbb{R}^d$. We impose the constraints on the scheme such that the choice $\theta^n = \bar{\theta}^n$ recovers a standard numerical method; for instance, the choice $g=0.5$ in the scheme (1.2) yields the standard second-order three-point BDF2 scheme for discretizing the ODE (1.1).
To generate the training set, we select a certain subset $\{\omega_i,\gamma_{1,i},\gamma_{2,i}\}_{1\le i\le M}$ of the parameter space, where $M\gg 1$ is the size of the training set. Let $\Delta t_f \ll \Delta t$ and $\Delta x_f \ll \Delta x$ be the time step and mesh size of a (very) fine uniform discretization of $[0,T]\times[X_l,X_r]$. For the training data $\{u_0(\omega_i), u_l(\gamma_{1,i}), u_r(\gamma_{2,i})\}_i$, we approximate (2.1) with the scheme (2.4), with the particular choice $\theta^n = \bar{\theta}^n$, on the fine grid. The resulting reference solution, denoted $U^{n,i}_{ref}$, is obtained by projecting the fine grid solution onto the coarse grid, either by cell averaging or by pointwise sampling. This training data is generated offline and is rather computationally expensive, as a fine grid solution needs to be computed.
Next we set up a training loss function by defining the error,
$$E(\theta) := \frac{1}{p}\sum_{i}\sum_{n=1}^{N}\left\|U^{n,i}(\theta^n) - U^{n,i}_{ref}\right\|^p_{L^p}. \tag{2.5}$$
Here, $1\le p<\infty$, we usually consider $p=1$ or $p=2$, and we denote by $\theta = \{\theta^n\}_{1\le n\le N}$ the combined vector of trainable parameters.
The objective of the training process is to minimize the loss function (2.5), i.e., to find a minimizer $\theta^*$:
$$\theta^* = \underset{\theta\in\mathbb{R}^{Nd}}{\operatorname{arg\,min}}\; E(\theta). \tag{2.6}$$
The resulting trained time-marching scheme is
$$U^{n+1} = L^n(\Delta t, \Delta x, U^n, \theta^{n,*}). \tag{2.7}$$
We summarize the resulting algorithm for our machine learning framework below:
Algorithm 2.1. Given a specific model, i.e., an underlying ODE or PDE such as (2.1) on a specific space-time domain, we
Step 1: Choose a consistent (and stable) numerical method (alternatively neural network) for approximating the underlying differential equation, for instance the general form (2.2) or (2.4). This numerical method will approximate solutions of (2.1) on a coarse grid.
Step 2: Generate the training set by choosing a specific (finite but possibly large) parametric set of initial (and boundary) conditions and approximating the underlying PDE with a standard numerical method, for instance (2.4) with the parameter set $\bar{\theta}$, on a (very) fine grid. Then, project the fine grid solutions onto the coarse grid to create the training set.
Step 3: Set up the loss function (2.5) and use a gradient descent method to (approximately) find a local minimum of this possibly non-convex function over the parameter space. The gradient descent method can be initialized with the parameter values $\bar{\theta}$, corresponding to a standard numerical method.
Step 4: The minimizers $\theta^*$ serve as the parameters in the trained scheme (2.7). The trained scheme is run on a test set in the online phase, to ascertain the gains in computational efficiency.
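To fix ideas, the schematic Python sketch below (ours, not from the paper) spells out Steps 3 and 4 for a generic one-step coarse scheme; the hypothetical function `coarse_step`, the finite-difference gradients and the plain (non-stochastic) fixed-step descent are simplifying assumptions rather than the exact procedure used in the paper.

```python
import numpy as np

def train_scheme(coarse_step, theta0, U0_train, U_ref, n_steps, lr=1e-2, iters=100):
    """Offline phase of Algorithm 2.1 (schematic). coarse_step(U, theta) is one
    coarse-grid update of the form (2.4); U_ref[i][n] is the projected fine-grid
    reference for training sample i at time level n+1. The loss is (2.5) with
    p = 2 and the gradient is approximated by finite differences."""
    theta = [np.array(t, dtype=float) for t in theta0]     # one vector per time level

    def loss(params):
        err = 0.0
        for i, U in enumerate(U0_train):
            for n in range(n_steps):
                U = coarse_step(U, params[n])
                err += 0.5 * np.sum((U - U_ref[i][n]) ** 2)
        return err

    eps = 1e-6
    for _ in range(iters):
        base = loss(theta)
        grads = [np.zeros_like(t) for t in theta]
        for n, t in enumerate(theta):
            for k in range(t.size):                        # finite-difference gradient
                t[k] += eps
                grads[n][k] = (loss(theta) - base) / eps
                t[k] -= eps
        for n in range(len(theta)):                        # plain gradient descent step
            theta[n] -= lr * grads[n]
    return theta                                           # parameters of the trained scheme (2.7)
```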
We remark that Algorithm 2.1 is guaranteed to reduce the error, on the underlying coarse grid, over the standard scheme (i.e., (2.4) with the parameter set $\bar{\theta}$) on the training set, since the descent is initialized at $\bar{\theta}$. Moreover, the trained scheme (2.7) is always a consistent (and stable) discretization of the underlying model. It is difficult to obtain theoretical guarantees on the magnitude of the gain in computational efficiency of our machine learning framework on the test set. This gain depends on the underlying model, the numerical method, the grid size, the choice of training set and the efficacy of the gradient descent in finding a minimum. We will test this machine learning algorithm on a variety of problems in the following sections in order to empirically demonstrate its efficiency.
In this section, we will test algorithm 2.1 on the following two ordinary differential equations,
We start with the following second-order linear ODE modeling oscillators,
$$u''(t) + c^2 u(t) = 0, \qquad u(0) = u_0. \tag{3.1}$$
We can readily write (3.1) as a first-order system by introducing an auxiliary variable $v$ (a rescaled velocity), resulting in
$$u' = -cv, \qquad v' = cu. \tag{3.2}$$
It is easy to see that (3.1) (equivalently (3.2)) has an explicit solution given by,
$$u(t) = u_0\cos(ct), \qquad v(t) = u_0\sin(ct). \tag{3.3}$$
For the sake of simplicity, we choose the generalized three-point backward difference formula (1.2), with $U = [u,v]$ and $F(U) = [-cv, cu]$, as the underlying numerical scheme, i.e., Step 1 of Algorithm 2.1. We choose a uniform grid in time with time step Δt and initialize the scheme (1.2) with $U^0 = [u_0, u_0]$ and $U^1 = [u_0\cos(c\Delta t), u_0\sin(c\Delta t)]$.
Our objective is to approximate the solution of (3.2) at the next two time levels, i.e., to determine $U^2$ and $U^3$ by (1.2) with $\Delta t = \frac{1}{3}$. As the constant $c$ determines the frequency of oscillations in (3.1), a grid with a time step of $\Delta t = \frac{1}{3}$ is extremely coarse for large values of $c$, as several oscillations occur within a single time step. The task at hand is to determine whether the machine learning Algorithm 2.1 can (significantly) improve the accuracy of the scheme (1.2) on such a coarse grid.
For Step 2 of Algorithm 2.1, we generate the training set by randomly selecting $I=10$ data points $\{u_0^i\}_{1\le i\le I}$ on the interval $[0,1]$. Corresponding to these data points, we use the exact solutions (3.3) at times $t^2 = \frac{2}{3}$ and $t^3 = 1$ to generate the reference training data $U^{n,i}_{ref}$ for all $i$ and $n=2,3$.
In Step 3 of Algorithm 2.1, we set up the $\ell^2$ error,
$$E_2(g_2,g_3) := \frac{1}{2}\sum_{i=1}^{I}\sum_{n=2,3}\left|U^{n,i}(g_n) - U^{n,i}_{ref}\right|^2, \tag{3.4}$$
and minimize the loss function (3.4) over the two parameters $(g_2,g_3)\in\mathbb{R}^2$. Given this simple two-dimensional (in parameters) problem, the minimization is performed by a standard steepest descent method initialized at the point $(g_2,g_3)=(0.5,0.5)$, corresponding to the second-order BDF2 scheme.
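A concrete sketch of this training problem is given below (Python, ours, not code from the paper): it implements the implicit BDF step for the linear system (3.2), assembles the loss (3.4) against the exact solution (3.3), and minimizes it. For robustness we use SciPy's Nelder-Mead routine instead of the steepest descent of the text, and the random seed is arbitrary, so the computed minimizers need not coincide with those reported in Table 1.

```python
import numpy as np
from scipy.optimize import minimize

def bdf_step(U_prev, U_curr, g, c, dt):
    """One generalized three-point BDF step (1.2) for the linear system (3.2):
    ((1+g) I - dt*A) U^{n+2} = (1+2g) U^{n+1} - g U^n, with A = [[0,-c],[c,0]]."""
    A = np.array([[0.0, -c], [c, 0.0]])
    lhs = (1.0 + g) * np.eye(2) - dt * A
    return np.linalg.solve(lhs, (1.0 + 2.0 * g) * U_curr - g * U_prev)

def loss(params, u0_samples, c, dt=1.0 / 3.0):
    """Training loss (3.4), measured against the exact solution (3.3)."""
    g2, g3 = params
    err = 0.0
    for u0 in u0_samples:
        U0 = np.array([u0, u0])
        U1 = u0 * np.array([np.cos(c * dt), np.sin(c * dt)])
        U2 = bdf_step(U0, U1, g2, c, dt)
        U3 = bdf_step(U1, U2, g3, c, dt)
        exact2 = u0 * np.array([np.cos(2 * c * dt), np.sin(2 * c * dt)])
        exact3 = u0 * np.array([np.cos(3 * c * dt), np.sin(3 * c * dt)])
        err += 0.5 * (np.sum((U2 - exact2) ** 2) + np.sum((U3 - exact3) ** 2))
    return err

u0_samples = np.random.default_rng(0).uniform(0.0, 1.0, size=10)
for c in (1.0, 10.0, 100.0):
    res = minimize(loss, x0=[0.5, 0.5], args=(u0_samples, c), method="Nelder-Mead")
    print(f"c={c}: (g2*, g3*) ~ {res.x}")
```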
The gradient descent algorithm converges quite quickly (at most 9 steps) to a local minimum of the non-convex loss function. The minimizers $g^*_{2,3}$ are shown in Table 1 for three different values of $c$, namely $c=1,10$ and $100$, and indicate a significant difference between the optimized values and the initial value $(0.5,0.5)$ (corresponding to the standard second-order BDF2 scheme).
Table 1. Optimized parameters and error gains for the oscillator ODE (3.1).

$c$ | $g_2^*$ | $g_3^*$ | Gain
--- | --- | --- | ---
1 | 0.1 | 1.4 | 4.13
10 | −0.64 | −2.04 | 2.11
100 | 11 | 0.02 | 13.03
We test the trained scheme, i.e., (1.2) with parameters $g^*_{2,3}$, on a test set chosen by randomly selecting 50 points in the interval $[-5,5]$ as the initial data $u_0$ in (3.1). The mean gains in error, i.e., the ratio of the error of the standard BDF2 scheme to the error of the trained (data learned) scheme, both measured with respect to the underlying exact solution, are presented in Table 1. The gain in efficiency with the trained scheme is considerable for all three values of $c$, rising to at least an order of magnitude for $c=100$. In this case, the value of $u$ computed with the trained scheme is remarkably close to the exact solution at times $T=2/3$ and $T=1$. Moreover, the trained scheme even seems to outperform an explicit second-order Runge-Kutta method (see [16]) on a fine grid with $\Delta t=0.001$, as observed in a plot of the approximate solutions for a particular realization of the test set in Figure 3.
We consider a simple but non-linear population (saturation) model for the time evolution of the population density u , described by the ODE,
$$u' = cu(1-u), \qquad u(0) = u_0 \ge 0. \tag{3.5}$$
It is straightforward to check that the exact solution of (3.5) is given by,
$$u(t) = \frac{u_0}{u_0 + (1-u_0)e^{-ct}}. \tag{3.6}$$
Hence, any positive initial condition converges to the saturation value (stable equilibrium) $u=1$, on a time scale dictated by the constant $c$. We are interested in computing the solution in the time interval $[0,1]$.
For Step 1 of Algorithm 2.1, we again choose the generalized three-point BDF scheme (1.2) and a coarse grid with time step $\Delta t = 0.5$. To generate the training set, we choose initial data $\{u_0^i\}_{1\le i\le I}$, with $I=10$, uniformly over the interval $[0,2]$ and set $U^{1,i}$ as the exact solution (3.6) at time $t=\Delta t$. On this training set, we compute the approximate solution $U^2$ of (3.5) with the generalized three-point BDF method (with a single trainable parameter $g_2$) at time $T=2\Delta t=1$. The exact solution $U^{2,i}_{ref}$ is calculated from (3.6) at time $T=1$ with the initial data $u_0^i$, and defines the loss function
$$E_1(g_2) := \sum_{i=1}^{I}\left|U^{2,i}(g_2) - U^{2,i}_{ref}\right|. \tag{3.7}$$
In contrast to the previous example of a linear ODE, we use the $\ell^1$ norm as the loss function in this nonlinear example.
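The sketch below (Python, ours) implements the implicit step of (1.2) for the population model via a small Newton iteration and scans the convex loss (3.7) for its minimizer; the Newton iteration count, the scan range and its resolution are our own choices.

```python
import numpy as np

def bdf_step_logistic(U0, U1, g, c, dt, newton_iters=20):
    """One generalized BDF step (1.2) for u' = c u (1-u): solve the scalar
    nonlinear equation (1+g) U2 - (1+2g) U1 + g U0 = dt*c*U2*(1-U2) by Newton."""
    U2 = U1                                            # initial guess
    for _ in range(newton_iters):
        res = (1 + g) * U2 - (1 + 2 * g) * U1 + g * U0 - dt * c * U2 * (1 - U2)
        jac = (1 + g) - dt * c * (1 - 2 * U2)
        U2 -= res / jac
    return U2

def exact(u0, c, t):
    return u0 / (u0 + (1 - u0) * np.exp(-c * t))       # exact solution (3.6)

def loss(g2, u0_samples, c, dt=0.5):
    """l^1 training loss (3.7) at T = 2*dt = 1."""
    return sum(abs(bdf_step_logistic(u0, exact(u0, c, dt), g2, c, dt)
                   - exact(u0, c, 2 * dt)) for u0 in u0_samples)

# The loss is observed to be convex (Figure 4), so a simple scan locates g_2^*.
u0_samples = np.linspace(0.0, 2.0, 10)
for c in (0.2, 1.0, 5.0):
    gs = np.linspace(-0.25, 1.0, 1251)
    print(f"c={c}: g2* ~ {gs[np.argmin([loss(g, u0_samples, c) for g in gs])]:.2f}")
```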
The loss function for the two values $c=1$ and $c=5$ is shown in Figure 4. In all cases that we tested, the loss function is convex and is readily minimized by a straightforward steepest descent algorithm. The resulting optimal parameter $g_2^*$ for three different values of $c$ is shown in Table 2. As seen in Table 2 and Figure 4, the optimal value $g_2^*$ is very different from $g=0.5$.
Table 2. Optimized parameter and error gains for the population model (3.5).

$c$ | $g_2^*$ | Gain
--- | --- | ---
0.2 | 0.41 | 1.5
1 | 0.16 | 3.02
5 | 0.03 | 10.54
The test set is constructed by randomly choosing 50 points from the interval $[0,5]$ as the initial data and approximating (3.5) with the trained scheme, i.e., (1.2) with parameter $g_2^*$. The corresponding error with respect to the exact solution is calculated and the gain, defined as before, is shown in Table 2. We observe a consistent gain in computational efficiency with the trained scheme, reaching approximately one order of magnitude for $c=5$. Thus, the machine learning Algorithm 2.1 performs well in this nonlinear example and the gains in efficiency are similar to the linear problem, even though a different loss function was used.
As the first example for PDEs, we consider the heat equation in one space dimension,
$$\begin{aligned}
u_t &= c\,u_{xx}, && (x,t)\in(0,1)\times(0,T),\\
u(x,0) &= u_0(x), && x\in(0,1),\\
u(0,t) &= u(1,t) = 0, && t\in(0,T).
\end{aligned} \tag{4.1}$$
Here, $u$ is the temperature and $c>0$ is the diffusion coefficient.
We discretize the interval $[0,1]$ uniformly with grid size Δx and label the resulting points as $x_j = j\Delta x$ for $0\le j\le J+1$, with $\Delta x = \frac{1}{J+1}$. The time interval $[0,T]$ is discretized uniformly with time step Δt and the time points are labeled $t^n = n\Delta t$, with $0\le n\le N$ and $\Delta t = T/N$. We approximate the heat equation (4.1) by evolving $U^n_j \approx u(x_j,t^n)$ with the following generalized (or weighted) five-point finite difference scheme,
$$\begin{aligned}
\frac{U^{n+1}_j - U^n_j}{\Delta t} = {}& \frac{c(1-g^n)}{\Delta x^2}\left(b^n_{-2}U^{n+1}_{j-2} + b^n_{-1}U^{n+1}_{j-1} + b^n_0 U^{n+1}_j + b^n_1 U^{n+1}_{j+1} + b^n_2 U^{n+1}_{j+2}\right)\\
& + \frac{c\,g^n}{\Delta x^2}\left(b^n_{-2}U^n_{j-2} + b^n_{-1}U^n_{j-1} + b^n_0 U^n_j + b^n_1 U^n_{j+1} + b^n_2 U^n_{j+2}\right), \qquad \forall n,\; 2\le j\le J-1.
\end{aligned} \tag{4.2}$$
The update formulas for the points $U^n_1$ and $U^n_J$ are obtained by setting the Dirichlet boundary (ghost) values $U^n_{-1} = U^n_0 = U^n_{J+1} = U^n_{J+2} \equiv 0$, for all $n$.
By using Taylor expansions, one can readily prove the following lemma,
Lemma 4.1. For all $n$ and any $g^n\in\mathbb{R}$, the finite difference scheme (4.2) is a consistent and first-order accurate discretization of the one-dimensional heat equation (4.1) if and only if the coefficients $b^n_k$, for $-2\le k\le 2$, satisfy the following algebraic conditions,
$$b^n_2 + b^n_1 + b^n_0 + b^n_{-1} + b^n_{-2} = 0, \qquad 2b^n_2 + b^n_1 - b^n_{-1} - 2b^n_{-2} = 0, \qquad 2b^n_2 + \frac{b^n_1}{2} + \frac{b^n_{-1}}{2} + 2b^n_{-2} = 1. \tag{4.3}$$
Here, consistency and accuracy are defined in terms of the local truncation error [16]. As (4.3) consists of three equations in five unknowns, we can eliminate three of the coefficients in terms of $b^n_{-1}$ and $b^n_{-2}$ to obtain
$$b^n_0 = 1 - 3b^n_{-1} - 6b^n_{-2}, \qquad b^n_1 = 3b^n_{-1} + 8b^n_{-2} - 2, \qquad b^n_2 = 1 - b^n_{-1} - 3b^n_{-2}. \tag{4.4}$$
Hence, per time level, the scheme (4.2) contains three undetermined parameters $g^n$, $b^n_{-1}$ and $b^n_{-2}$. These parameters will be determined by the training process, i.e., Steps 2 and 3 of Algorithm 2.1.
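As an illustration, the sketch below (Python, ours) assembles one step of the scheme (4.2) for a given parameter triple $(g, b_{-1}, b_{-2})$, using the relations (4.4) to enforce consistency and homogeneous Dirichlet ghost values; the choice of nine interior points for $\Delta x = 1/10$ reflects our reading of the grid description and the demonstration values are our own.

```python
import numpy as np

def heat_step(U, g, bm1, bm2, c, dt, dx):
    """One step of the weighted five-point scheme (4.2) on interior points with
    homogeneous Dirichlet data; b_0, b_1, b_2 follow the consistency relations
    (4.4), so the scheme stays consistent for any choice of (g, b_{-1}, b_{-2})."""
    b0 = 1.0 - 3.0 * bm1 - 6.0 * bm2
    b1 = 3.0 * bm1 + 8.0 * bm2 - 2.0
    b2 = 1.0 - bm1 - 3.0 * bm2
    J = U.size
    D = np.zeros((J, J))                         # pentadiagonal stencil matrix
    for j in range(J):
        for k, b in zip(range(-2, 3), (bm2, bm1, b0, b1, b2)):
            if 0 <= j + k < J:                   # ghost values outside are zero
                D[j, j + k] = b
    mu = c * dt / dx ** 2
    lhs = np.eye(J) - mu * (1.0 - g) * D         # implicit part of (4.2)
    rhs = (np.eye(J) + mu * g * D) @ U           # explicit part of (4.2)
    return np.linalg.solve(lhs, rhs)

# Crank-Nicolson, second-order in space (scheme S2 below): g=0.5, b_{-2}=0, b_{-1}=1.
x = np.linspace(0.1, 0.9, 9)                     # interior points, dx = 0.1
print(heat_step(np.sin(np.pi * x), g=0.5, bm1=1.0, bm2=0.0, c=1.0, dt=0.05, dx=0.1))
```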
Remark 4.2. Lemma 4.1 provides sufficient conditions for consistency of the finite difference scheme (4.2). Moreover, this scheme is conservative, i.e., $\sum_j U^{n+1}_j = \sum_j U^n_j$. We can also impose stability, for instance energy ($L^2$) stability or discrete maximum principles. These require additional constraints on the parameters and may restrict the training process further. We do not consider this aspect in the following.
Although the form (4.2) of a two time-level, five-point finite difference scheme is non-standard, it embeds several well-known finite difference approximations, namely
Scheme S1: Backward Euler in time and second-order accurate in space, by setting $g^n=0$, $b^n_{-2}=0$, $b^n_{-1}=1$ for all $n$.
Scheme S2: Crank-Nicolson in time and second-order accurate in space, by setting $g^n=0.5$, $b^n_{-2}=0$, $b^n_{-1}=1$ for all $n$.
Scheme S3: Backward Euler in time and fourth-order accurate in space, by setting $g^n=0$, $b^n_{-2}=-\frac{1}{12}$, $b^n_{-1}=\frac{4}{3}$ for all $n$.
Scheme S4: Crank-Nicolson in time and fourth-order accurate in space, by setting $g^n=0.5$, $b^n_{-2}=-\frac{1}{12}$, $b^n_{-1}=\frac{4}{3}$ for all $n$.
One can also set $g^n=1$ to recover an explicit forward Euler time discretization. However, we focus on implicit time stepping methods in order to avoid the severe CFL restriction on the time step for explicit time discretizations of the heat equation.
We approximate the solution $u$ of the heat equation with the scheme (4.2) at time $T=0.05$, for three different values of the diffusion coefficient $c$, namely $c=0.1,1,10$, ranging from slow to fast diffusion. All the experiments are performed on a spatial grid with mesh size $\Delta x = \frac{1}{10}$, i.e., with 10 mesh points. Moreover, the time grid is based on a single, very large, time step of $\Delta t=0.05$.
We focus on varying the initial datum $u_0$ in (4.1) to generate the training set. However, in contrast to ODEs, the initial datum $u_0$ for a PDE lies in an infinite dimensional function space, for instance $u_0\in L^2((0,1))$. Given the challenge of approximating the resulting data-to-solution operator in infinite dimensions, we focus on particular classes of initial data, defined in terms of (finite dimensional) parameters. Motivated by applications in uncertainty quantification [3] and reduced order modeling [29], we concentrate on the following specific parametric random initial data.
We consider the following $L$-term Karhunen-Loève expansion,
$$u_0(x,\omega) = \sum_{l=1}^{L}\lambda_l Y_l(\omega)\sin(l\pi x), \tag{4.5}$$
with $L=3$, $\lambda_l = \frac{1}{2^{l-1}}$ and the random numbers $Y_l(\omega)$ drawn from a uniform distribution on $[0,1]$. The training set is chosen by selecting (at random) $I=20$ draws of the random variables $Y^i_l(\omega)$, $1\le i\le I$. We compute a reference solution, for the resulting initial data $u_0^i$, with an explicit forward Euler time stepping and a standard second-order spatial finite difference discretization [16] on a very fine grid of 1000 mesh points and a time step chosen to satisfy the standard CFL requirement for the heat equation. This fine grid solution is projected onto the underlying coarse grid by sampling it at the points $x_j$, $1\le j\le J$, at the final time $T=\Delta t=0.05$. We denote this reference solution as $U^{n,i}_{j,ref}$.
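The following sketch (Python, ours) illustrates the generation of one reference training sample as described above: a Karhunen-Loève draw (4.5), a CFL-stable forward Euler / central difference solve on a fine grid, and sampling at the coarse points. The coefficient decay $\lambda_l = 2^{-(l-1)}$ and the CFL safety factor are our reading of the text and should be treated as assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def kl_initial_data(x, Y, lam=(1.0, 0.5, 0.25)):
    """Three-term Karhunen-Loeve sample (4.5); the decay lambda_l = 2^{1-l}
    is our reading of the text (an assumption)."""
    return sum(lam_k * y * np.sin((k + 1) * np.pi * x)
               for k, (lam_k, y) in enumerate(zip(lam, Y)))

def fine_reference(Y, c=1.0, T=0.05, Jf=1000):
    """Explicit forward Euler / second-order central reference solve on a fine
    grid, then sampled at the coarse interior points x_j = j/10."""
    dxf = 1.0 / (Jf + 1)
    xf = np.linspace(dxf, 1.0 - dxf, Jf)
    dtf = 0.4 * dxf ** 2 / c                      # CFL-stable fine time step
    u = kl_initial_data(xf, Y)
    n_steps = int(np.ceil(T / dtf))
    dtf = T / n_steps
    for _ in range(n_steps):
        lap = np.zeros_like(u)
        lap[1:-1] = u[:-2] - 2.0 * u[1:-1] + u[2:]
        lap[0] = -2.0 * u[0] + u[1]               # homogeneous Dirichlet data
        lap[-1] = u[-2] - 2.0 * u[-1]
        u = u + c * dtf / dxf ** 2 * lap
    return np.interp(np.arange(1, 10) / 10.0, xf, u)

# One training sample; the full training set would use I = 20 such draws.
print(fine_reference(rng.uniform(0.0, 1.0, 3)))
```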
The loss function is defined as the L2 error,
$$E_2(g^1, b^1_{-2}, b^1_{-1}) := \frac{\Delta x}{2}\sum_{i=1}^{I}\sum_{j=1}^{J}\left|U^{1,i}_j - U^{1,i}_{j,ref}\right|^2. \tag{4.6}$$
The loss function (4.6) is minimized using a simplified version of the stochastic gradient algorithm [22] with a batch size of 4, initialized with the starting values $g^1=0.5$, $b^1_{-2}=0$ and $b^1_{-1}=1$, corresponding to the overall second-order Crank-Nicolson type scheme S2. We denote the (approximate) minimizers as $\{g^{1,*}, b^{1,*}_{-2}, b^{1,*}_{-1}\}$, and the trained (data learned) scheme is the finite difference scheme (4.2) with these parameters.
The training, for the three values of the diffusion coefficient $c$ considered here, resulted in the (local) minimizers shown in Table 3. We observe that in most cases, the trained scheme is very different from any standard scheme. A test set is generated by choosing 100 random values of $Y_l$, $l=1,2,3$, in (4.5). Care is taken to exclude repetition of values from the training set, and the corresponding reference solutions are computed analogously to the training set.
Table 3. Trained parameters and gains for the heat equation with smooth random initial data (4.5); Gain-$k$ denotes the ratio of the mean test error of scheme S$k$ to that of the trained scheme.

$c$ | $g^{1,*}$ | $b^{1,*}_{-2}$ | $b^{1,*}_{-1}$ | Gain-1 | Gain-2 | Gain-3 | Gain-4
--- | --- | --- | --- | --- | --- | --- | ---
0.1 | 0.64 | 0 | 1 | 77.31 | 19.58 | 49.6 | 38.97
1 | 0.3 | 0 | 1.12 | 3.97 | 3.21 | 3.59 | 3.98
10 | 0.04 | 1.2 | 2.43 | 11.91 | 46.97 | 11.49 | 48.17
Summarizing these results, we first observe that there is no clear winner among the standard schemes on the test set. For slow to moderate values of the diffusion coefficient, the scheme S2, i.e., Crank-Nicolson in time and second-order in space, scores over the other three schemes, whereas for a large value of the diffusion coefficient, the schemes S1 and S3 clearly perform best. On the other hand, there is a large gain with the trained scheme compared to all the standard schemes. The gain of approximately 4 for $c=1$ is the most modest, whereas there is clearly a gain of a factor of at least 10 to 20 for the extreme values of the diffusion coefficient. This gain in accuracy comes at no additional online cost and justifies the efficacy of the proposed machine learning algorithm. The significant gain in performance with the trained scheme, for one particular instance of the test set, is also displayed in Figure 5.
Next, we consider the following discontinuous random initial data,
$$u_0(x,\omega) = \begin{cases} 1 + \epsilon Y_1(\omega), & \text{if } \tfrac{1}{3} + \epsilon Y_2(\omega) < x < \tfrac{2}{3} + \epsilon Y_3(\omega),\\[2pt] 0, & \text{otherwise.} \end{cases} \tag{4.7}$$
Here, $\epsilon=0.2$ and $Y_{1,2,3}$ are chosen randomly from a uniform distribution on $[-1,1]$. In other words, the initial data (4.7) represents a step function with two discontinuities, where the amplitude of the jump and the locations of both jumps are random. The underlying coarse grid is the same as in the smooth case. The training and test sets are generated in a manner identical to the smooth case, and the loss function (4.6) is minimized similarly.
The training, for the three values of the diffusion coefficient $c$ considered here, resulted in the (local) minimizers shown in Table 4. We report that the training process converged very slowly for $c=10$ in the case of this rough initial data. This is reflected in the values of 20 for both spatial weights, at which point we terminated the iterations of the gradient descent method. One can provide a heuristic explanation for the values of the minimizers in this case. Recall that $c=10$ implies a very large amount of diffusion, such that the solution is almost zero at $T=0.05$. The value $g^1=0$ corresponds to the most diffusive backward Euler method, and similarly, very high values of $b^1_{-2,-1}$ also imply a large amount of diffusion and drive the approximate solution (computed by (4.2)) to zero.
Table 4. Trained parameters and gains for the heat equation with discontinuous random initial data (4.7); Gain-$k$ is defined as in Table 3.

$c$ | $g^{1,*}$ | $b^{1,*}_{-2}$ | $b^{1,*}_{-1}$ | Gain-1 | Gain-2 | Gain-3 | Gain-4
--- | --- | --- | --- | --- | --- | --- | ---
0.1 | 0.24 | −0.26 | 2.15 | 12.7 | 3.16 | 10.56 | 3.78
1 | 0.14 | −0.38 | 2.53 | 2.09 | 6.03 | 1.98 | 6.38
10 | 0 | 20 | 20 | 46.02 | 228.93 | 44.72 | 231.32
The gains with the trained scheme over the four standard schemes are shown in Table 4 and indicate a very large gain over the best performing of the standard schemes, amounting to a factor of approximately 50 for the case of $c=10$.
The linear advection equation is a prototype for the design and analysis of efficient numerical methods for hyperbolic equations. In one space dimension, it is given by
$$u_t + c\,u_x = 0, \quad (x,t)\in(0,1)\times(0,T), \qquad u(x,0) = u_0(x), \quad x\in(0,1). \tag{5.1}$$
For definiteness, we assume that $c\ge 0$ and supplement (5.1) with periodic boundary conditions.
We discretize the computational domain $[0,1]\times[0,T]$ as in section 4.1 and use the following three-point finite difference scheme to approximate the linear advection equation (5.1), by evolving $U^n_j \approx u(x_j,t^n)$ with
$$\frac{U^{n+1}_j - U^n_j}{\Delta t} = \frac{c(1-g^n)}{\Delta x}\left(b^n_{-1}U^{n+1}_{j-1} + b^n_0 U^{n+1}_j + b^n_1 U^{n+1}_{j+1}\right) + \frac{c\,g^n}{\Delta x}\left(b^n_{-1}U^n_{j-1} + b^n_0 U^n_j + b^n_1 U^n_{j+1}\right), \qquad \forall n,\; 1\le j\le J. \tag{5.2}$$
The update formulas for the points $U^n_1$ and $U^n_J$ are obtained by using the periodic boundary conditions. By using Taylor expansions, one can readily prove the following lemma.
Lemma 5.1. For all $n$ and any $g^n\in\mathbb{R}$, the finite difference scheme (5.2) is a consistent and first-order accurate discretization of the linear advection equation (5.1) if and only if the coefficients $b^n_k$, for $-1\le k\le 1$, satisfy the following algebraic conditions,
$$b^n_1 + b^n_0 + b^n_{-1} = 0, \qquad b^n_1 - b^n_{-1} = 1. \tag{5.3}$$
We can eliminate two parameters in the system (5.3) in terms of the undetermined parameter $b^n_{-1}$ to obtain
$$b^n_0 = -1 - 2b^n_{-1}, \qquad b^n_1 = 1 + b^n_{-1}. \tag{5.4}$$
Hence, per time level, the scheme (5.2) contains two undetermined parameters $g^n$ and $b^n_{-1}$. The scheme (5.2) with the constraints (5.3) is consistent as well as conservative, i.e., $\sum_j U^{n+1}_j = \sum_j U^n_j$. Additional constraints on the parameters are needed to impose stability conditions such as discrete $L^2$ (energy) stability or a discrete maximum principle.
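Analogously to the heat equation, the following sketch (Python, ours) performs one step of the parametric scheme (5.2) with periodic boundaries, eliminating $b_0$ and $b_1$ via (5.4); the assembled matrices and the demonstration values are our own choices.

```python
import numpy as np

def advection_step(U, g, bm1, c, dt, dx):
    """One step of the parametric three-point scheme (5.2) with periodic
    boundary conditions; b_0 and b_1 are eliminated via (5.4) so that the
    consistency conditions (5.3) hold for any choice of (g, b_{-1})."""
    b0 = -1.0 - 2.0 * bm1
    b1 = 1.0 + bm1
    J = U.size
    D = np.zeros((J, J))
    for j in range(J):
        D[j, (j - 1) % J] = bm1            # periodic wrap-around stencil
        D[j, j] = b0
        D[j, (j + 1) % J] = b1
    nu = c * dt / dx
    lhs = np.eye(J) - nu * (1.0 - g) * D   # implicit part of (5.2)
    rhs = (np.eye(J) + nu * g * D) @ U     # explicit part of (5.2)
    return np.linalg.solve(lhs, rhs)

# Crank-Nicolson-type time weighting g = 0.5 with b_{-1} = 0 (one standard
# choice), applied with the single coarse step dt = 0.5 used in the experiments.
x = np.arange(10) / 10.0
print(advection_step(np.sin(2 * np.pi * x), g=0.5, bm1=0.0, c=0.5, dt=0.5, dx=0.1))
```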
The generalized form (5.2) embeds several well-known finite difference approximations namely,
Scheme S1: Backward Euler in time and upwind in space, by setting $g^n=0$, $b^n_{-1}=0$.
Scheme S2: Crank-Nicolson in time and upwind in space, by setting $g^n=0.5$, $b^n_{-1}=0$.
Scheme S3: Backward Euler in time and central in space, by setting $g^n=0$, $b^n_{-1}=0.5$.
Scheme S4: Crank-Nicolson in time and central in space, by setting $g^n=0.5$, $b^n_{-1}=0.5$.
As for the heat equation, we consider only implicit (in time) schemes, as they do not require a restriction on the time step Δt. We approximate the solution $u$ of the linear advection equation with the scheme (5.2) at time $T=0.5$ and consider two different values of the wave speed $c$, namely $c=0.5$ and $c=2$. All the experiments are performed on a very coarse spatial grid with mesh size $\Delta x = \frac{1}{10}$, i.e., 10 mesh points, and a single, very large, time step of $\Delta t=0.5$.
To generate the training and test sets, we use a setup identical to the one described for the heat equation in section 4.2.1. In particular, both the training and test sets are generated from the three-term Karhunen-Loève expansion (4.5). We compute a reference solution on a fine mesh of 1000 points, with an explicit forward Euler time stepping and a standard upwind finite difference discretization [16]. The time step is chosen to satisfy the standard CFL requirement for the advection equation. This fine grid solution is projected onto the underlying coarse grid by sampling it at the points $x_j$, $1\le j\le J$, at the final time $T=\Delta t=0.5$, to generate the reference solutions. The loss function (4.6) is minimized with a simplified version of the stochastic gradient algorithm [22] with a batch size of 4. We initialize the gradient descent method with the starting values $g^1=0.5$, $b^1_{-1}=0$, corresponding to the Crank-Nicolson in time, upwind in space scheme S2, and denote the (approximate) minimizers as $\{g^{1,*}, b^{1,*}_{-1}\}$. The minimizers, for the different values of $c$, are shown in Table 5. We observe that for $c=0.5$, the computed minimizers deviate greatly from any standard scheme. This contrast is even more pronounced in the $c=2$ case: the stochastic gradient method converged very slowly and was terminated at $g^{1,*}=-20$, indicating that there is a path along which the loss function (very slowly) approaches zero.
Table 5. Trained parameters and gain for the linear advection equation (5.1).

$c$ | $g^{1,*}$ | $b^{1,*}_{-1}$ | Gain
--- | --- | --- | ---
0.5 | 0.2 | −2 | 3.72
2.0 | −20 | 0 | 96.51
In order to compare with standard schemes, we define the Gain as the ratio of the (mean) test-set error of the best performing of the four schemes S1, S2, S3, S4 (the one with the least mean error) to that of the trained scheme. For $c=0.5$, the scheme S2 is the best performing scheme, and for $c=2$, the scheme S1 is the best performing. The computed gain is shown in Table 5. For further comparison, we plot a single randomly chosen realization of the test data for both $c=0.5$ and $c=2$ in Figure 6.
As shown in Table 5, for the case of $c=0.5$, the trained scheme provides a gain of 3.72 over the best performing of the standard schemes (the scheme S2). The gains with respect to the backward Euler time stepping schemes are larger. This is also shown in Figure 6 (left), where we observe that the trained scheme is significantly more accurate than the standard schemes.
However, the gains with the trained scheme are enormous in the case of $c=2$, amounting to almost two orders of magnitude vis-à-vis the best performing of the standard schemes, see Table 5 and Figure 6 (right). A heuristic explanation for this observation goes as follows: since $cT=1$ and the boundary conditions are periodic, the exact solution at $T=0.5$ coincides with the initial data in this case. Thus, in the limit $g^n\to -\infty$, we see from (5.2) that $U^{n+1}\approx U^n$ and we remain very close to the initial data. The machine learning algorithm appears to learn this fact from the training data and provides this remarkable gain in accuracy in this special case.
The Burgers' equation, given by
$$u_t + \left(\frac{u^2}{2}\right)_x = 0, \quad (x,t)\in(0,1)\times(0,T), \tag{6.1}$$
is a prototypical example for nonlinear hyperbolic conservation laws,
$$u_t + (f(u))_x = 0, \qquad u(x,0) = u_0(x). \tag{6.2}$$
These equations arise in a wide variety of applications; examples include the Euler equations of gas dynamics, the shallow water equations of oceanography and the MHD equations of plasma physics [6]. It is well-known that solutions of (6.2) develop finite time singularities in the form of shock waves, even when the initial data is smooth. Thus, solutions of (6.2) are sought in the sense of distributions, and additional entropy conditions are imposed in order to recover uniqueness [6].
There is a large body of literature on numerical methods for hyperbolic conservation laws; popular methods include conservative finite difference schemes and discontinuous Galerkin finite element methods [10]. However, for the sake of simplicity, we consider the simplest first-order finite volume scheme in this section.
We discretize the interval $[0,1]$ uniformly with grid size Δx and label the resulting points as $x_j = j\Delta x$ for $0\le j\le J+1$, with $\Delta x = \frac{1}{J+1}$. Thus, the interval is partitioned into cells (control volumes),
$$C_j := \left(x_{j-1/2}, x_{j+1/2}\right), \qquad x_{j+1/2} = \frac{x_j + x_{j+1}}{2}, \quad \forall j.$$
The time interval $[0,T]$ is discretized uniformly with time step Δt and the time levels are denoted $t^n = n\Delta t$. We approximate the cell averages of the solution of (6.2),
$$U^n_j \approx \int_{C_j} u(x,t^n)\,dx,$$
by writing the update formula,
$$U^{n+1}_j = U^n_j - \frac{\Delta t}{\Delta x}\left(F^n_{j+1/2} - F^n_{j-1/2}\right), \qquad U^0_j = \int_{C_j} u_0(x)\,dx. \tag{6.3}$$
Here, $F^n_{j+1/2} = F(U^n_j, U^n_{j+1})$ is a numerical flux, consistent with the flux function $f$ in (6.2). A popular choice is the so-called local Lax-Friedrichs or Rusanov flux given by
$$F(U^n_j, U^n_{j+1}) = \frac{1}{2}\left(f(U^n_j) + f(U^n_{j+1})\right) - \frac{1}{2}\max\left(|f'(U^n_j)|, |f'(U^n_{j+1})|\right)\left(U^n_{j+1} - U^n_j\right). \tag{6.4}$$
Thus, the numerical diffusion is weighted by a local wave speed. It is well-known that the resulting scheme (6.3) with flux (6.4) is conservative, consistent and monotone [10]. Consequently, the solutions computed by the scheme converge to an entropy solution of (6.2).
We cast the finite volume scheme (6.3) in our machine learning framework by generalizing the flux (6.4) to
$$F(U^n_j, U^n_{j+1}) = \frac{1}{2}\left(f(U^n_j) + f(U^n_{j+1})\right) - w^n_{j+1/2}\max\left(|f'(U^n_j)|, |f'(U^n_{j+1})|\right)\left(U^n_{j+1} - U^n_j\right). \tag{6.5}$$
Here $w^n_{j+1/2}\in\mathbb{R}$, $\forall j$, are weights that scale the local wave speed. The resulting scheme is given by
$$\begin{aligned}
U^{n+1}_j = U^n_j &- \frac{\Delta t}{2\Delta x}\left(f(U^n_{j+1}) - f(U^n_{j-1})\right)\\
&+ \frac{\Delta t}{\Delta x}\Big(w^n_{j+1/2}\max\big(|f'(U^n_j)|, |f'(U^n_{j+1})|\big)\left(U^n_{j+1} - U^n_j\right) - w^n_{j-1/2}\max\big(|f'(U^n_j)|, |f'(U^n_{j-1})|\big)\left(U^n_j - U^n_{j-1}\right)\Big).
\end{aligned} \tag{6.6}$$
The properties of this generalized finite volume scheme (6.6) are summarized in the lemma below,
Lemma 6.1. The solutions $U^n_j$ generated by the scheme (6.6) satisfy the following:
i. For any $w^n_{j+1/2}\in\mathbb{R}$, for all $j,n$, the scheme (6.6) is conservative and consistent. Moreover, it is (formally) first-order accurate.
ii. Under the CFL condition,
$$\max_j \left|f'(U^n_j)\right|\,\frac{\Delta t}{\Delta x} \le 1, \tag{6.7}$$
and under the condition $w^n_{j+1/2}\ge 0.5$ for all $j,n$, the scheme (6.6) is monotone. Hence, the corresponding approximations converge to the entropy solution of the scalar conservation law (6.2).
The proof of the above lemma is a straightforward adaptation of the results of [10].
Remark 6.2. We remark that setting $w^n_{j+1/2}\equiv 0.5$ for all $n,j$ yields the standard Rusanov scheme. The weights $w$ in (6.6) serve to modulate the numerical diffusion in the scheme. We can design an alternative scheme by replacing the arithmetic averages of the fluxes in (6.5) by an entropy conservative flux and replacing the conservative variable $u$ in the numerical diffusion term of the flux (6.5) by the entropy variables [25,8]. The resulting scheme can be proved to be entropy stable for any $w^n_{j+1/2}\ge 0$.
Motivated by machine learning architectures such as convolutional neural networks [11], we make a further simplification by pooling the values of the weights $w^n$ over longer windows, i.e., by requiring that
$$w^n_{j+1/2} = w^n_{j_s+1/2}, \qquad \forall\, |j - j_s|\le m. \tag{6.8}$$
Here $0\le m\le J$ is the window length and the indices $j_s$ range over a subset of $\{1,\ldots,J\}$ on which we center the pooling windows.
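The following Python sketch (ours, not the paper's code) implements one step of the weighted scheme (6.6) for the Burgers' equation with periodic boundaries and the pooling (6.8); how the pooled weights are assigned to the wrap-around interface is an assumption on our part, and the step-function demonstration datum is an unperturbed version of (4.7).

```python
import numpy as np

def burgers_flux(u):
    return 0.5 * u ** 2                              # f(u) = u^2/2, so f'(u) = u

def weighted_rusanov_step(U, w, dt, dx):
    """One step of the weighted scheme (6.6) for the Burgers' equation with
    periodic boundaries; w[j] is the weight at the interface between cells j
    and j+1 (the last entry closing the domain periodically)."""
    Up = np.roll(U, -1)                              # U_{j+1}
    speed = np.maximum(np.abs(U), np.abs(Up))        # max(|f'(U_j)|, |f'(U_{j+1})|)
    F = 0.5 * (burgers_flux(U) + burgers_flux(Up)) - w * speed * (Up - U)
    return U - dt / dx * (F - np.roll(F, 1))         # F_{j+1/2} - F_{j-1/2}

def expand_pooled_weights(w_pooled, J, window=3):
    """Expand pooled weights (6.8) to one weight per interface. Assigning the
    periodic (wrap-around) interface the first pooled value is our assumption;
    the text only states that it is fixed by the periodic boundary conditions."""
    w = np.empty(J)
    w[:J - 1] = np.repeat(w_pooled, window)[:J - 1]
    w[J - 1] = w_pooled[0]
    return w

# Standard Rusanov scheme on the coarse grid of the text: all weights 0.5,
# 10 cells, two time steps of size 0.05.
x = (np.arange(10) + 0.5) / 10.0
U = np.where((x > 1.0 / 3.0) & (x < 2.0 / 3.0), 1.0, 0.0)
w = expand_pooled_weights(np.array([0.5, 0.5, 0.5]), J=10)
for _ in range(2):
    U = weighted_rusanov_step(U, w, dt=0.05, dx=0.1)
print(U)
```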
The generalized Rusanov scheme (6.6) can be represented as an artificial neural network, as shown in Figure 7, where we focus on the specific example of the Burgers' equation (6.1). Note that in Figure 7, the input neurons or units are the components of the vector of unknowns $U^n = \{U^n_j\}$. The output at the end of each time step is the vector $U^{n+1}$. The network transforms the input into the output through layers of operations that consist of linear maps (connecting different neurons) and nonlinear operations. For this specific example, the following nonlinear operations appear:
ABS: refers to $\mathrm{ABS}(a) = |a| = \sigma(a) + \sigma(-a)$, with $\sigma$ being the ReLU activation function (2.3).
MAX: refers to $\mathrm{MAX}(a,b) = \max(a,b) = a + \sigma(b-a)$.
SQ: refers to $\mathrm{SQ}(a) = 0.5a^2$.
PROD: refers to $\mathrm{PROD}(a,b) = ab$.
Thus, ABS and MAX are directly expressed in terms of a traditional neural network, in the sense of [11], with a ReLU activation. On the other hand, SQ and PROD are bespoke nonlinear operations for this particular example. Hence, the neural network in Figure 7 is not a traditional neural network. However, it can be directly represented in terms of so-called sum-product networks [19]. On the other hand, following recent papers such as [30], one can approximate the square and product functions very efficiently in terms of neural networks with ReLU activations, see [23, Figure 1]. In particular, one can approximate the square and product maps up to accuracy $\delta$ by a ReLU network of size that is at most logarithmic in $1/\delta$. Therefore, the neural network underlying the scheme (6.6) is very readily approximated by a traditional ReLU based network architecture. Given several hidden layers per time step (see Figure 7) and multiple time steps, the scheme (6.6) is realized as a deep neural network.
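The ReLU identities for ABS and MAX quoted above can be verified directly; the following few lines (ours) do so numerically.

```python
import numpy as np

def relu(w):
    return np.maximum(w, 0.0)            # the ReLU activation sigma in (2.3)

def ABS(a):
    return relu(a) + relu(-a)            # |a| = sigma(a) + sigma(-a)

def MAX(a, b):
    return a + relu(b - a)               # max(a, b) = a + sigma(b - a)

a, b = np.random.default_rng(2).normal(size=(2, 1000))
assert np.allclose(ABS(a), np.abs(a)) and np.allclose(MAX(a, b), np.maximum(a, b))
```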
It is standard in deep learning to optimize the weights (corresponding to entries in all matrices) for the whole network during the training process. It is in this step that we differ from traditional machine learning and take a more conservative approach. Given that we wish to be consistent (and formally first-order accurate) for any of the weights that might crop up in the training process, we severely constrain the set of free parameters within the network to only the weights w of the numerical viscosity coefficient in (6.6). Even these weights are pooled in windows. Thus, at most we have two free (trainable) parameters in the sub-network shown in Figure 7. Hence, the training process operates on a (considerably) constricted part of the deep neural network.
For the numerical experiments, we only consider the Burgers' equation (6.1) with periodic boundary conditions. We fix $\Delta x=0.1$, i.e., we discretize the interval $[0,1]$ into 10 cells. Our final time is $T=0.1$ and the time interval is discretized into two time steps of size $\Delta t=0.05$. On this coarse spatial grid, such a large time step is consistent with the CFL condition (6.7). Moreover, we pool the weights of the numerical diffusion operator by defining a pooling window of size $m=3$ in (6.8). Hence, at each time step we need to specify three weights, namely $\{w^n_1, w^n_2, w^n_3\}$, corresponding to the first, middle and last three interior interfaces. The weights on the boundary interfaces are determined from the periodic boundary conditions.
To generate the training set, we first consider the smooth random initial data specified by the Karhunen-Loève expansion (4.5). As in the previous sections, the expansion is truncated at $L=3$ and we choose random variables $Y^i_{1,2,3}$, for $1\le i\le I$ with $I=20$, uniformly from $[0,1]$. The corresponding initial data, labeled $u_0^i$, is used to initialize the scheme (6.6) and to generate the updated solutions $U^{1,i}_j, U^{2,i}_j$ for all $j$. A reference solution is computed using the standard Rusanov scheme (i.e., setting $w^n_{j+1/2}\equiv 0.5$) on a fine mesh of 1000 points, with the time step determined by the corresponding CFL condition (6.7). This fine grid solution at the two time levels $\Delta t, 2\Delta t$ is projected onto the underlying coarse grid by averaging over the coarse cells and is denoted $U^{n,i}_{j,ref}$ for all $n,j,i$.
The training loss function is the $L^1$ error given by
$$E_1(w) := \Delta x\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{n=1}^{2}\left|U^{n,i}_j - U^{n,i}_{j,ref}\right|. \tag{6.9}$$
Here, $w = \{w^n_{1,2,3}\}_{n=1,2}$ represents the vector of 6 parameters that specifies the scheme (6.6) on this grid. We remark that the $L^1$ error is natural for conservation laws, as it is the norm under which the data-to-solution operator is continuous [6].
The loss function is minimized using a simplified version of the stochastic gradient descent algorithm with a batch size of 4. The stochastic gradient algorithm is applied sequentially, i.e., first the minimizers at the first time step are determined and then we determine the minimizers at the second time step. This step-by-step minimization may not yield the optimum in the 6-dimensional parameter space, but it was found to be reasonably accurate. Moreover, it is computationally cheaper and consistent with the time marching form of numerical methods for evolutionary PDEs. The algorithm was initialized by setting $w^n_{1,2,3}\equiv 0.5$ (corresponding to the standard Rusanov scheme on the coarse grid), and the approximate minimizers, labeled by the vector $w^*$, are shown in Table 6. As seen from the table, the minimizing weights are always below the value of 0.5 (being equal to zero in one case). This implies that the numerical diffusion is reduced during training, as the solutions in this case are still identified as mostly smooth.
Table 6. Trained weights, gain and speedup for the Burgers' equation with smooth random initial data (4.5).

$w^{1,*}_1$ | $w^{1,*}_2$ | $w^{1,*}_3$ | $w^{2,*}_1$ | $w^{2,*}_2$ | $w^{2,*}_3$ | Gain | Speedup
--- | --- | --- | --- | --- | --- | --- | ---
0.26 | 0.2 | 0 | 0.3 | 0.3 | 0 | 1.42 | 5.33
A test set is generated by selecting at random 100 realizations of the initial datum (4.5), and the gain, defined as the ratio of the mean error of the standard Rusanov scheme (on the coarse grid) to that of the trained scheme, is calculated and shown in Table 6. This gain of 1.42 is rather modest compared to the previous examples of linear PDEs. However, a more relevant measure is the speed up, i.e., the ratio of the computational work (time) required by the standard Rusanov scheme, on a finer mesh, to obtain an error comparable to that of the trained scheme on the coarse grid. In the case of smooth data for the Burgers' equation, this speed up is given in Table 6 and amounts to a factor of 5.33.
Larger gains are obtained for the rough initial data given by the random initial condition (4.7). Here, the amplitude of the initial discontinuity and the locations of both jumps are uncertain. In this case, we generate the training data $u_0^i$ identically to the previous case. The reference solution is computed and consists, for each realization, of a right moving shock and a rarefaction on the left (see Figure 8). The loss function (6.9) is minimized and the (approximate) minimizers are shown in Table 7. In this case, some optimal values of the weights are significantly higher than 0.5, indicating larger diffusion around the right moving shock, whereas many weights are well below 0.5, indicating a modulation of the numerical diffusion around the smooth regions identified from the training set. The test set is chosen as before and the gain, shown in Table 7 and amounting to a factor of 2.48, is higher than in the smooth case. Moreover, the overall speed up in this case is 9.55, representing an order of magnitude computational speed up over the standard Rusanov scheme.
Table 7. Trained weights, gain and speedup for the Burgers' equation with discontinuous random initial data (4.7).

$w^{1,*}_1$ | $w^{1,*}_2$ | $w^{1,*}_3$ | $w^{2,*}_1$ | $w^{2,*}_2$ | $w^{2,*}_3$ | Gain | Speedup
--- | --- | --- | --- | --- | --- | --- | ---
0.24 | 0.24 | 1.25 | 0.24 | 0.26 | 0 | 2.48 | 9.55
The solutions computed with the trained scheme and with the standard Rusanov scheme, for one particular realization of the initial data (4.7) are shown in Figure 8. We observe from the figure that the trained scheme significantly outperforms the Rusanov scheme for this realization and provides an accurate approximation, even on this very coarse grid.
The Euler equations of gas dynamics are a prototypical example of a hyperbolic system of conservation laws [6]. In one space dimension, the Euler equations, representing the conservation of mass, momentum and energy are,
$$\begin{aligned}
&u_t + (f(u))_x = 0, \quad (x,t)\in(0,1)\times(0,T),\\
&u = [\rho,\, \rho v,\, E], \qquad f(u) = [\rho v,\, \rho v^2 + p,\, (E+p)v],\\
&u(x,0) = u_0(x).
\end{aligned} \tag{7.1}$$
Here $u:(0,1)\times(0,T)\to\mathbb{R}^3$ is the vector of unknowns and $f:\mathbb{R}^3\to\mathbb{R}^3$ is the flux vector. The density of the gas is denoted by $\rho$, the velocity by $v$, the pressure by $p$ and the total energy by $E$. The equations are closed by specifying a thermodynamic relation between the variables, such as the ideal gas equation of state:
$$E := \frac{p}{\gamma - 1} + \frac{1}{2}\rho v^2, \tag{7.2}$$
with gas constant γ . The system (7.1) is hyperbolic with the three eigenvalues,
$$\lambda_1 = v - a, \qquad \lambda_2 = v, \qquad \lambda_3 = v + a, \tag{7.3}$$
given in terms of sound speed,
$$a = \sqrt{\frac{\gamma p}{\rho}}. \tag{7.4}$$
The solutions of systems of conservation laws, such as the Euler equations (7.1), develop finite time discontinuities such as shock waves and contact discontinuities, even when the initial data is smooth. As in the scalar case, there is a notion of entropy solutions for them.
We discretize the computational domain $[0,1]\times[0,T]$ as in the case of scalar conservation laws (section 6.1), and evolve cell averages of the vector of unknowns, i.e., $U^n_j = [\rho^n_j, (\rho v)^n_j, E^n_j]$, by the (generalized) finite volume scheme
$$U^{n+1}_j = U^n_j - \frac{\Delta t}{\Delta x}\left(F^n_{j+1/2} - F^n_{j-1/2}\right), \qquad U^0_j = \int_{C_j} u_0(x)\,dx. \tag{7.5}$$
Here, $F^n_{j+1/2} = F(U^n_j, U^n_{j+1})$ is a numerical flux vector, consistent with the flux vector $f$ in (7.1). The local Lax-Friedrichs or Rusanov flux for the Euler equations is given by
$$F(U^n_j, U^n_{j+1}) = \frac{1}{2}\left(f(U^n_j) + f(U^n_{j+1})\right) - \frac{1}{2}\max\left(|v^n_j| + a^n_j,\; |v^n_{j+1}| + a^n_{j+1}\right)\left(U^n_{j+1} - U^n_j\right), \tag{7.6}$$
with $a^n_j$ being the sound speed corresponding to the state $U^n_j$, computed from (7.4). Thus, the numerical diffusion is scaled with an estimate (upper bound) of the local maximal wave speed given in (7.3). The primitive variables are computed from the conservative variables at the end of each time step. The Rusanov flux is known to be very diffusive for systems of conservation laws, particularly around contact discontinuities. However, we choose it here in order to be consistent with our choice in the scalar case and to investigate whether we can train a scheme to (significantly) improve on its numerical performance.
As in the scalar case, we cast the finite volume scheme (7.5) in our machine learning framework by generalizing the Rusanov flux (7.6) to
$$F(U^n_j, U^n_{j+1}) = \frac{1}{2}\left(f(U^n_j) + f(U^n_{j+1})\right) - w^n_{j+1/2}\max\left(|v^n_j| + a^n_j,\; |v^n_{j+1}| + a^n_{j+1}\right)\left(U^n_{j+1} - U^n_j\right). \tag{7.7}$$
Here $w^n_{j+1/2}\in\mathbb{R}$, $\forall j$, are weights that scale the local wave speed. The resulting scheme is given by
$$\begin{aligned}
U^{n+1}_j = U^n_j &- \frac{\Delta t}{2\Delta x}\left(f(U^n_{j+1}) - f(U^n_{j-1})\right)\\
&+ \frac{\Delta t}{\Delta x}\, w^n_{j+1/2}\max\big(|v^n_j| + a^n_j,\; |v^n_{j+1}| + a^n_{j+1}\big)\left(U^n_{j+1} - U^n_j\right)\\
&- \frac{\Delta t}{\Delta x}\, w^n_{j-1/2}\max\big(|v^n_j| + a^n_j,\; |v^n_{j-1}| + a^n_{j-1}\big)\left(U^n_j - U^n_{j-1}\right).
\end{aligned} \tag{7.8}$$
It is straightforward to verify that the scheme (7.8) is a conservative and consistent discretization of the one-dimensional Euler equations. It is also (formally) first-order accurate. As in the case of the Burgers' equation, we make a further simplification by pooling the values of the weights $w^n$ over longer windows, i.e., by requiring (6.8), with $0\le m\le J$ the window length and the indices $j_s$ ranging over a subset of $\{1,\ldots,J\}$ on which we center the pooling windows.
For our numerical experiments, we consider the one-dimensional Euler equations (7.1) with the ideal gas equation of state (7.2) and gas constant $\gamma=1.4$, corresponding to a diatomic gas. We fix $\Delta x=0.05$, i.e., we discretize the interval $[0,1]$ into 20 cells. Our final time is $T=0.15$ and the time interval is divided into five time steps of size $\Delta t=0.03$. The scheme (7.8) is closed at the boundary points by imposing transparent boundary conditions, i.e., a zeroth-order extrapolation obtained by setting $U^n_0 = U^n_1$ and $U^n_{J+1} = U^n_J$.
We also pool the weights of the numerical diffusion by defining a pooling window of size $m=3$ in (6.8). Hence, at each time step we need to specify 6 weights, namely $\{w^n_1,\ldots,w^n_6\}$ in (7.8), by grouping every 3 cell interfaces (starting from the left). The weights on the boundary interfaces are determined from the transparent boundary conditions. Hence, the scheme (7.8) contains 30 parameters that need to be determined in the training process.
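As an illustration of the scheme (7.8) and the setup just described, the following Python sketch (ours, not from the paper) performs one coarse time step on unperturbed Sod-type data with transparent boundaries; the treatment of the boundary-interface weights and the per-interface weight vector (which in the trained scheme would come from the 6 pooled values) are assumptions on our part.

```python
import numpy as np

GAMMA = 1.4

def euler_flux(U):
    """Flux f(u) of (7.1) for conserved variables U = [rho, rho*v, E] (rows = cells)."""
    rho, m, E = U[:, 0], U[:, 1], U[:, 2]
    v = m / rho
    p = (GAMMA - 1.0) * (E - 0.5 * rho * v ** 2)
    return np.stack([m, m * v + p, (E + p) * v], axis=1), v, p

def weighted_rusanov_step(U, w_iface, dt, dx):
    """One step of the weighted scheme (7.8) with transparent boundaries
    (zeroth-order extrapolation); w_iface holds one weight per interior interface."""
    Ug = np.vstack([U[:1], U, U[-1:]])          # ghost cells U_0 = U_1, U_{J+1} = U_J
    f, v, p = euler_flux(Ug)
    a = np.sqrt(GAMMA * p / Ug[:, 0])           # sound speed (7.4)
    s = np.maximum(np.abs(v[:-1]) + a[:-1], np.abs(v[1:]) + a[1:])
    # Boundary-interface weights assumed 0.5 (immaterial: the ghost-cell jump vanishes).
    w = np.concatenate([[0.5], w_iface, [0.5]])
    F = 0.5 * (f[:-1] + f[1:]) - (w * s)[:, None] * (Ug[1:] - Ug[:-1])
    return U - dt / dx * (F[1:] - F[:-1])

# Sod-type initial data on 20 cells (dx = 0.05), one coarse step dt = 0.03.
x = (np.arange(20) + 0.5) * 0.05
rho = np.where(x < 0.5, 1.0, 0.4)
p = np.where(x < 0.5, 1.0, 0.4)
U = np.stack([rho, np.zeros_like(x), p / (GAMMA - 1.0)], axis=1)
w_iface = np.full(19, 0.5)   # standard Rusanov; the trained scheme pools these per (6.8)
print(weighted_rusanov_step(U, w_iface, dt=0.03, dx=0.05))
```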
To generate the training set, we consider the following random initial data,
$$\begin{aligned}
\rho_0(x,\omega) &= \begin{cases} \rho_l + \epsilon Y_1(\omega), & \text{if } 0 < x < 0.5 + \epsilon Y_2(\omega),\\ \rho_r + \epsilon Y_3(\omega), & \text{if } 0.5 + \epsilon Y_2(\omega) < x < 1,\end{cases}\\
v_0(x,\omega) &= 0, \quad 0 < x < 1,\\
p_0(x,\omega) &= \begin{cases} p_l + \epsilon Y_4(\omega), & \text{if } 0 < x < 0.5 + \epsilon Y_2(\omega),\\ p_r + \epsilon Y_5(\omega), & \text{if } 0.5 + \epsilon Y_2(\omega) < x < 1,\end{cases}
\end{aligned} \tag{7.9}$$
Here, $\rho_l = p_l = 1$, $\rho_r = p_r = 0.4$ and $\epsilon=0.1$. The random variables $Y_{1,2,3,4,5}(\omega)$ are drawn from a uniform distribution on the interval $[-1,1]$. Thus, the initial data (7.9) corresponds to a stochastic version [17] of the well-known Sod shock tube problem for the Euler equations, with a random interface separating random jumps in the initial density and pressure.
We draw $I=50$ samples of (7.9) and label the resulting initial data as $u_0^i$. These data initialize the scheme (7.8) to generate the updated solutions $U^{n,i}_j$ for all $j$ and $1\le n\le 5$. A reference solution is computed using the standard Rusanov scheme (i.e., setting $w^n_{j+1/2}\equiv 0.5$ in (7.8)) on a fine mesh of 1000 points, with the time step determined by the corresponding CFL condition (6.7). This fine grid solution is projected onto the underlying coarse grid by averaging. We denote the reference solution as $U^{n,i}_{j,ref}$ for all $n,j,i$.
The training loss function is the following version of the $L^1$ error,
$$E_1(w) := \Delta x\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{n=1}^{5}\left(\left|\rho^{n,i}_j - \rho^{n,i}_{j,ref}\right| + \left|v^{n,i}_j - v^{n,i}_{j,ref}\right| + \left|p^{n,i}_j - p^{n,i}_{j,ref}\right|\right). \tag{7.10}$$
Here, $\rho^{n,i}_j, v^{n,i}_j, p^{n,i}_j$ are the density, velocity and pressure computed on the coarse grid by the numerical scheme (7.8) for the training initial data $u_0^i$. We choose the $L^1$ error in the primitive variables, rather than in the conservative variables. Our choice is motivated by the fact that, in practice, one is interested in measuring the velocity and the pressure, rather than the momentum and the energy.
We minimize the loss function (7.10) over the 30-dimensional parameter space by using a stochastic gradient method with a batch size of 5. The stochastic gradient method is initialized by setting all the weights to $w^n_l\equiv 0.5$, corresponding to the standard Rusanov scheme. As in the case of the Burgers' equation, we optimize the loss function sequentially in time, one time step at a time. The stochastic gradient method converges fairly quickly in this case to the optimized weights presented in Table 8.
Table 8. Trained weights per time step for the Euler equations with random Sod-type initial data (7.9).

$n$ | $w^{n,*}_1$ | $w^{n,*}_2$ | $w^{n,*}_3$ | $w^{n,*}_4$ | $w^{n,*}_5$ | $w^{n,*}_6$
--- | --- | --- | --- | --- | --- | ---
1 | 0.5 | 0.5 | 0.42 | 0.5 | 0.5 | 0.5
2 | 0.5 | 0.5 | 0.23 | 0.48 | 0.5 | 0.5
3 | 0.5 | −0.3 | 0.29 | 0.51 | 0.5 | 0.5
4 | 0.5 | 0.17 | 0.19 | 0.54 | 0.5 | 0.5
5 | 0.5 | 0.2 | 0.33 | 0.77 | 0.3 | 0.5
The (approximate) optimized weights, shown in Table 8, follow an interesting pattern. We recall that the solution of the Euler equations (7.1) with initial data (7.9) consists of the initial discontinuity breaking down into three waves, namely a left moving rarefaction, a right moving contact and an even faster right moving shock wave; see Figure 9 for a snapshot of the solution. First, we observe that only 13 of the 30 weights assume a value different from the initial value, and this variation becomes more pronounced with time as the waves develop, separate and move away from the initial discontinuity. At some locations the optimal weights are greater than 0.5, implying that more diffusion is added; at many other locations, particularly to the left of the initial interface, corresponding to the rarefaction wave, the weights are significantly less than the initial value of 0.5 (one is even negative), indicating that diffusion is removed near the continuous part of the solution.
We generate a test set by drawing 1000 samples from (7.9). The resulting gain, defined as the ratio of the (mean) error of the standard Rusanov scheme on the test set to the (mean) error of the trained scheme, is a factor of 2.17. Although this seems rather modest compared with the much more significant gains for the heat and the linear transport equations, it is comparable to the gain for the Burgers' equation reported in Table 7. However, the key quantity demonstrating the efficiency of the trained scheme is the speedup, i.e., the ratio of the (mean) computational time the standard Rusanov scheme needs (on a finer grid) to achieve the same error as the trained scheme, to the (mean) computational time of the trained scheme on the underlying coarse grid. Given that the observed (mean) order of convergence of the standard Rusanov scheme on this problem is 0.57 (even though the Rusanov scheme is (formally) first-order accurate), the trained scheme on a grid of 20 mesh points (and five time steps) is comparable in error to a standard Rusanov scheme on a grid of 80 mesh points (and time steps determined from the CFL number). Consequently, the speedup is a factor of 16.
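Spelling out the arithmetic behind this speedup estimate (our reconstruction of the argument; the factor of four per dimension follows from the measured gain and the observed convergence order):

```latex
\left(2.17\right)^{1/0.57} \approx 3.9 \approx 4
\quad\Longrightarrow\quad
20 \text{ cells} \;\longrightarrow\; 80 \text{ cells},
\qquad
\text{speedup} \;\approx\; \underbrace{4}_{\text{space}} \times \underbrace{4}_{\text{time steps (CFL)}} \;=\; 16.
```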
Further insight into the performance of the trained scheme is provided in Figure 9, where we plot the density, velocity and pressure computed with the trained scheme (7.8) and compare them with the standard Rusanov scheme and a reference solution. As seen from Figure 9 (left), the trained scheme provides a considerably more accurate approximation of the rarefaction wave and the (fast) shock, even on this very coarse grid. On the other hand, it dissipates the contact even further. This counter-intuitive behavior can be explained on the basis of the loss function (7.10). Note that the pressure and the velocity are constant across the contact. Thus, the contribution of the contact to the loss function (7.10) can be rather small. Hence, during the training process, the scheme "decides" not to approximate the contact better but to focus on approximating the shock and the rarefaction more accurately. This strategy clearly bears fruit, as the velocity (Figure 9 (middle)) and the pressure (Figure 9 (right)) are approximated very well, albeit with small oscillations, leading to a larger overall reduction in the loss function (7.10). This behavior is in stark contrast to traditional approaches that focus on approximating the contact discontinuity more accurately.
Numerical methods for efficient approximation of (time-dependent) ordinary and partial differential equations are well established. However, emerging applications such as uncertainty quantification (UQ), (Bayesian) inverse problems and (real time) optimal control and design require fast (computationally cheap) yet accurate numerical methods. Existing numerical schemes, particularly for nonlinear PDEs, fail to provide reasonable accuracy at very low computational cost.
In this paper, we have proposed a machine learning framework, summarized in Algorithm 2.1, for designing such cheap yet accurate methods. The basis of our algorithm is the observation that the computational cost of existing numerical methods on (very) coarse (space-time) grids is rather low. However, these methods are too inaccurate on such grids to be of practical use. We aim to increase the accuracy (reduce the numerical error) of these methods on coarse grids. To this end, we recast generalizations of standard numerical methods in terms of artificial neural networks, i.e., layers of units coupled with linear operators and possibly nonlinear activations, but with a set of undetermined parameters. These parameters are trained to minimize a loss function on a carefully chosen training set in an offline training phase. The training is performed with a stochastic gradient descent method, initialized with parameters corresponding to a standard numerical method. Experience indicates that the training process is fast and converges to a local minimum (with a significant decrease in the loss function) in a few iterations.
The key properties of our proposed algorithm are
● The resulting method is always consistent with the underlying ODE or PDE by design. Additional constraints can be imposed on the trainable parameters to ensure stability.
● The method is guaranteed to be no less accurate than the standard numerical method on the same grid, because the gradient descent method is initialized with parameters that correspond to that standard method.
● The method is very simple to implement, requiring only minor changes to existing numerical ODE and PDE solvers.
Although no theoretical guarantees have been established that the proposed algorithm significantly outperforms standard methods on the test set, we have presented extensive numerical experiments demonstrating this enhancement in performance. Our numerical experiments include a linear and a nonlinear ODE, the linear heat and transport equations, scalar conservation laws (Burgers' equation) and the Euler equations of gas dynamics. The underlying numerical schemes include implicit multi-step methods for ODEs, implicit finite difference schemes for linear PDEs and explicit finite volume schemes for nonlinear PDEs. Loss functions, measuring the error in either the $L^1$ or the $L^2$ norm, were minimized with stochastic gradient methods. The numerical experiments demonstrated a significant gain in performance (computational speedup) over the underlying standard numerical method. The gains ranged from an order of magnitude for nonlinear problems to two (or three) orders of magnitude for linear problems. In all cases considered here, the machine learning Algorithm 2.1 provided a numerical method with reasonable accuracy on a very coarse space-time grid. Hence, it could serve as a basis for the solution of complex problems in UQ, inverse problems or (real time) optimal control.
It is instructive to compare our approach with a possible direct deep learning of the solutions of differential equations. Recall that one can cast standard numerical methods for time-dependent differential equations in the form of a deep neural network; see section 2 (Figure 2) and section 6 (Figure 7) for concrete examples. At the very least, the nonlinearities that occur in standard numerical methods can be approximated with standard deep neural networks based on ReLU activations. Thus, our approach does consist of approximating differential equations with a form of deep network. Hence, standard deep learning methodology, such as backpropagation, stochastic gradients and pooling, and software frameworks such as TENSORFLOW, can be readily used.
However, as explained in section 6.2, there is a significant difference between our algorithm and customary deep learning. In machine learning, one usually trains all the parameters in the network. Doing so in our context, see Figure 7, may lead to a lack of consistency with the underlying differential equation. In order to retain consistency (and possibly stability), one needs to constrain the set of parameters so that these properties hold for every value of the trainable parameters. We do so with our (natural) generalization of numerical schemes. Thus, one can consider Algorithm 2.1 as a deep learning algorithm with a very particular architecture and a (very) restricted set of trainable parameters. We retain consistency and, at the same time, observe significant gains in computational efficiency. It could be that freely training all the parameters of the deep network underlying our generalized numerical method would automatically identify regions of the parameter space that retain consistency and stability (particularly if additional penalization terms are added to the loss function). This approach needs to be explored in the future.
It should be emphasized that the (approximate) optimal values of the parameters calculated in all our examples depend strongly on the training set, the underlying coarse grid and the parameters of the problem, such as the diffusion coefficient in the heat equation (4.1) or the wave speed in the linear advection equation (5.1). One can also train the algorithm to take a possible stochastic model for some of these parameters into account. Moreover, for the sake of simplicity of exposition, we have mostly considered model problems with very simple underlying numerical methods in this paper. The number of undetermined parameters ranged from 2–3 for ODEs and linear PDEs to at most 30 for nonlinear PDEs. State of the art deep learning architectures handle hundreds of thousands to millions of trainable parameters, and we anticipate a much larger gain in efficiency when we dramatically increase the width and depth of our schemes, represented as networks. Application of the proposed Algorithm 2.1 to realistic multi-dimensional problems in uncertainty quantification and inverse problems is the subject of ongoing work.
The author thanks Kjetil O. Lye (SAM, ETH Zürich) for interesting discussions on this topic. The research of SM was partially funded by ERC CoG 770880 COMANFLO.
The author declares no conflict of interest in this paper.
[1] | Barron AR (1993) Universal approximation bounds for superpositions of a sigmoidal function. IEEE T Inform Theory 39: 930–945. doi: 10.1109/18.256500 |
[2] | Beck C, Weinan E and Jentzen A (2017) Machine learning approximation algorithms for high dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. Technical Report 2017-49, Seminar for Applied Mathematics, ETH Zürich. |
[3] | Bijl H, Lucor D, Mishra S, et al. (2014) Uncertainty quantification in computational fluid dynamics. Lecture notes in computational science and engineering 92, Springer. |
[4] | Borzi A and Schulz V (2012) Computational optimization of systems governed by partial differential equations, SIAM. |
[5] | Brenner SC and Scott LR (2008) The mathematical theory of finite element methods. Texts in applied mathematics 15, Springer. |
[6] | Dafermos CM (2005) Hyperbolic Conservation Laws in Continuum Physics (2nd Ed.), Springer Verlag. |
[7] | Weinan E and Yu B (2018) The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun Math Stat 6: 1–12. doi: 10.1007/s40304-018-0127-z |
[8] | Fjordholm US, Mishra S and Tadmor E (2012) Arbitrarily high-order accurate essentially non-oscillatory entropy stable schemes for systems of conservation laws. SIAM J Numer Anal 50: 544–573. doi: 10.1137/110836961 |
[9] | Ghanem R, Higdon D and Owhadi H (2016) Handbook of uncertainty quantification, Springer. |
[10] | Godlewski E and Raviart PA (1991) Hyperbolic Systems of Conservation Laws. Mathematiques et Applications, Ellipses Publ., Paris. |
[11] | Goodfellow I, Bengio Y and Courville A (2016) Deep learning. MIT Press. Available from: http://www.deeplearningbook.org. |
[12] | Hairer E and Wanner G (1991) Solving ordinary differential equations. Springer Series in computational mathematics, 14, Springer. |
[13] | Hornik K, Stinchcombe M and White H (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2: 359–366. doi: 10.1016/0893-6080(89)90020-8 |
[14] | Kingma DP and Ba JL (2015) Adam: a Method for Stochastic Optimization. International Conference on Learning Representations, 1–13. |
[15] | LeCun Y, Bengio Y and Hinton G (2015) Deep learning. Nature 521: 436–444. doi: 10.1038/nature14539 |
[16] | LeVeque RJ (2007) Finite difference methods for ordinary and partial differential equations, steady state and time dependent problems, SIAM. |
[17] | Mishra S, Schwab C and Šukys J (2012) Multi-level Monte Carlo finite volume methods for nonlinear systems of conservation laws in multi-dimensions. J Comput Phys 231: 3365–3388. doi: 10.1016/j.jcp.2012.01.011 |
[18] | Miyanawala TP and Jaiman RK (2017) An efficient deep learning technique for the Navier-Stokes equations: application to unsteady wake flow dynamics. Preprint, arXiv:1710.09099v2. |
[19] | Poon H and Domingos P (2011) Sum-product Networks: A new deep architecture. International conference on computer vision (ICCV), 689–690. |
[20] | Raissi M and Karniadakis GE (2018) Hidden physics models: machine learning of nonlinear partial differential equations. J Comput Phys 357: 125–141. doi: 10.1016/j.jcp.2017.11.039 |
[21] | Ray D and Hesthaven JS (2018) An artificial neural network as a troubled cell indicator. J Comput Phys, to appear. |
[22] | Ruder S (2017) An overview of gradient descent optimization algorithms. Preprint, arXiv:1609.04747v2. |
[23] | Schwab C and Zech J (2017) Deep learning in high dimension. Technical Report 2017-57, Seminar for Applied Mathematics, ETH Zürich. |
[24] | Stuart AM (2010) Inverse problems: a Bayesian perspective. Acta Numerica 19: 451–559. doi: 10.1017/S0962492910000061 |
[25] | Tadmor E (2003) Entropy stability theory for difference approximations of nonlinear conservation laws and related time-dependent problems. Acta Numerica, 451–512. |
[26] | Tompson J, Schlachter K, Sprechmann P, et al. (2017) Accelerating Eulerian fluid simulation with convolutional networks. Preprint, arXiv:1607.03597v6. |
[27] | Trefethen LN (2000) Spectral methods in MATLAB, SIAM. |
[28] | Troltzsch F (2010) Optimal control of partial differential equations. AMS. |
[29] | Quarteroni A, Manzoni A and Negri F (2015) Reduced basis methods for partial differential equations: an introduction, Springer Verlag. |
[30] | Yarotsky D (2017) Error bounds for approximations with deep ReLU networks. Neural Networks 94: 103–114. doi: 10.1016/j.neunet.2017.07.002 |