
In this paper, we considered a partitioned epidemic model with reaction-diffusion behavior, analyzed the dynamics of populations in various compartments, and explored the significance of spreading parameters. Unlike traditional approaches, we proposed a novel paradigm for addressing the dynamics of epidemic models: Inferring model dynamics and, more importantly, parameter inversion to analyze disease spread using Reaction-Diffusion Disease Information Neural Networks (RD-DINN). This method leverages the principles of hidden disease spread to overcome the black-box mechanism of neural networks relying on large datasets. Through an embedded deep neural network incorporating disease information, the RD-DINN approximates the dynamics of the model while predicting unknown parameters. To demonstrate the robustness of the RD-DINN method, we conducted an analysis based on two disease models with reaction-diffusion terms. Additionally, we systematically investigated the impact of the number of training points and noise data on the performance of the RD-DINN method. Our results indicated that the RD-DINN method exhibits relative errors less than 1% in parameter inversion with 10% noise data. In terms of dynamic predictions, the absolute error at any spatiotemporal point does not exceed 5%. In summary, we present a novel deep learning framework RD-DINN, which has been shown to be effective for reaction-diffusion disease modeling, providing an advanced computational tool for dynamic and parametric prediction of epidemic spread. The data and code used can be found at https://github.com/yuanfanglila/RD-DINN.
Citation: Xiao Chen, Fuxiang Li, Hairong Lian, Peiguang Wang. A deep learning framework for predicting the spread of diffusion diseases[J]. Electronic Research Archive, 2025, 33(4): 2475-2502. doi: 10.3934/era.2025110
The early spread dynamics of infectious diseases have never garnered as much attention as they do today. The use of mathematical models to understand the dynamics of epidemiology can be traced back to 1927 when Kermack first introduced the SIR epidemic model [1]. In a closed environment, we assume that the total population size N remains constant and employ a set of ordinary differential equations (ODEs) to describe the mutual transitions among three categories of populations. S(t),I(t),R(t) are functions of time t, where S(t) represents susceptible individuals, I(t) represents infected individuals, and R(t) represents recovered individuals. The dynamics of these three populations can be expressed as the following equations
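$$
\begin{cases}
\dfrac{dS}{dt} = -\beta S I,\\[4pt]
\dfrac{dI}{dt} = \beta S I - \alpha I,\\[4pt]
\dfrac{dR}{dt} = \alpha I.
\end{cases}
\tag{1.1}
$$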
The model in Eq (1.1) assumes that births and deaths are not allowed in a closed environment. Parameter β represents the disease transmission rate and βSI denotes the number of individuals infected with the disease per unit of time. α represents the recovery rate, signifying the number of individuals recovering and transitioning from the infectious category I(t) to the recovered category R(t) per unit of time.
However, diseases do not spread uniformly across a region; they also diffuse in space over time. Moreover, different diseases spread in different ways. For instance, COVID-19 reached 30 countries within 45 days, spreading geographically as it went. Countries with higher per-capita reported case counts tended to be highly globalized and had a significant degree of population mobility, whereas countries with fewer cases were mostly characterized by lower levels of globalization and domestic mobility. This implies that the disease spreads gradually and that the number of infections varies over both time and space [2]. Partial differential equations (PDEs) are therefore used to study diffusive disease models. Ndolane combined a fractional derivative with a diffusion term to study the dynamic behavior of a fractional SIR model, which provides a new perspective on the application of fractional diffusion terms in epidemic models [3].
Reaction-diffusion systems are employed to describe the spatial spread of infectious diseases. We assume that the transmission and recovery rates remain constant over time and space, described by the following partial differential equations
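$$
\begin{cases}
\dfrac{\partial S}{\partial t} - d_1 \dfrac{\partial^2 S}{\partial x^2} = B - \beta S I - \mu S, & x \in \Omega,\ t > 0,\\[4pt]
\dfrac{\partial I}{\partial t} - d_2 \dfrac{\partial^2 I}{\partial x^2} = \beta S I - \alpha I - \mu I, & x \in \Omega,\ t > 0,\\[4pt]
\dfrac{\partial R}{\partial t} - d_3 \dfrac{\partial^2 R}{\partial x^2} = \alpha I - \mu R, & x \in \Omega,\ t > 0,\\[4pt]
\partial_\eta S = \partial_\eta I = \partial_\eta R = 0, & x \in \partial\Omega,\ t > 0,\\[4pt]
S(x,0) = S_0(x),\ I(x,0) = I_0(x),\ R(x,0) = R_0(x), & x \in \bar{\Omega},
\end{cases}
\tag{1.2}
$$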
where Ω is a fixed and bounded domain in ℝ^n with a smooth boundary ∂Ω, and η is the outward unit normal vector on the boundary. The homogeneous Neumann boundary conditions imply that the considered diffusion SIR system is self-contained and has no migration across the boundary. In reality, populations always involve new births and deaths, so we consider births and deaths within the region Ω. B denotes the number of newborn individuals and μ represents the natural death rate. Here we assume the death rates are the same for each population category. The constants d_i (i=1,2,3) are diffusion coefficients representing the diffusion capabilities of each population category. S(x,t), I(x,t), and R(x,t) are the population quantities, which depend on time and space and satisfy S(x,t)+I(x,t)+R(x,t)=N. Since the first two equations in the system do not contain the recovered individuals R, we can analyze the dynamics without the third equation.
Understanding the dynamics of disease spread and accurately estimating the transmission and recovery rates are crucial for epidemiological simulation models. However, accurately estimating these rates often requires large amounts of expensive data [4]. Moreover, estimating parameters in most infectious disease models remains challenging because disease transmission is inherently stochastic and interdependent with other parameters [5].
As is well known, traditional mathematical methods require only limited data to solve problems. These problems involve inferring the transmission dynamics of infectious diseases based on finite disease dynamic data, including the use of finite difference approximations [6]. Sindhu et al. discussed the application of classical statistical distributions [7] and parametric methods in medical data analysis [8,9]. However, traditional numerical methods are computationally expensive, the data they need are difficult to obtain, and they perform especially poorly on noisy data. Therefore, there is an urgent need to explore new algorithms that can work with limited disease data while leveraging computational models to improve the accuracy, reliability, and predictability of disease dynamics and transmission parameters. Shafiq et al. compared the performance of artificial neural networks (ANN) and traditional parametric methods in epidemic data modeling [10], and combined maximum likelihood estimation (MLE) with ANN to explore breast cancer survival data analysis [11]. The scientific machine learning approach that combines neural networks with physics-informed modeling has emerged as a new tool in situations with limited or even no data. This in turn extends to the study of a wide range of problems in physics, biology, chemistry, and infectious diseases, including fluid dynamics, solid mechanics, biochemistry, and the ecology of infectious diseases [12]. The Physics-Informed Neural Network (PINN) proposed by Raissi et al. proved to be effective in combining data with physical information [13]. Specifically, PINN has been widely used to solve inverse or ill-posed problems by combining partial differential equation (PDE) models with sparse data [14]. By modeling nonlinear systems with neural networks and including PDEs in the loss function as a soft penalty term, the amount of required data is reduced. This is achieved by using existing knowledge (the PDE) to constrain the search space of the model. Yazdani et al. successfully embedded systems biology information into neural networks, creating Systems Biology-Informed Neural Networks (SBINN) [15]. Using relatively few experimental measurements, they inferred the dynamics of unobserved species and unknown model parameters. Raissi et al. initially applied the PINN method to a standard SIR (ODE) infectious disease system to accurately estimate model parameters [16]. Subsequently, Shaier et al. extended the application of the PINN method to various variants of SIR models, generating the unified Differential Information Neural Network (DINN) framework [17]. Yin et al. used a generalized partial differential approach to embed ODE disease mechanisms in neural networks, seamlessly coupling deep neural networks and dynamic systems and predicting disease states [18]. Difonzo et al. verified that even in the presence of challenging kernel functions, PINN can effectively solve inverse problems in periodic dynamics [19]. Difonzo et al. also proposed applying radial basis functions (RBFs) as activation functions in suitably designed PINNs to solve the inverse problem [20]. These frameworks effectively learn about disease transmission and provide estimates for unknown parameters.
Inspired by the Physics-Informed Neural Network (PINN) framework, we propose the RD-DINN method, which is a deep neural network embedded with reaction-diffusion disease information. This model embeds information about the diffusion disease into a neural network to accurately estimate the dynamics of different population categories and unknown parameters. We consider both the direct (forward) and the inverse problems. The direct problem involves inferring the unknown dynamics at any time and location given known initial and boundary conditions, while the inverse problem aims to deduce the unknown dynamics and model parameters from limited observed data. We analyze benchmark cases of SIR [21] and SEIR [22] infectious disease models with diffusion terms, demonstrating the robustness and reliability of our method through various network architectures and the addition of different levels of noise perturbations. We also examine the limitations and shortcomings of our proposed approach in terms of accuracy. The structure of this paper is as follows: In Section 2, we review essential background information on infectious diseases and the development of neural networks and PINN. In Section 3, we introduce the technical approach of our proposed diffusion disease information neural network. Section 4 showcases the performance of our method on direct and inverse problems in two benchmark cases through computational experiments. In Section 5, we verify the feasibility of the model on real data from the COVID-19 outbreak, using five cities in Michigan in April 2020 as the distribution of the spatial variable x*. Finally, in Section 6, we summarize and generalize our approach.
*https://www.michigan.gov/coronavirus/stats/archived-coronavirus-data
In mathematical biology, PDEs provide a better description of infectious disease propagation, epidemiological dynamics, and other related phenomena exhibiting diffusion behavior. However, most coupled nonlinear PDE systems are quite challenging to solve, and numerical solutions are the preferred approach. Typically, this is done with expensive commercial numerical solvers based on methods such as the finite difference method (FDM) [23] and the finite element method (FEM). Moreover, numerical methods have some drawbacks, including high time consumption, repetitiveness, and a lack of autonomy. To better analyze the dynamic behavior of infectious diseases, it is crucial to build models that can quickly predict diseases. It is easy to see that the SIR equation with a diffusion term and a positive initial function has only positive solutions for all t∈(0,Tmax). Therefore, there are two scenarios for the disease spread in the spatiotemporal domain: the disease disappears or persists, which lead to a disease-free equilibrium and an endemic equilibrium, respectively. We also recall the concept of the basic reproduction number R0 [24]. In the diffusion SIR model, R0 is often expressed as R0=β/α and serves as a crucial indicator of the transmission potential of an infectious disease. It represents the average number of susceptible individuals infected by one infected person over the course of that person's infection. If R0 is greater than 1, an outbreak is likely; otherwise the epidemic may gradually subside. In our two benchmark cases, considering births and deaths, it is expressed as
$$
R_0 = \frac{\beta B}{\mu(\mu + \alpha)}. \tag{2.1}
$$
Theorem 2.1. Let β denote the infection rate, α the recovery rate, B the birth rate, and μ the natural mortality rate, where β, α, B, μ ∈ ℝ_{>0}. Then, the basic reproduction number R0 of the system at the disease-free equilibrium (DFE) is given by:
$$
R_0 = \frac{\beta B}{\mu(\mu + \alpha)}.
$$
Proof. At the disease-free equilibrium there are no infected individuals, so I=0. Thus, the system equation simplifies to
$$
\frac{\partial S}{\partial t} - d_1 \frac{\partial^2 S}{\partial x^2} = B - \mu S. \tag{2.2}
$$
At equilibrium, the time derivative is zero and, assuming a uniform spatial distribution, the spatial derivative is also zero. Therefore, the disease-free equilibrium point is S∗=B/μ. In the vicinity of the disease-free equilibrium, we can linearize the system. Let S=S∗+s and I=i, where s and i are small perturbations. Substituting these into the infection equation and ignoring the higher-order term βsi, we get
$$
\frac{\partial i}{\partial t} - d_2 \frac{\partial^2 i}{\partial x^2} = \beta S^{*} i - (\alpha + \mu) i. \tag{2.3}
$$
In the neighborhood of the disease-free equilibrium, the generation rate of infected individuals is βS∗i and the disappearance rate is (α+μ)i. Thus, the basic reproduction number R0 is the ratio of the generation rate to the disappearance rate
$$
R_0 = \frac{\beta S^{*}}{\alpha + \mu} = \frac{\beta B}{\mu(\alpha + \mu)}. \tag{2.4}
$$
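As a quick numerical check of Eq (2.4), the following minimal sketch evaluates R0 for the SIR benchmark values listed in Table 1 (B=0.1, μ=0.1, β=0.5, α=0.5). The function name `basic_reproduction_number` is a hypothetical helper introduced only for this illustration.

```python
def basic_reproduction_number(beta, B, mu, alpha):
    """Evaluate R0 = beta * S_star / (alpha + mu) with S_star = B / mu, Eq (2.4)."""
    if min(beta, B, mu, alpha) <= 0:
        raise ValueError("all parameters must be strictly positive")
    s_star = B / mu  # disease-free equilibrium of the susceptible compartment
    return beta * s_star / (alpha + mu)


# Example 1 (SIR benchmark) parameter values from Table 1.
r0_example1 = basic_reproduction_number(beta=0.5, B=0.1, mu=0.1, alpha=0.5)
print(f"R0 = {r0_example1:.3f}")  # prints "R0 = 0.833"
```

With these values R0 is below 1, so under the criterion stated above the infection would be expected to subside gradually in this benchmark.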
The stability analysis of diseases and the estimation of the basic reproduction number are fundamental explanations for disease dynamics, playing a crucial role in predicting disease spread and formulating preventive measures. However, the ability to track computed values of various populations may vary with different parameter combinations, and there could be variations in real data for different demographic groups. Furthermore, all discussed methods require known initial subpopulation sizes.
With the advancement of science and technology, mathematical biology and epidemiology face a significant challenge in developing a coherent framework that integrates differential equations with existing data. We introduce deep learning as a powerful approach, adopting a neural network architecture that mimics brain mechanisms [25]. In the multi-layer neural network, the first perceptron layer processes the time and space inputs through weighted sums and biases. The signal then passes through multiple layers of intricate decision-making until it reaches the final decision layer, which generates outputs corresponding to the dynamic compartments S(x,t), I(x,t), and R(x,t). Deep neural networks seek to minimize the regression mean square error with respect to the weights and biases. The diffusion terms ∂²/∂x² are incorporated into the network as PDE residuals through automatic differentiation during backpropagation [26], using software such as PyTorch or TensorFlow together with gradient descent methods.
The finite difference method (FDM) approximates the partial derivatives in partial differential equations using differences, thereby converting a continuous problem into a discrete one. Discretizing space and time enables computing values at grid points via recursive updates.
Theorem 2.2. Consider the spatial interval x∈Ω=[0,L] divided into N equidistant grid points with step size Δx, and time discretized into steps Δt. The spatial and temporal nodes are given by
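$$
x_i = i\Delta x,\quad i = 0, 1, 2, \dots, N, \qquad t_n = n\Delta t,\quad n = 0, 1, 2, \dots. \tag{2.5}
$$
The time derivative is approximated using forward difference
$$
\frac{\partial S}{\partial t} \approx \frac{S_i^{n+1} - S_i^{n}}{\Delta t}, \qquad \frac{\partial I}{\partial t} \approx \frac{I_i^{n+1} - I_i^{n}}{\Delta t}. \tag{2.6}
$$
The spatial second derivative is approximated using central difference
$$
\frac{\partial^2 S}{\partial x^2} \approx \frac{S_{i+1}^{n} - 2S_i^{n} + S_{i-1}^{n}}{(\Delta x)^2}, \qquad \frac{\partial^2 I}{\partial x^2} \approx \frac{I_{i+1}^{n} - 2I_i^{n} + I_{i-1}^{n}}{(\Delta x)^2}. \tag{2.7}
$$
The no-flow boundary conditions ∂ηS=∂ηI=0 are discretized as
$$
\frac{S_1^{n} - S_0^{n}}{\Delta x} = 0 \Rightarrow S_1^{n} = S_0^{n}, \qquad \frac{I_1^{n} - I_0^{n}}{\Delta x} = 0 \Rightarrow I_1^{n} = I_0^{n}. \tag{2.8}
$$
The initial conditions S(x,0)=S0(x) and I(x,0)=I0(x) are discretized as
$$
S_i^{0} = S_0(x_i), \qquad I_i^{0} = I_0(x_i). \tag{2.9}
$$
Using the discretized equations, $S_i^{n+1}$ and $I_i^{n+1}$ can be solved iteratively (e.g., with explicit or implicit methods). Explicit methods require stability conditions (e.g., the CFL condition), while implicit methods are more stable but require solving linear systems.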
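The discretization in Eqs (2.5)-(2.9) can be sketched as an explicit time-stepping loop. The snippet below is an illustrative pure-Python sketch, not the solver used for the reference solutions. The function name, grid size, time step, and final time are assumptions. The parameter values and tent-shaped initial profiles are borrowed from the SIR benchmark, and the right boundary is treated by mirroring Eq (2.8), which is also an assumption.

```python
def simulate_sir_fdm(nx=21, length=1.0, t_end=1.0, dt=0.01,
                     d1=0.001, d2=0.001, B=0.1, beta=0.5, alpha=0.5, mu=0.1):
    """Explicit finite-difference sketch for the S and I equations of the diffusive SIR model."""
    dx = length / (nx - 1)
    # Stability restriction of the explicit diffusion update: d * dt / dx^2 <= 1/2.
    if max(d1, d2) * dt / dx ** 2 > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    xs = [i * dx for i in range(nx)]
    # Discretized initial conditions, Eq (2.9), using tent-shaped profiles.
    S = [1.1 * x if x < 0.5 else 1.1 * (1.0 - x) for x in xs]
    I = [0.5 * x if x < 0.5 else 0.5 * (1.0 - x) for x in xs]
    for _ in range(round(t_end / dt)):
        S_new, I_new = S[:], I[:]
        for i in range(1, nx - 1):
            # Central differences in space, Eq (2.7).
            lap_S = (S[i + 1] - 2.0 * S[i] + S[i - 1]) / dx ** 2
            lap_I = (I[i + 1] - 2.0 * I[i] + I[i - 1]) / dx ** 2
            infection = beta * S[i] * I[i]
            # Forward difference in time, Eq (2.6).
            S_new[i] = S[i] + dt * (d1 * lap_S + B - infection - mu * S[i])
            I_new[i] = I[i] + dt * (d2 * lap_I + infection - (alpha + mu) * I[i])
        # No-flow boundaries, Eq (2.8); the right end is mirrored (assumption).
        S_new[0], S_new[-1] = S_new[1], S_new[-2]
        I_new[0], I_new[-1] = I_new[1], I_new[-2]
        S, I = S_new, I_new
    return xs, S, I


xs, S, I = simulate_sir_fdm()
print(f"max S = {max(S):.4f}, max I = {max(I):.4f}")
```

The explicit update keeps the code short at the cost of the step-size restriction checked at the top of the function; an implicit scheme would remove that restriction but needs a linear solve per step.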
Physics-Informed Neural Network (PINN) is a data-driven algorithm that was initially introduced as an alternative model for approximating the solution of differential equations and determining model parameters [27]. It can use any neural network architecture as its primary framework, such as Fully Connected Neural Networks (FNN) and Convolutional Neural Networks (CNN). The activation functions and optimization methods applied in PINN are the same as those of usual deep learning techniques. The distinctive part of the algorithm is the loss function, which differs from that of traditional deep learning methods. PINN incorporates the residuals of the differential equations in the loss function so that the output of the neural network approximates the solution of the system of differential equations. Automatic differentiation is employed to compute derivatives with respect to the input coordinates and to discover the laws of the underlying dynamical system.
$$
L_{data} = \sum_{i=1}^{N} \left| u_\theta(x_i, t_i) - u_i \right|^2, \qquad L_{physics} = \sum_{i=1}^{M} \left| \frac{\partial u_\theta}{\partial t} - \mathcal{L}(u_\theta; \lambda) \right|^2. \tag{2.10}
$$
Here, uθ(xi,ti) is the neural network's prediction at the data point (xi,ti), ui is the true value at that point, and N is the total number of data points. ℒ(u;λ) is the operator defined by the PDE, and λ denotes the unknown parameters. M is the number of points used to calculate PDE residuals. The total loss is the weighted sum of the data fitting loss and the physical information loss.
$$
L_{total} = \omega_1 L_{data} + \omega_2 L_{physics}, \tag{2.11}
$$
where ω1 and ω2 are hyperparameters that govern the relative importance of data fitting loss and physical information loss in the total loss.
By minimizing the loss function Ltotal, the parameters θ and λ of the neural network are optimized simultaneously.
$$
\theta^{*}, \lambda^{*} = \arg\min_{\theta, \lambda} L_{total}, \tag{2.12}
$$
where θ∗ is the optimal parameter of the neural network and λ∗ is the optimal parameter obtained by inversion. We use the Adam optimizer to optimize both θ and λ.
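To make the roles of the two loss terms concrete, here is a minimal sketch of Eqs (2.10) and (2.11) on a toy problem, u_t = -λu, chosen purely for illustration. The surrogate u(t) = exp(-0.5t) stands in for the network output, and its derivative is written analytically rather than obtained by automatic differentiation. All names (`total_loss`, `loss_for`) and the toy problem itself are assumptions and are not taken from the paper's code.

```python
import math


def total_loss(u_pred, u_obs, du_dt, operator_values, w_data=1.0, w_physics=1.0):
    """Weighted sum of the data loss and the physics loss, Eqs (2.10)-(2.11)."""
    l_data = sum((p - o) ** 2 for p, o in zip(u_pred, u_obs))
    l_physics = sum((d - op) ** 2 for d, op in zip(du_dt, operator_values))
    return w_data * l_data + w_physics * l_physics


# Toy setting (assumption): u_t = -lam * u, so the operator value is -lam * u.
ts = [0.1 * k for k in range(11)]
u_surrogate = [math.exp(-0.5 * t) for t in ts]          # stand-in for the network output
du_surrogate = [-0.5 * math.exp(-0.5 * t) for t in ts]  # its exact time derivative
u_observed = u_surrogate[:]                             # noise-free observations


def loss_for(lam):
    """Total loss as a function of the candidate parameter lam."""
    operator_values = [-lam * u for u in u_surrogate]
    return total_loss(u_surrogate, u_observed, du_surrogate, operator_values)


print(loss_for(0.5), loss_for(0.8))
```

The physics term vanishes only when the candidate parameter matches the one that generated the surrogate, which is the mechanism that lets the minimization in Eq (2.12) recover λ alongside θ.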
Residual connections [28] have become a cornerstone of modern deep neural networks. These connections address the vanishing gradient problem by facilitating the flow of gradients through the network. As Figure 1 shows, a residual connection adds the input of a layer directly to its output, enabling the training of much deeper architectures. This approach not only enhances gradient propagation but also improves the overall performance and convergence of the model.
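The skip path described above can be illustrated with a few lines of plain Python. This is only a sketch of a single residual block with a Tanh layer; the helper names and the equal input and output widths are assumptions made for the illustration.

```python
import math


def dense_tanh(x, weights, biases):
    """One fully connected layer followed by a Tanh activation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def residual_block(x, weights, biases):
    """Return x + tanh(W x + b): the layer input is added directly to the layer output."""
    h = dense_tanh(x, weights, biases)
    if len(h) != len(x):
        raise ValueError("a plain skip connection needs matching input and output widths")
    return [xi + hi for xi, hi in zip(x, h)]


# With zero weights and biases the layer contributes nothing and the block is the identity.
identity_out = residual_block([0.3, -1.2], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
print(identity_out)
```

Because the input passes through unchanged even when the layer contributes nothing, gradients always have a direct route back to earlier layers.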
Inspired by the latest developments in physics-informed deep learning and the incorporation of hidden physics models, we propose the RD-DINN framework. This framework uses hidden disease information to learn about the spread of diffusive diseases. It employs a fully connected neural network with two-dimensional input and multidimensional output to represent surrogate models for the partial differential equation solutions S(x,t), I(x,t), and R(x,t). The RD-DINN loss function incorporates the encoded disease information model, which is the computational model for infectious diseases introduced in Section 1. The embedded disease information could also be a variant of the SIR model, which we later validate with the SEIR model.
Drawing inspiration from the neural attention mechanism in computer vision tasks [29], we incorporate residual network modules in addition to fully connected layers, which ensure the preservation of the initial data structure and enhance the weight of hidden states to a significant extent while reducing errors. Figure 2 depicts the schematic diagram of the RD-DINN. We use the residual connection network for both forward and inverse problem experiments.
In Figure 2, Res-DNN represents a deep neural network with the spatial variable x and the temporal variable t as inputs, providing disease dynamics and transmission parameters as outputs. 'PDE Loss' denotes the residual loss of the governing equation of the disease model with a diffusion term, and ωr, ωic, and ωbc are the weights of the equation residual, the initial condition residual, and the boundary condition residual in the loss function. R(x,t) measures how well the network output u(x,t) satisfies the dynamic behavior of the PDE
$$
R(x,t) = \frac{\partial u}{\partial t} - \mathcal{L}(u), \tag{2.13}
$$
where u is the solution predicted by the neural network at {x,t} and ℒ(u) is the differential operator of the PDE. I(x,t) and B(x,t) measure how well the predicted solution u(x,t) satisfies the initial and boundary conditions
$$
I(x,t) = u(x,0) - u_0(x), \tag{2.14}
$$
$$
B(x,t) = u|_{\partial\Omega} - u_b. \tag{2.15}
$$
'Data Loss' accounts for the residual loss between the true values corresponding to randomly sampled points in the spatiotemporal domain in the inverse problem and the network predictions. ωd represents the weight of the data residual in the loss function of the neural network. Ud(x,t)−U(x,t) represents the difference between the solution predicted by the neural network and the true solution at {x,t}. We separate the direct and inverse problems as follows:
● For the direct problem, the loss function comprises three parts: PDE residual loss, initial value loss, and boundary loss.
● For the inverse problem, the loss function consists of two parts: The loss between sampled data true values and Res-DNN outputs, and PDE residual loss.
In all benchmark experiments, the Tanh activation function is utilized due to its smooth gradient properties, which effectively address nonlinear terms (e.g., S∗I) in the equations and accurately capture complex nonlinear dynamics. Both direct and inverse problems are addressed by minimizing the loss function L using the Adam optimizer, which is employed with hyperparameters: Learning rate α=1e−03, momenta (β1,β2)=(0.9,0.999), and ϵ=1e−08. Convergence is achieved when the total loss falls below 1e−04. This configuration combines adaptive learning rates with momentum optimization to handle the non-convexity of the loss landscape effectively. The neural network fits the data and infers unobserved disease dynamics and transmission parameters by satisfying the PDE system.
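For readers unfamiliar with the optimizer settings quoted above, the following sketch applies the standard Adam update with the same hyperparameters (learning rate 1e-3, (β1,β2)=(0.9,0.999), ϵ=1e-8) and the same stopping rule (loss below 1e-4) to a one-parameter toy loss. The quadratic loss, the starting point, and the function name `adam_minimize` are assumptions chosen only to keep the example self-contained; the actual training acts on all network weights and the unknown model parameters.

```python
import math


def adam_minimize(loss_fn, grad_fn, param, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, tol=1e-4, max_steps=20000):
    """Scalar Adam loop that stops once the loss falls below tol."""
    m = 0.0  # running average of the gradient
    v = 0.0  # running average of the squared gradient
    for step in range(1, max_steps + 1):
        if loss_fn(param) < tol:
            return param, step - 1
        g = grad_fn(param)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** step)  # bias correction
        v_hat = v / (1.0 - beta2 ** step)
        param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, max_steps


# Toy loss (assumption): recover a parameter whose reference value is 0.5.
toy_loss = lambda lam: (lam - 0.5) ** 2
toy_grad = lambda lam: 2.0 * (lam - 0.5)
lam_hat, n_steps = adam_minimize(toy_loss, toy_grad, param=0.0)
print(f"lambda = {lam_hat:.4f} after {n_steps} steps")
```

Note that the 1e-4 threshold on the loss translates into an absolute parameter error of at most 0.01 for this particular quadratic, so the stopping rule bounds accuracy only indirectly.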
Because the parameters determine the propagation system of a biological disease, we restrict them to a specific range near their actual values, within an acceptable margin of error. Therefore, we assume a diffusion disease model with transmission rate β and recovery rate α obtained from the literature. For the forward problem, we consider known initial conditions S(x,0), I(x,0) and Neumann boundary conditions. We aim to obtain the solutions for the unknown compartments S(x,t), I(x,t).
The next crucial step is to constrain the neural network to comply with the scattered observed values of S and I and with the dynamical system defined by the governing equations. To construct the loss function, it is necessary to incorporate terms corresponding to the observational data and to the partial differential equations.
Let x∈[0,L], the density distribution of the unknown compartments S(x,t) and I(x,t) in the spatiotemporal domain satisfies the partial differential equation system
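$$
\begin{cases}
\dfrac{\partial S}{\partial t} - d_1 \dfrac{\partial^2 S}{\partial x^2} = B - \beta S I - \mu S, & x \in [0, L],\ t > 0,\\[4pt]
\dfrac{\partial I}{\partial t} - d_2 \dfrac{\partial^2 I}{\partial x^2} = \beta S I - \alpha I - \mu I, & x \in [0, L],\ t > 0,\\[4pt]
\partial_\eta S = \partial_\eta I = 0, & x = 0, L,\ t > 0,\\[4pt]
S(x,0) = S_0(x),\ I(x,0) = I_0(x), & x \in [0, L],
\end{cases}
\tag{3.1}
$$
subject to the initial conditions
$$
S(x,0) = S_0(x), \qquad I(x,0) = I_0(x), \tag{3.2}
$$
and Neumann boundary conditions
$$
\frac{\partial S}{\partial x}(0,t) = g_1^{(1)}(t), \quad \frac{\partial I}{\partial x}(0,t) = g_1^{(2)}(t), \quad \frac{\partial S}{\partial x}(L,t) = g_2^{(1)}(t), \quad \frac{\partial I}{\partial x}(L,t) = g_2^{(2)}(t). \tag{3.3}
$$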
Here, the Neumann boundary conditions represent the migration of different population groups at the boundary. We assume that the latent solution u(x,t) can be approximated by a deep neural network uθ(x,t) with parameters θ. The parameters of the neural network uθ(x,t) can be estimated by minimizing the mean square loss function of the form
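$$
L(\theta) = L_f^{(1)}(\theta) + L_f^{(2)}(\theta) + L_{ic}^{S}(\theta) + L_{ic}^{I}(\theta) + L_{bc}^{S}(\theta) + L_{bc}^{I}(\theta), \tag{3.4}
$$
where the PDE residual Lf(θ), the boundary residual Lbc(θ), and the initial residual Lic(θ) are defined as follows, with the boundary functions indexed as in Eq (3.3):
$$
\begin{aligned}
L_f^{(1)}(\theta) &= \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\partial S_\theta}{\partial t}(x_f^i, t_f^i) - d_1 \frac{\partial^2 S_\theta}{\partial x^2}(x_f^i, t_f^i) - B + \beta S_\theta(x_f^i, t_f^i) I_\theta(x_f^i, t_f^i) + \mu S_\theta(x_f^i, t_f^i) \right|^2,\\
L_f^{(2)}(\theta) &= \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\partial I_\theta}{\partial t}(x_f^i, t_f^i) - d_2 \frac{\partial^2 I_\theta}{\partial x^2}(x_f^i, t_f^i) - \beta S_\theta(x_f^i, t_f^i) I_\theta(x_f^i, t_f^i) + \alpha I_\theta(x_f^i, t_f^i) + \mu I_\theta(x_f^i, t_f^i) \right|^2,\\
L_{ic}^{S}(\theta) &= \frac{1}{N_{ic}} \sum_{i=1}^{N_{ic}} \left| S_\theta(x_d^i, 0) - S_0(x_d^i) \right|^2,\\
L_{ic}^{I}(\theta) &= \frac{1}{N_{ic}} \sum_{i=1}^{N_{ic}} \left| I_\theta(x_d^i, 0) - I_0(x_d^i) \right|^2,\\
L_{bc}^{S}(\theta) &= \frac{1}{N_{bc}} \sum_{i=1}^{N_{bc}} \left( \left| \frac{\partial S_\theta}{\partial x}(0, t_d^i) - g_1^{(1)}(t_d^i) \right|^2 + \left| \frac{\partial S_\theta}{\partial x}(L, t_d^i) - g_2^{(1)}(t_d^i) \right|^2 \right),\\
L_{bc}^{I}(\theta) &= \frac{1}{N_{bc}} \sum_{i=1}^{N_{bc}} \left( \left| \frac{\partial I_\theta}{\partial x}(0, t_d^i) - g_1^{(2)}(t_d^i) \right|^2 + \left| \frac{\partial I_\theta}{\partial x}(L, t_d^i) - g_2^{(2)}(t_d^i) \right|^2 \right).
\end{aligned}
$$
Here, $\{x_d^i, t_d^i\}_{i=1}^{N_{ic}, N_{bc}}$ denotes the randomly sampled coordinates for the initial and boundary conditions, and $\{x_f^i, t_f^i\}_{i=1}^{N}$ is a set of spatiotemporal points randomly sampled within the computational domain [0,L]×[0,T] for enforcing the partial differential equation. We employ Latin hypercube sampling as our random sampling method to ensure full coverage of the sampling range [30]. The interval [xmin,xmax] is divided into Nic layers, with one random point per layer selected to define the initial values S(xi,t0), I(xi,t0) at $\{(x_i, t_0)\}_{i=1}^{N_{ic}}$. The interval [t0,tend] is divided into Nbc layers, with one random point per layer selected to define boundary values at $\{(x_{boundary}, t_j)\}_{j=1}^{N_{bc}}$ for both xmin and xmax.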
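The layered sampling described above can be sketched in a few lines. The snippet below is an illustrative standard-library implementation of Latin hypercube sampling, not the authors' code. The function name, the seed, and the choice of eight points on [0,1]×[0,30] are assumptions.

```python
import random


def latin_hypercube(n, bounds, seed=0):
    """Draw n points so that each of the n equal layers of every axis holds exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n
        # One uniform draw inside each layer [low + k*width, low + (k+1)*width).
        column = [low + (k + rng.random()) * width for k in range(n)]
        rng.shuffle(column)  # decouple the layer order between axes
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n)]


# Eight collocation points (x, t) in [0, 1] x [0, 30].
collocation = latin_hypercube(8, [(0.0, 1.0), (0.0, 30.0)])
for x, t in collocation:
    print(f"x = {x:.3f}, t = {t:.3f}")
```

Using a single axis, for example `latin_hypercube(n_ic, [(x_min, x_max)])`, reproduces the one-point-per-layer selection used for the initial-condition points.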
In the benchmark experiments, reference solutions for the partial differential equations were computed using a high-order finite difference method. For the SIR model, datasets were generated by systematically varying the parameters {β,α}∈{0.1,0.3,0.5,0.7,0.9}, while for the SEIR model, parameter sets {β,σ,α}∈{0.6,0.8,1.0,1.2,1.4} were employed. A comprehensive sensitivity analysis was conducted across these parameter configurations to rigorously evaluate the robustness of the proposed RD-DINN method under diverse conditions.
To demonstrate the reliability of our algorithm, let's focus on two specific benchmark examples. The parameters for these experiments are given in Table 1.
Parameter | Notation | Value (Example 1) | Value (Example 2) |
Recruitment rate | B | 0.1 | 1.0 |
Natural death rate | μ | 0.1 | 0.1 |
Infection rate | β | 0.5 | 1.0 |
Recovery rate | α | 0.5 | 1.0 |
Latent rate | σ | 0.0 | 1.0 |
S-diffusion capacity | d1 | 0.001 | 0.005 |
E-diffusion capacity | d2 | 0.000 | 0.005 |
I-diffusion capacity | d3 | 0.001 | 0.005 |
R-diffusion capacity | d4 | 0.001 | 0.005 |
This benchmark experiment is for the direct problem. We start by considering the SIR model with diffusion
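$$
\begin{cases}
\dfrac{\partial S(x,t)}{\partial t} - d_1 \dfrac{\partial^2 S}{\partial x^2} = B - \beta S(x,t) I(x,t) - \mu S(x,t),\\[4pt]
\dfrac{\partial I(x,t)}{\partial t} - d_2 \dfrac{\partial^2 I}{\partial x^2} = \beta S(x,t) I(x,t) - \alpha I(x,t) - \mu I(x,t),
\end{cases}
\tag{3.5}
$$
subject to the initial conditions
$$
S(x,0) = \begin{cases} 1.1x, & 0 \le x < 0.5,\\ 1.1(1-x), & 0.5 \le x \le 1, \end{cases}
\qquad
I(x,0) = \begin{cases} 0.5x, & 0 \le x < 0.5,\\ 0.5(1-x), & 0.5 \le x \le 1, \end{cases}
\tag{3.6}
$$
and the Neumann boundary conditions
$$
\frac{\partial S(x,t)}{\partial \eta} = \frac{\partial I(x,t)}{\partial \eta} = 0, \quad (x,t) \in [0,1] \times [0,30]. \tag{3.7}
$$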
Here, η is the unit normal vector. A boundary condition equal to 0 means that there is no population migration into or out of the domain across the boundary. d1 and d2 represent the diffusion capabilities of susceptible and infected individuals. β represents the transmission rate, and α represents the recovery rate. We assume that these coefficients remain constant. Using the RD-DINN method, we predict the latent solution of the equation. The results are shown in Figure 3.
We observe that when t is near zero and x is near the boundary, the RD-DINN method exhibits a slightly larger deviation from the traditional finite difference solution, especially at (t,x)∈[0,10]×{0.25}. A slightly larger deviation is reasonable here because the population changes abruptly at the early stage of an outbreak, which makes the initial phase of the spread difficult to predict. Overall, RD-DINN produces accurate predictive solutions, with a relative L2 error of 3%.
To demonstrate the reliability and reproducibility of our proposed method, the second case uses the SEIR infectious disease model, which adds an incubation period. The exposed population E(x,t) refers to individuals who have been infected with the virus but have not yet developed symptoms or transmitted the virus to other individuals. Generally, researchers treat the latent period and the incubation period as the same concept, with an average latent period of 1.9 days [31]. Ndolane studied a numerical method for the fractional SEIR model and discussed the potential application of fractional derivatives in spatial propagation models. Fractional derivatives can be combined with a diffusion term to describe spatial heterogeneity [32]. The SEIR model is described by the following equations
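$$
\begin{cases}
\dfrac{\partial S(x,t)}{\partial t} - d_1 \dfrac{\partial^2 S}{\partial x^2} = B - \beta S(x,t) I(x,t) - \mu S(x,t), \\[4pt]
\dfrac{\partial E(x,t)}{\partial t} - d_2 \dfrac{\partial^2 E}{\partial x^2} = \beta S(x,t) I(x,t) - (\mu + \sigma) E(x,t), \\[4pt]
\dfrac{\partial I(x,t)}{\partial t} - d_3 \dfrac{\partial^2 I}{\partial x^2} = \sigma E(x,t) - (\alpha + \mu) I(x,t),
\end{cases}
\tag{3.8}
$$
where the recovered population R is not involved. The initial conditions and Neumann boundary conditions are given by
$$
S(x,0) = 1 - \sin^2(\pi x), \quad I(x,0) = \sin^2(\pi x), \quad E(x,0) = 0, \quad R(x,0) = 0, \quad x \in [0,1],
\tag{3.9}
$$
$$
\frac{\partial S(x,t)}{\partial \eta} = \frac{\partial E(x,t)}{\partial \eta} = \frac{\partial I(x,t)}{\partial \eta} = \frac{\partial R(x,t)}{\partial \eta} = 0, \quad (x,t) \in [0,1] \times [0,20],
\tag{3.10}
$$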
where η is the unit normal vector. The homogeneous Neumann boundary conditions indicate that no population migrates across the boundary. The parameters B and μ have the same meaning as in the SIR model. β is the infection rate, σ is the latent rate, and α is the recovery rate.
Figure 4 shows a schematic diagram of the SEIR model prediction. From these two disease models, we observe that the RD-DINN method, using only a small number of initial and boundary data points together with a limited set of points inside the domain at which the governing disease equations are enforced, can effectively predict the unknown dynamics of infectious diseases. In most practical cases, however, the initial and boundary conditions of the model are unknown at the outset. This calls for solving an inverse problem, in which only a small amount of data and incompletely known governing equations are used to find suitable model parameters.
Sensitivity analysis is an important tool for assessing how strongly model outputs respond to changes in input parameters. Small changes in parameters such as the infection rate β, latent rate σ, and recovery rate α may lead to significant differences in system behavior. The spatial derivative $\partial/\partial x$ describes spatial heterogeneity, and {β,σ,α} are the parameters to be analyzed. We perform single-parameter sensitivity analyses by assigning a specific value to each target parameter p∈{β,σ,α} while keeping the other parameters at their default values. The loss L is minimized using the PINN, the MSE between the predicted and true solutions on the test set is calculated, and the sensitivity is quantified by analyzing the MSE as a function of the parameter p:
$$
\mathrm{MSE} = \frac{1}{K} \sum_{k=1}^{K} \left( \hat{Q}^{(k)} - Q^{(k)} \right)^2,
\tag{3.11}
$$
where K is the total number of data points in the test set, and $\hat{Q}^{(k)}$ and $Q^{(k)}$ are the predicted and true observed values of the model for the kth sample, respectively.
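As a small illustration of (3.11), the sketch below computes the MSE and collects it over a list of parameter values. The callable `predict` is hypothetical: it stands in for "train the network with parameter p and evaluate it on the test set", which is replaced here by a toy closed-form curve so that the example runs on its own.

```python
import numpy as np

def mse(q_pred, q_true):
    """Mean squared error between predictions and observations, as in (3.11)."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_true = np.asarray(q_true, dtype=float)
    return float(np.mean((q_pred - q_true) ** 2))

def sensitivity_curve(param_values, predict, q_true):
    """MSE as a function of one parameter p.

    `predict(p)` is a hypothetical callable returning the model prediction
    on the test set for the parameter value p.
    """
    return [mse(predict(p), q_true) for p in param_values]

# Toy stand-in for a trained model: an exponential decay with rate p.
t = np.linspace(0.0, 5.0, 50)
q_true = np.exp(-0.5 * t)
curve = sensitivity_curve([0.3, 0.5, 0.7], lambda p: np.exp(-p * t), q_true)
print(curve)
```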
In Figure 5, the SIR model maintains MSE consistently below 0.003 across parameter ranges, while the SEIR model maintains MSE under 0.01. This demonstrates the robustness of the RD-DINN framework under parameter variations, where its residual-driven mechanism and integration of prior knowledge mitigate sensitivity effects and enhance adaptability to parameter uncertainties.
In this section, we consider the inverse problem of identifying the unknown parameters (β, α) contained in the equations and inferring each compartment's unknown dynamics. Specifically, given a small number of measured values $\{S(x_d^i,t_d^i), I(x_d^i,t_d^i)\}_{i=1}^{N}$, which are scattered and may contain noise, we wish to infer a solution S(x,t),I(x,t) in the region Ω that satisfies the diffusion disease model and to determine the unknown parameters of the model. It is worth emphasizing that, for any given data point in this set, the disease information conditions that are met during training are unknown.
So what are the parameters β,α that best describe the observed data? To answer this question, we apply RD-DINN to infer them. Importantly, no initial or boundary conditions are required, which matches many practical applications where such information is difficult to obtain. We conduct the inverse analysis for two diffusion disease models, assuming that only scattered and possibly noisy measurements are available in the spatiotemporal domain. We explore different network architectures and noise levels to demonstrate the robustness and accuracy of RD-DINN. To this end, we employ a deep neural network to approximate the latent solutions S(x,t),I(x,t) together with the unknown parameters β,α. The trainable parameters are shared between the two, and by minimizing the composite loss below, we train a disease information neural network that simultaneously determines the unknown parameters and the unknown dynamics.
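$$
L(\theta,\beta,\alpha) = \lambda_{f_1} L_f^{(1)}(\theta,\beta,\alpha) + \lambda_{f_2} L_f^{(2)}(\theta,\beta,\alpha) + \lambda_{d_1} L_d^{(1)}(\theta) + \lambda_{d_2} L_d^{(2)}(\theta),
\tag{3.12}
$$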
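$$
\begin{aligned}
L_f^{(1)}(\theta,\beta,\alpha) &= \frac{1}{N}\sum_{i=1}^{N}\left| \frac{\partial S_\theta}{\partial t}(x_d^i,t_d^i) - d_1 \frac{\partial^2 S_\theta}{\partial x^2}(x_d^i,t_d^i) - B + \beta_\theta S_\theta(x_d^i,t_d^i) I_\theta(x_d^i,t_d^i) + \mu S_\theta(x_d^i,t_d^i) \right|^2, \\
L_f^{(2)}(\theta,\beta,\alpha) &= \frac{1}{N}\sum_{i=1}^{N}\left| \frac{\partial I_\theta}{\partial t}(x_d^i,t_d^i) - d_2 \frac{\partial^2 I_\theta}{\partial x^2}(x_d^i,t_d^i) - \beta_\theta S_\theta(x_d^i,t_d^i) I_\theta(x_d^i,t_d^i) + \alpha_\theta I_\theta(x_d^i,t_d^i) + \mu I_\theta(x_d^i,t_d^i) \right|^2, \\
L_d^{(1)}(\theta) &= \frac{1}{N}\sum_{i=1}^{N}\left| S_\theta(x_d^i,t_d^i) - S(x_d^i,t_d^i) \right|^2, \\
L_d^{(2)}(\theta) &= \frac{1}{N}\sum_{i=1}^{N}\left| I_\theta(x_d^i,t_d^i) - I(x_d^i,t_d^i) \right|^2.
\end{aligned}
$$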
Here, $\{x_d^i,t_d^i\}_{i=1}^{N}$ represents a set of spatial and temporal locations randomly sampled in [0,L]×[0,T] during each training iteration. The corresponding population values $S(x_d^i,t_d^i), I(x_d^i,t_d^i)$ serve as training data. We constrain these training data to follow the structure dictated by the governing PDEs. The points at which the PDE residuals are evaluated coincide in number and position with the training data points.
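The composite loss (3.12) can be sketched as a plain function of arrays evaluated at the N sample points. This is only an illustration under stated assumptions: the derivative arrays `S_t`, `S_xx`, `I_t`, `I_xx` are taken as given (in practice they would come from automatic differentiation of the network), the residuals are written out in full following the SIR system (3.5), and the function name, default parameter values, and unit weights are hypothetical choices.

```python
import numpy as np

def composite_loss(S_pred, I_pred, S_t, I_t, S_xx, I_xx, S_obs, I_obs,
                   beta, alpha, B=0.1, mu=0.1, d1=1e-3, d2=1e-3,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of two PDE-residual losses and two data losses.

    All array arguments hold values at the same N sample points.
    """
    # PDE residuals of the S and I equations of the SIR system (3.5).
    f = S_t - d1 * S_xx - B + beta * S_pred * I_pred + mu * S_pred
    g = I_t - d2 * I_xx - beta * S_pred * I_pred + (alpha + mu) * I_pred
    terms = (
        np.mean(f ** 2),                   # L_f^(1)
        np.mean(g ** 2),                   # L_f^(2)
        np.mean((S_pred - S_obs) ** 2),    # L_d^(1)
        np.mean((I_pred - I_obs) ** 2),    # L_d^(2)
    )
    return float(sum(w * term for w, term in zip(weights, terms)))

# Sanity check at the disease-free equilibrium S = B/mu = 1, I = 0,
# where both residuals vanish and the data are matched exactly.
n = 5
S_eq, I_eq, zeros = np.ones(n), np.zeros(n), np.zeros(n)
print(composite_loss(S_eq, I_eq, zeros, zeros, zeros, zeros, S_eq, I_eq,
                     beta=0.5, alpha=0.5))
```

Evaluating the loss at a known equilibrium is a cheap way to catch sign errors in the residuals before any training is attempted.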
To validate the effectiveness of the method, we randomly generated a training dataset of N=3000 points from the numerical solution corresponding to β and α. A 6-layer deep neural network with 50 neurons in each hidden layer was then trained on this dataset, with the Adam optimizer used to minimize the loss function. After training, the network was fine-tuned to predict the latent solutions S(x,t) and I(x,t) and the unknown parameters β,α. In the following two benchmark cases, we provide an intuitive evaluation of the accuracy of the RD-DINN. Even with 5% and 10% uncorrelated noise in the data, the network still accurately identifies the unknown parameters and dynamics of the diffusion disease model.
To further evaluate the performance of the algorithm, we systematically studied the effect of the total amount of training data, the neural network architecture, and the noise level. The main observation is that the method is strongly robust to the noise level in the data, achieving reasonable identification accuracy even with noise levels as high as 10%. This robustness seems to outperform competing methods such as Gaussian process regression and methods relying on sparse regression [33].
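In the inverse problem, the two benchmark cases from the forward problem are considered again. The difference is that the governing disease equations and the initial and boundary conditions are unknown. Only a small, scattered, and possibly noisy dataset is given for the hidden states S(x,t) and I(x,t), from which the unknown parameters and the unknown dynamics of the equation are inferred. The model equations take the form
$$
\begin{cases}
\dfrac{\partial S(x,t)}{\partial t} - d_1 \dfrac{\partial^2 S}{\partial x^2} = B - \lambda_1 S(x,t) I(x,t) - \mu S(x,t), \\[4pt]
\dfrac{\partial I(x,t)}{\partial t} - d_2 \dfrac{\partial^2 I}{\partial x^2} = \lambda_1 S(x,t) I(x,t) - \lambda_2 I(x,t) - \mu I(x,t),
\end{cases}
\tag{3.13}
$$
and we define f(x,t) and g(x,t) as
$$
\begin{aligned}
f &:= \frac{\partial S(x,t)}{\partial t} - d_1 \frac{\partial^2 S}{\partial x^2} - B + \lambda_1 S(x,t) I(x,t) + \mu S(x,t), \\
g &:= \frac{\partial I(x,t)}{\partial t} - d_2 \frac{\partial^2 I}{\partial x^2} - \lambda_1 S(x,t) I(x,t) + \lambda_2 I(x,t) + \mu I(x,t).
\end{aligned}
$$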
We take the epidemiological parameters (β,α)=(λ1,λ2) to be unknown, while $d_1$, $d_2$, B, and μ are known. The solutions S(x,t),I(x,t) are predicted through the RD-DINN method, and λ1 and λ2 are trainable parameters shared with the neural network during training. The predicted solutions, reference solutions, and the pointwise absolute error between them are shown in Figure 6. The RD-DINN method accurately learns the latent solutions S(x,t),I(x,t) of the unknown disease system. The L2 error is 5.74e−04 for S and 7.89e−03 for I.
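Because the residual g is linear in λ1 and λ2, the identification problem can be illustrated without a neural network. The sketch below is not the RD-DINN procedure: it swaps in ordinary linear least squares on the discretised residual, uses noise-free data on a full grid, and generates and fits the data with the same explicit finite-difference scheme, so exact recovery is expected by construction. The grid sizes, the 10-time-unit horizon, and the value 0.001 for both diffusion coefficients are assumptions made for the example.

```python
import numpy as np

B, mu, d1, d2 = 0.1, 0.1, 1e-3, 1e-3   # treated as known
beta_true, alpha_true = 0.5, 0.5       # used only to generate synthetic data

nx, dt, n_steps = 101, 0.01, 1000
x = np.linspace(0.0, 1.0, nx)
dx = x[1] - x[0]

def laplacian(u):
    """Central second difference with mirrored ghost nodes (zero flux)."""
    lap = np.empty_like(u)
    lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    lap[0] = 2.0 * (u[1] - u[0]) / dx**2
    lap[-1] = 2.0 * (u[-2] - u[-1]) / dx**2
    return lap

# Initial conditions (3.6).
S = np.where(x < 0.5, 1.1 * x, 1.1 * (1.0 - x))
I = np.where(x < 0.5, 0.5 * x, 0.5 * (1.0 - x))

rows, rhs = [], []
for _ in range(n_steps):
    S_new = S + dt * (d1 * laplacian(S) + B - beta_true * S * I - mu * S)
    I_new = I + dt * (d2 * laplacian(I) + beta_true * S * I - (alpha_true + mu) * I)
    # g = 0 rearranged:  I_t - d2 I_xx + mu I = lambda1 * (S I) - lambda2 * I
    rhs.append((I_new - I) / dt - d2 * laplacian(I) + mu * I)
    rows.append(np.column_stack([S * I, -I]))
    S, I = S_new, I_new

A = np.vstack(rows)
y = np.concatenate(rhs)
(lam1, lam2), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"lambda1 (beta) = {lam1:.6f}, lambda2 (alpha) = {lam2:.6f}")
```

With scattered and noisy measurements, the derivatives in the residual can no longer be formed by differencing the data directly, which is the situation the network-based approach in the text is designed for.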
Figure 6 illustrates the dynamics of the DSIR model across the spatiotemporal domain. The basic reproduction number $R_0=\frac{\beta B}{\mu(\mu+\alpha)}\approx 0.83<1$ under this set of parameters means that each infected person infects, on average, fewer than one other person, so the spread of the disease tends to decline. Because infected individuals cannot sustain transmission, the disease eventually fades away in the population. It can be observed that, starting from the disease outbreak, the number of infected individuals I(x,t) gradually approaches 0, signifying the disappearance of the disease by t=10.
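For reference, the value of $R_0$ quoted above follows directly from the Example 1 parameters in Table 1 (B = 0.1, μ = 0.1, β = 0.5, α = 0.5):

```latex
R_0 = \frac{\beta B}{\mu(\mu+\alpha)}
    = \frac{0.5 \times 0.1}{0.1 \times (0.1 + 0.5)}
    = \frac{0.05}{0.06}
    \approx 0.83 < 1 .
```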
When employing the RD-DINN method to predict disease dynamics, we simultaneously use a shared-weight deep neural network $f_\theta(x,t), g_\theta(x,t)$ to invert the model's two key parameters: the transmission rate β and the recovery rate α. The relative error is calculated as
$$
\text{Relative Error} = \frac{\| \lambda_{\text{pred}} - \lambda_{\text{true}} \|}{\| \lambda_{\text{true}} \|} \times 100\%.
\tag{3.14}
$$
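A minimal sketch of (3.14) in code, assuming the Euclidean norm; the function name and the estimate 0.4985 are hypothetical values chosen only to show the arithmetic.

```python
import numpy as np

def relative_error_percent(lam_pred, lam_true):
    """Relative error (3.14) in percent, using the Euclidean norm."""
    lam_pred = np.atleast_1d(np.asarray(lam_pred, dtype=float))
    lam_true = np.atleast_1d(np.asarray(lam_true, dtype=float))
    return float(np.linalg.norm(lam_pred - lam_true)
                 / np.linalg.norm(lam_true) * 100.0)

# Hypothetical estimate of beta against the true value 0.5.
print(f"{relative_error_percent(0.4985, 0.5):.3f} %")   # 0.300 %
```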
In Figure 7, we can see that both parameters are fitted well. Because of the way the network parameters are initialized, β and α start from zero and approach the true values as the iterations proceed. Multiple experiments showed that the parameters converge to the true values by 20,000 iterations.
We also conduct a systematic study in which the architecture of the neural network is kept unchanged while the noise level and the number of sampled data points are varied, to demonstrate the robustness of the algorithm. We add 0%, 5%, and 10% noise to datasets of 1000, 2000, and 3000 points. The MSE of the key parameters of the equation increases with the noise level, yet even with noise as high as 10% the parameters are identified with reasonable accuracy.
We can observe in Table 2 that none of the L2 errors of RD-DINN in parameter identification exceeds 1%, with 10 repeated trials showing error fluctuations within 0.5% and confirming robustness to variations in the network architecture. In Table 3, with different levels of noise added to the data, none of the L2 errors in parameter identification exceeds 2%, as validated through 10 replications with a maximum error deviation of 0.5%, confirming the method's resilience to noise.
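The text does not specify how the noise is generated. One common convention, assumed here, is zero-mean Gaussian noise whose standard deviation is the stated percentage of the standard deviation of the clean data. The sketch below draws scattered samples from a gridded solution and corrupts them in that way; the function names and the toy fields are hypothetical.

```python
import numpy as np

def add_noise(u, level, rng):
    """Add Gaussian noise with standard deviation level * std(u) (assumed model)."""
    u = np.asarray(u, dtype=float)
    return u + level * np.std(u) * rng.standard_normal(u.shape)

def sample_noisy_data(S, I, n_points, level, seed=0):
    """Pick n_points distinct grid nodes and return their indices and noisy values."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(S.size, size=n_points, replace=False)
    return idx, add_noise(S.ravel()[idx], level, rng), add_noise(I.ravel()[idx], level, rng)

# Toy fields standing in for a numerical solution on a 50 x 80 space-time grid.
S_grid = np.linspace(0.0, 1.0, 4000).reshape(50, 80)
I_grid = 1.0 - S_grid
for n_points in (1000, 2000, 3000):
    for level in (0.0, 0.05, 0.10):
        idx, S_noisy, I_noisy = sample_noisy_data(S_grid, I_grid, n_points, level)
        print(n_points, level, S_noisy.shape)
```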
| Layers | % error β (40%) | % error β (60%) | % error β (80%) | % error β (100%) | % error α (40%) | % error α (60%) | % error α (80%) | % error α (100%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6 | 0.177 | 0.400 | 0.267 | 0.336 | 0.433 | 0.772 | 0.629 | 0.710 |
| 8 | 0.224 | 0.230 | 0.329 | 0.424 | 0.503 | 0.600 | 0.563 | 0.841 |
| 10 | 0.045 | 0.291 | 0.258 | 0.618 | 0.106 | 0.617 | 0.433 | 0.981 |
| 12 | 0.127 | 0.247 | 0.224 | 0.273 | 0.171 | 0.405 | 0.371 | 0.509 |
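| $N_d$ | % error β (0% noise) | % error β (5% noise) | % error β (10% noise) | % error α (0% noise) | % error α (5% noise) | % error α (10% noise) |
| --- | --- | --- | --- | --- | --- | --- |
| 1000 | 0.172 | 0.599 | 0.383 | 0.360 | 0.669 | 0.795 |
| 2000 | 0.259 | 0.465 | 0.393 | 0.535 | 0.853 | 1.089 |
| 3000 | 0.463 | 0.560 | 0.012 | 1.035 | 0.962 | 0.103 |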
In Table 4, it is evident that the residual neural network reduces prediction errors relative to the fully connected network. In particular, as the number of network layers increases, the parameter inversion error of the residual architecture decreases markedly. This indicates that the residual neural network is effective at improving prediction accuracy and maintaining stability as network complexity increases.
| Network layers | Predicted variable | FCNN | ResNet |
| --- | --- | --- | --- |
| 7 | S | 2.34e−03 | 1.95e−03 |
| 7 | I | 2.53e−02 | 2.40e−02 |
| 7 | β | 2.240% | 2.098% |
| 7 | α | 1.619% | 1.506% |
| 9 | S | 1.94e−03 | 1.45e−03 |
| 9 | I | 2.00e−02 | 1.91e−02 |
| 9 | β | 0.604% | 0.299% |
| 9 | α | 1.230% | 0.321% |
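The two network types compared in Table 4 differ in whether each hidden layer adds its output to its input. The sketch below shows this difference in a plain NumPy forward pass. The exact ResNet layout used for the table is not given in this section, so the block structure, the tanh activation, the layer sizes, and the initialization are all assumptions made for illustration.

```python
import numpy as np

def init_layers(sizes, rng):
    """Random weights and zero biases for consecutive layer sizes."""
    return [(rng.standard_normal((m, n)) * np.sqrt(1.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def fcnn_forward(params, xt):
    """Fully connected network: each hidden layer replaces its input."""
    h = xt
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

def resnet_forward(params, xt):
    """Residual variant: each hidden layer adds its output to its input."""
    W, b = params[0]
    h = np.tanh(xt @ W + b)            # lift (x, t) to the hidden width
    for W, b in params[1:-1]:
        h = h + np.tanh(h @ W + b)     # skip connection
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
params = init_layers([2] + [50] * 6 + [2], rng)   # inputs (x, t), outputs (S, I)
xt = rng.uniform(size=(5, 2))
print(fcnn_forward(params, xt).shape, resnet_forward(params, xt).shape)
```

With the skip connection, a hidden layer whose weights are zero acts as the identity, so adding layers cannot make the representable function class smaller; this is the usual argument for why residual networks remain trainable as depth grows.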
In this inverse problem, we employ the SEIR model from the second forward problem, where the model parameters β,σ,α and the dynamics are unknown. Only a small and scattered set of possibly noisy observations is provided for the hidden states S(x,t),E(x,t),I(x,t), and R(x,t). Unlike the first inverse problem, we additionally invert the parameter σ, which implies that we have less knowledge about the governing laws of the disease information presented by the model. Let
$$
\begin{aligned}
f(t,x) &:= \frac{\partial S(x,t)}{\partial t} - d_1 \frac{\partial^2 S}{\partial x^2} - B + \lambda_1 S(x,t) I(x,t) + \mu S(x,t), \\
g(t,x) &:= \frac{\partial E(x,t)}{\partial t} - d_2 \frac{\partial^2 E}{\partial x^2} - \lambda_1 S(x,t) I(x,t) + (\mu + \lambda_2) E(x,t), \\
h(t,x) &:= \frac{\partial I(x,t)}{\partial t} - d_3 \frac{\partial^2 I}{\partial x^2} - \lambda_2 E(x,t) + (\lambda_3 + \mu) I(x,t).
\end{aligned}
$$
Here, consistent with the inverse problem of the SIR model, we denote f(t,x),g(t,x),h(t,x) as surrogate models for the three equations. The disease dynamics parameters (β,σ,α)=(λ1,λ2,λ3) are all unknown and are predicted by a deep neural network together with the unknown dynamics of the compartments S(x,t),E(x,t), and I(x,t). During training, λ1,λ2, and λ3 are trainable parameters shared with the neural network.
For the SEIR model, we give the expression (3.15) for the basic reproduction number, which is similar to that of epidemic models described by ordinary differential equations.
$$
R_0 = \frac{B \beta \sigma}{(\mu + \sigma)(\mu + \alpha)\mu} \approx 8.26
\tag{3.15}
$$
Figure 8 shows the constant values observed at t=20 over the entire domain, which represent the respective steady states. Substituting the parameters into (3.15) gives a basic reproduction number R0≥1; in this state the populations in each compartment reach a stable equilibrium, signaling the persistence of the disease.
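The value in (3.15) can be checked with a few lines of code using the Example 2 parameters from Table 1 (B = 1.0, μ = 0.1, β = 1.0, σ = 1.0, α = 1.0). The function name is a hypothetical label for the formula.

```python
def r0_seir(B, beta, sigma, alpha, mu):
    """Basic reproduction number of the SEIR model, formula (3.15)."""
    return B * beta * sigma / ((mu + sigma) * (mu + alpha) * mu)

r0 = r0_seir(B=1.0, beta=1.0, sigma=1.0, alpha=1.0, mu=0.1)
print(f"R0 = {r0:.2f}")   # R0 = 8.26
```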
In Figure 9, the three fitted parameters agree with the true values. The trainable parameters are initialized to 1. At the beginning of the iteration, the network fit is essentially random, and the fitted parameter values show a decreasing trend. As the iterations continue, the parameters gradually approach the true values and match them by epoch 50,000.
Table 5 demonstrates that RD-DINN achieves nearly zero L2 error in parameter identification across 10 experimental replications, confirming its accuracy under the basic neural network architecture. As shown in Table 6, while the parameter identification error increases with higher noise concentrations, all errors remain below 1% in 10 replicate experiments, demonstrating consistent performance under noisy conditions.
| Layers | % error β (60%) | % error β (80%) | % error β (100%) | % error σ (60%) | % error σ (80%) | % error σ (100%) | % error α (60%) | % error α (80%) | % error α (100%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6 | 0.003 | 0.004 | 0.001 | 0.014 | 0.001 | 0.005 | 0.013 | 0.009 | 0.001 |
| 8 | 0.004 | 0.004 | 0.002 | 0.016 | 0.008 | 0.006 | 0.006 | 0.002 | 0.003 |
| 10 | 0.002 | 0.001 | 0.002 | 0.014 | 0.004 | 0.009 | 0.010 | 0.002 | 0.008 |
| 12 | 0.002 | 0.001 | 0.004 | 0.003 | 0.005 | 0.010 | 0.003 | 0.000 | 0.008 |
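| $N_d$ | % error β (0% noise) | % error β (5% noise) | % error β (10% noise) | % error σ (0% noise) | % error σ (5% noise) | % error σ (10% noise) | % error α (0% noise) | % error α (5% noise) | % error α (10% noise) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1000 | 0.003 | 0.352 | 0.526 | 0.003 | 0.092 | 0.152 | 0.002 | 0.432 | 0.369 |
| 2000 | 0.002 | 0.153 | 0.222 | 0.002 | 0.024 | 0.151 | 0.002 | 0.140 | 0.227 |
| 3000 | 0.001 | 0.004 | 0.087 | 0.001 | 0.037 | 0.078 | 0.001 | 0.037 | 0.049 |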
Model predictions exhibit biases during rapid compartment population shifts, particularly in early outbreaks or unforeseen events, reflecting inherent disease unpredictability and estimation challenges.
Figure 10 shows that, in the SIR inverse problem with α=0.5, the error rises significantly with increasing β, indicating that the model is sensitive to high transmission rates. With β=0.5, the error decreases significantly with increasing α, indicating that the model is sensitive to low recovery rates. At high β, disease outbreak dynamics dominate and model predictions are more stable. The SEIR model inversion errors are of the order of 1e−07 overall, indicating a low sensitivity to the different parameters. Inter-parameter coupling effects (e.g., simultaneous changes in β and α) could be studied in the future through multi-dimensional parameter scans.
To validate the model on a real public health outbreak, we use data on COVID-19 infections in five cities in mid-Michigan (Allegan, Barry, Eaton, Ingham, and Livingston) in April 2020. Because the cities lie at similar latitudes and are spatially contiguous, we treat their arrangement as the one-dimensional spatial coordinate x in Figure 11. Whether an individual is infected is determined from the daily nucleic acid test results reported to the public. We use population density to represent the current state of each compartment.
We construct the spatial domain of the disease for the five cities by performing spatial penalization, considering only transmission along the latitude direction (a one-dimensional spatial problem). Because the data are highly scattered, some smoothing was required. We compared moving-average filters with 3-, 4-, and 5-day windows and chose to apply the 4-day window filter twice; smoothing had little impact on the trend of the data and did not degrade it.
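The smoothing step can be sketched as follows. The text specifies a 4-day moving average applied twice but not how the ends of the series are handled, so the edge padding used here (repeating the first and last values) is an assumption, as are the function names and the synthetic series used in place of the real case data.

```python
import numpy as np

def moving_average(series, window):
    """Moving average that keeps the series length by repeating edge values."""
    series = np.asarray(series, dtype=float)
    left = window // 2
    right = window - 1 - left
    padded = np.pad(series, (left, right), mode="edge")
    kernel = np.full(window, 1.0 / window)
    return np.convolve(padded, kernel, mode="valid")

def smooth_twice(series, window=4):
    """Apply the same moving-average filter two times in a row."""
    return moving_average(moving_average(series, window), window)

# Synthetic 30-day density series (hypothetical, for illustration only).
rng = np.random.default_rng(1)
raw = np.linspace(1.0, 0.5, 30) + 0.1 * rng.standard_normal(30)
smoothed = smooth_twice(raw, window=4)
print(raw.shape, smoothed.shape)
```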
We can see in Figure 12 that the city of Livingston shows an increasing trend in the density of susceptible people and a gradually decreasing trend in the density of infected people in April. Since we do not know this city's birth and death rates, the classical PDE-SIR model with a diffusion term is used to validate this set of real data.
By treating the five cities, which share a similar latitude and lie close to each other, as a one-dimensional spatial distribution, we can see in Figure 13 that RD-DINN reaches an error of 5e−03 in predicting the spatial distribution. The RD-DINN parameter inversion gives an infection rate β of 0.037 with a fluctuation of 10% and a recovery rate α of 0.070 with a fluctuation of 8%. The resulting basic reproduction number $R_0=\beta/\alpha<1$ indicates that the disease receded briefly in April, which is broadly in line with the actual data we selected for the state of Michigan. In Figure 14, we verify the model's extrapolation ability by training a pre-trained model on the first 20 days of real data and then inferring the evolution of the population densities over the following 10 days. The RMSE in the inference stage remains of the same order of magnitude as in the pre-training stage, which indicates that the model is able to predict the future state of the disease. The error grows somewhat in the inference phase because the model also carries some empirical error.
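As a worked check of the conclusion $R_0<1$, assume that the quoted 10% and 8% denote relative uncertainties of β and α (an interpretation, not stated explicitly in the text). The point estimate and the least favourable combination, with β at its upper bound and α at its lower bound, are then

```latex
R_0 = \frac{\beta}{\alpha} = \frac{0.037}{0.070} \approx 0.53,
\qquad
R_0^{\max} = \frac{0.037 \times 1.10}{0.070 \times 0.92}
           = \frac{0.0407}{0.0644} \approx 0.63 < 1 ,
```

so under this reading the conclusion that the disease was receding does not depend on where the parameters fall within their stated ranges.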
In this paper, we introduce RD-DINN, a data-driven deep learning approach based on PINN. RD-DINN is a deep neural network method capable of learning how various diseases spread, predicting disease progression, and identifying the parameters of disease spread. In the forward problem, where the governing equations, initial conditions, and boundary conditions are known, they are incorporated into the neural network loss function. The proposed method achieves an absolute error of 1e−03 relative to finite difference methods. In the inverse problem, where the inversion of model parameters is crucial, exact solutions of the partial differential equations are not available, so we extract a random subset of sampling points from numerical solutions as mini-batch data and minimize the equation and mini-batch data losses. The absolute error between the predicted results and the numerical solutions reaches 1e−03. Although RD-DINN gives effective predictions in the forward problem, some error remains compared with high-order finite differences. We are actively working on more advanced ways to encode disease information into deep neural networks in order to reduce the errors that occur in forward problems. The inverse problem for time-dependent parameters (e.g., a time-varying transmission rate due to interventions or seasonal effects) is also challenging.
This is the first application of the DINN method to diffusion-based infectious disease models, and the results from two sets of benchmark numerical experiments are promising. Traditional methods such as finite difference and finite element have proven effective in deducing model dynamics. However, RD-DINN not only demonstrates satisfactory performance in predicting model dynamics in the forward problem, but also excels in using mini-batch data for the inverse problem, where it successfully inverts the dynamics and unknown parameters of the model. It is essential to acknowledge that, given the lack of exact solutions for the discussed partial differential equations, the mini-batch data are sampled from the numerical solutions obtained through high-order finite differences.
In the two benchmark disease experiments, we investigate the forward problem of predicting the dynamics of diffusion diseases and the inverse problem of estimating unknown parameters. We also assess the robustness of the RD-DINN method under different neural network architectures and noise levels. We validate the model's effectiveness on COVID-19 data from five cities in Michigan. In conclusion, our proposed Reaction-Diffusion Disease Information Neural Network shows promising prospects as a robust and reliable candidate for learning the dynamics and unknown parameters of compartmental models in infectious diseases, particularly in the context of inverse problems.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The authors are grateful to the National Natural Science Foundation of China (12471505, 12101576).
The authors declare there are no conflicts of interest.
Figure A1 illustrates the training process and convergence of neural networks in both direct and inverse problems. In the direct problem, the neural network solves for the system state variables using known partial differential equations, initial conditions, and boundary conditions. The total loss and its components, such as the initial condition loss, boundary condition loss, and function loss, gradually decrease as the number of training epochs increases. This trend indicates that the neural network progressively approximates the true solution. In the inverse problem, the neural network infers the governing equations, initial conditions, or boundary conditions from observed data. Similarly, the total loss and its components decrease with increasing training epochs, demonstrating that the neural network not only fits the observed data but also accurately infers unknown system parameters or dynamic behaviors. These results validate the effectiveness of neural networks in both direct and inverse problems, highlighting their powerful capabilities in modeling complex systems and performing parameter inversion. The total loss in a direct problem consists of three components
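$$
L_{\text{Total}} = L_{\text{IC}} + L_{\text{BC}} + L_{\text{PDE}},
$$
where $L_{\text{IC}}$ is the MSE for the initial conditions, $L_{\text{BC}}$ is the MSE for the boundary conditions, and $L_{\text{PDE}}$ is the MSE for the PDE residuals. The total loss in an inverse problem consists of two components
$$
L_{\text{Total}} = L_{\text{Data}} + L_{\text{PDE}},
$$
where $L_{\text{Data}}$ is the MSE between the true and predicted data, and $L_{\text{PDE}}$ is the MSE for the PDE residuals.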
[1] W. O. Kermack, A. G. McKendrick, A contribution to the mathematical theory of epidemics, Proc. R. Soc. Lond. A, 115 (1927), 700–721. http://doi.org/10.1098/rspa.1927.0118
[2] T. Sigler, S. Mahmuda, A. Kimpton, J. Loginova, P. Wohland, E. Charles-Edwards, et al., The socio-spatial determinants of COVID-19 diffusion: The impact of globalisation, settlement characteristics and population, Global. Health, 17 (2021), 56. https://doi.org/10.1186/s12992-021-00707-2
[3] N. Sene, SIR epidemic model with Mittag–Leffler fractional derivative, Chaos, Solitons Fractals, 137 (2020), 109833. https://doi.org/10.1016/j.chaos.2020.109833
[4] J. A. Backer, A. Berto, C. McCreary, F. Martelli, W. H. M. van der Poel, Transmission dynamics of hepatitis E virus in pigs: Estimation from field data and effect of vaccination, Epidemics, 4 (2012), 86–92. https://doi.org/10.1016/j.epidem.2012.02.002
[5] R. Anderson, R. May, Infectious Diseases of Humans: Dynamics and Control, Oxford Academic, 1991. https://doi.org/10.1093/oso/9780198545996.001.0001
[6] A. Nauman, M. Rafiq, M. A. Rehman, M. Ali, M. O. Ahmad, Numerical modeling of SEIR measles dynamics with diffusion, Commun. Math. Appl., 9 (2018), 315–326.
[7] T. N. Sindhu, A. Shafiq, Z. Huassian, Generalized exponentiated unit Gompertz distribution for modeling arthritic pain relief times data: Classical approach to statistical inference, J. Biopharm. Stat., 34 (2024), 323–348. https://doi.org/10.1080/10543406.2023.2210681
[8] T. N. Sindhu, A. Shafiq, M. B. Riaz, T. A. Abushal, H. Ahmad, E. M. Almetwally, et al., Introducing the new arcsine-generator distribution family: An in-depth exploration with an illustrative example of the inverse weibull distribution for analyzing healthcare industry data, J. Radiat. Res. Appl. Sci., 17 (2024), 100879. https://doi.org/10.1016/j.jrras.2024.100879
[9] T. N. Sindhu, A. Shafiq, Q. M. Al-Mdallal, Exponentiated transformation of Gumbel Type-Ⅱ distribution for modeling COVID-19 data, Alexandria Eng. J., 60 (2021), 671–689. https://doi.org/10.1016/j.aej.2020.09.060
[10] A. Shafiq, A. B. Çolak, T. N. Sindhu, S. A. Lone, A. Alsubie, F. Jarad, Comparative study of artificial neural network versus parametric method in COVID-19 data analysis, Results Phys., 38 (2022), 105613. https://doi.org/10.1016/j.rinp.2022.105613
[11] A. Shafiq, A. B. Çolak, T. N. Sindhu, S. A. Lone, T. A. Abushal, Modeling and survival exploration of breast carcinoma: A statistical, maximum likelihood estimation, and artificial neural network perspective, Artif. Intell. Life Sci., 4 (2023), 100082. https://doi.org/10.1016/j.ailsci.2023.100082
[12] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, Physics-informed machine learning, Nat. Rev. Phys., 3 (2021), 422–440. https://doi.org/10.1038/s42254-021-00314-5
[13] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys., 378 (2019), 686–707. https://doi.org/10.1016/j.jcp.2018.10.045
[14] S. Cai, Z. Mao, Z. Wang, M. Yin, G. E. Karniadakis, Physics-informed neural networks (PINNs) for fluid mechanics: A review, Acta Mech. Sin., 37 (2021), 1727–1738. https://doi.org/10.1007/s10409-021-01148-1
[15] A. Yazdani, L. Lu, M. Raissi, G. E. Karniadakis, Systems biology informed deep learning for inferring parameters and hidden dynamics, PLOS Comput. Biol., 16 (2020), e1007575. https://doi.org/10.1371/journal.pcbi.1007575
[16] M. Raissi, N. Ramezani, P. N. Seshaiyer, On parameter estimation approaches for predicting disease transmission through optimization, deep learning and statistical inference methods, Lett. Biomath., 6 (2019), 1–26.
[17] S. Shaier, M. Raissi, P. Seshaiyer, Data-driven approaches for predicting spread of infectious diseases through DINNs: Disease informed neural networks, Lett. Biomath., 9 (2022), 71–105.
[18] S. Yin, J. Wu, P. Song, Optimal control by deep learning techniques and its applications on epidemic models, J. Math. Biol., 86 (2023), 36. https://doi.org/10.1007/s00285-023-01873-0
[19] F. V. Difonzo, L. Lopez, S. F. Pellegrino, Physics informed neural networks for learning the horizon size in bond-based peridynamic models, Comput. Methods Appl. Mech. Eng., 436 (2025), 117727. https://doi.org/10.1016/j.cma.2024.117727
[20] F. V. Difonzo, L. Lopez, S. F. Pellegrino, Physics informed neural networks for an inverse problem in peridynamic models, Eng. Comput., 2024 (2024). https://doi.org/10.1007/s00366-024-01957-5
[21] D. Ouedraogo, I. Ibrango, A. Guiro, Global stability for reaction-diffusion SIR model with general incidence function, Malaya J. Mat., 10 (2022), 139–150. https://doi.org/10.26637/mjm1002/004
[22] J. P. C. Santos, E. Monteiro, J. C. Ferreira, N. H. T. Lemes, D. S. Rodrigues, Well-posedness and qualitative analysis of a SEIR model with spatial diffusion for COVID-19 spreading, Biomath, 12 (2023), 2307207. https://doi.org/10.55630/j.biomath.2023.07.207
[23] S. Chinviriyasit, W. Chinviriyasit, Numerical modelling of an SIR epidemic model with diffusion, Appl. Math. Comput., 216 (2010), 395–409. https://doi.org/10.1016/j.amc.2010.01.028
[24] P. van den Driessche, J. Watmough, Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission, Math. Biosci., 180 (2002), 29–48. https://doi.org/10.1016/S0025-5564(02)00108-6
[25] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature, 521 (2015), 436–444. https://doi.org/10.1038/nature14539
[26] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: A survey, J. Mach. Learn. Res., 18 (2018), 1–43.
[27] M. Raissi, G. E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations, J. Comput. Phys., 357 (2018), 125–141. https://doi.org/10.1016/j.jcp.2017.11.039
[28] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2016), 770–778. https://doi.org/10.1109/CVPR.2016.90
[29] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, preprint, arXiv: 1706.03762.
[30] M. D. McKay, R. J. Beckman, W. J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics, 21 (1979), 239–245. https://doi.org/10.2307/1268522
[31] E. Massad, M. N. Burattini, F. B. A. Coutinho, L. F. Lopez, The 1918 influenza A epidemic in the city of São Paulo Brazil, Med. Hypotheses, 68 (2007), 442–445. https://doi.org/10.1016/j.mehy.2006.07.041
[32] N. Sene, 2-Numerical methods applied to a class of SEIR epidemic models described by the Caputo derivative, Methods Math. Model., 2022 (2022), 23–40. https://doi.org/10.1016/B978-0-323-99888-8.00003-6
[33] S. L. Brunton, J. L. Proctor, J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. U.S.A., 113 (2016), 3932–3937. https://doi.org/10.1073/pnas.1517384113
| Parameter | Notation | Value (Example 1) | Value (Example 2) |
| --- | --- | --- | --- |
| Recruitment rate | B | 0.1 | 1.0 |
| Natural death rate | μ | 0.1 | 0.1 |
| Infection rate | β | 0.5 | 1.0 |
| Recovery rate | α | 0.5 | 1.0 |
| Latent rate | σ | 0.0 | 1.0 |
| S-diffusion capacity | $d_1$ | 0.001 | 0.005 |
| E-diffusion capacity | $d_2$ | 0.000 | 0.005 |
| I-diffusion capacity | $d_3$ | 0.001 | 0.005 |
| R-diffusion capacity | $d_4$ | 0.001 | 0.005 |
Layers | % error β | % error α | ||||||
40% | 60% | 80% | 100% | 40% | 60% | 80% | 100% | |
6 | 0.177 | 0.400 | 0.267 | 0.336 | 0.433 | 0.772 | 0.629 | 0.710 |
8 | 0.224 | 0.230 | 0.329 | 0.424 | 0.503 | 0.600 | 0.563 | 0.841 |
10 | 0.045 | 0.291 | 0.258 | 0.618 | 0.106 | 0.617 | 0.433 | 0.981 |
12 | 0.127 | 0.247 | 0.224 | 0.273 | 0.171 | 0.405 | 0.371 | 0.509 |
Nd | % error β | % error α | ||||
0% | 5% | 10% | 0% | 5% | 10% | |
1000 | 0.172 | 0.599 | 0.383 | 0.360 | 0.669 | 0.795 |
2000 | 0.259 | 0.465 | 0.393 | 0.535 | 0.853 | 1.089 |
3000 | 0.463 | 0.560 | 0.012 | 1.035 | 0.962 | 0.103 |
Network Layer | Predict Variable | Network Type | |
FCNN | ResNet | ||
7 | S | 2.34e−03 | 1.95e−03 |
I | 2.53e−02 | 2.40e−02 | |
β | 2.240% | 2.098% | |
α | 1.619% | 1.506% | |
9 | S | 1.94e−03 | 1.45e−03 |
I | 2.00e−02 | 1.91e−02 | |
β | 0.604% | 0.299% | |
α | 1.230% | 0.321% |

| Layers | % error β (60%) | % error β (80%) | % error β (100%) | % error σ (60%) | % error σ (80%) | % error σ (100%) | % error α (60%) | % error α (80%) | % error α (100%) |
|---|---|---|---|---|---|---|---|---|---|
| 6 | 0.003 | 0.004 | 0.001 | 0.014 | 0.001 | 0.005 | 0.013 | 0.009 | 0.001 |
| 8 | 0.004 | 0.004 | 0.002 | 0.016 | 0.008 | 0.006 | 0.006 | 0.002 | 0.003 |
| 10 | 0.002 | 0.001 | 0.002 | 0.014 | 0.004 | 0.009 | 0.010 | 0.002 | 0.008 |
| 12 | 0.002 | 0.001 | 0.004 | 0.003 | 0.005 | 0.010 | 0.003 | 0.000 | 0.008 |


| Nd | % error β (0%) | % error β (5%) | % error β (10%) | % error σ (0%) | % error σ (5%) | % error σ (10%) | % error α (0%) | % error α (5%) | % error α (10%) |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0.003 | 0.352 | 0.526 | 0.003 | 0.092 | 0.152 | 0.002 | 0.432 | 0.369 |
| 2000 | 0.002 | 0.153 | 0.222 | 0.002 | 0.024 | 0.151 | 0.002 | 0.140 | 0.227 |
| 3000 | 0.001 | 0.004 | 0.087 | 0.001 | 0.037 | 0.078 | 0.001 | 0.037 | 0.049 |

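
The tables above report parameter-inversion accuracy as a percentage error. The following is a minimal sketch of how such a figure could be computed, assuming it is the relative error |θ_pred − θ_true| / |θ_true| × 100. That definition is an assumption here, and the function name and the predicted value are hypothetical. Only the true value β = 0.5 is taken from the Example 1 column of the parameter table.

```python
def percent_relative_error(true_value: float, predicted_value: float) -> float:
    """Return |predicted - true| / |true| * 100 (assumed error definition)."""
    if true_value == 0:
        # A zero reference value (e.g. sigma = 0.0 in Example 1) has no relative error.
        raise ValueError("relative error is undefined for a true value of zero")
    return abs(predicted_value - true_value) / abs(true_value) * 100.0


# True infection rate beta for Example 1, taken from the parameter table.
beta_true = 0.5
# Hypothetical network estimate, chosen only for illustration.
beta_pred = 0.505

beta_error = percent_relative_error(beta_true, beta_pred)
print(f"% error beta: {beta_error:.3f}")
```

Under this definition, a parameter whose true value is zero cannot be scored, which is why the sketch raises an error in that case rather than returning a number.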