Research article

A new hybrid approach based on genetic algorithm and support vector machine methods for hyperparameter optimization in synthetic minority over-sampling technique (SMOTE)

  • Received: 10 October 2022; Revised: 30 January 2023; Accepted: 30 January 2023; Published: 15 February 2023
  • MSC : 68Txx, 68W01, 68W40, 68Wxx, 82-08

  • The crucial problem when applying classification algorithms is unequal classes. An imbalanced dataset problem means, particularly in a two-class dataset, that the group variable of one class is comparatively more dominant than the group variable of the other class. The issue stems from the fact that the majority class dominates the minority class. The synthetic minority over-sampling technique (SMOTE) has been developed to deal with the classification of imbalanced datasets. The SMOTE algorithm increases the number of samples by interpolating between the clustered minority samples. The SMOTE algorithm has three critical parameters, "k", "perc.over", and "perc.under". The "perc.over" and "perc.under" hyperparameters determine the minority and majority class ratios. The "k" parameter is the number of nearest neighbors used to create new minority class instances. Finding the best parameter values in the SMOTE algorithm is complicated. A hybridized version of genetic algorithm (GA) and support vector machine (SVM) approaches was suggested to address this issue for selecting the SMOTE algorithm's parameters. Three scenarios were created. Scenario 1 shows the evaluation of support vector machine (SVM) results without using the SMOTE algorithm. Scenario 2 shows that the SVM was used after applying the SMOTE algorithm without the GA algorithm. In the third scenario, the results were analyzed using the SVM algorithm after optimizing the SMOTE algorithm's hyperparameters with the GA. This study used two imbalanced datasets, drug-use data and simulation data. Afterwards, the results were compared with model performance metrics. When the model performance metrics are examined, the third scenario achieves the highest performance. As a result of this study, it has been shown that a genetic algorithm can optimize the class ratios and the k hyperparameter to improve the performance of the SMOTE algorithm.

    Citation: Pelin Akın. A new hybrid approach based on genetic algorithm and support vector machine methods for hyperparameter optimization in synthetic minority over-sampling technique (SMOTE)[J]. AIMS Mathematics, 2023, 8(4): 9400-9415. doi: 10.3934/math.2023473




    As pointed out in Claessens et al. (2014), global financial crises underline the importance of innovative modelling approaches to financial markets. The quantum mechanics approach suggests an alternative way to describe the unpredictable stock market behaviour (see e.g., Baaquie (2009)).

    This paper is motivated by the quantum mechanics approach to actuarial modelling and risk analysis as initiated in Tamturk and Utev (2018) and developed in Lefèvre et al. (2018), Tamturk and Utev (2019). We modify and adapt their methods to derive and analyze a quantum-type financial model. Specifically, the Schrödinger, Heisenberg, Feynman, Dirac approaches in quantum mechanics (see, e.g., Griffiths and Schroeter (2018); Parthasarathy (2012); Plenio (2002)) will be applied to the well-known option pricing models of Black and Scholes (1973) and Cox et al. (1979).

    Moreover, motivated by the data analysis of the quantum reserve process proposed in Lefèvre et al. (2018), we model the financial data as eigenvalues of certain 1 or 2-step observable operators. These data are then analyzed to identify price jumps using supervised machine learning tools such as k-fold cross-validation techniques (see, e.g., Bishop (2006); Hastie et al. (2009); Wittek (2014)).

    Option pricing models using quantum techniques discussed, for example, in Baaquie (2004, 2014); Bouchaud and Potters (2003); Haven (2002) are often based on the Schrödinger wave function with Hamiltonian operator H and are mainly oriented to continuous-time markets. For discrete-time markets as considered here, following Chen (2001, 2004), we choose the discrete-time formalism and analyze the quantum version of the Cox-Ross-Rubinstein binomial model. Then, we establish the limit of the spectral measures providing the convergence to the geometric Brownian motion model. We also identify the limit of the N-step non-self-adjoint bond market as a planar Brownian motion.

    The paper is organized as follows. Section 2 deals with heterogeneous quantum binomial markets. In Section 3, two convergence properties to continuous-time quantum markets are obtained. In Section 4, discrete and continuous-time quantum mechanics techniques are applied to the problem of option pricing. Section 5 is devoted to an analysis of stock market data.

    The motivation for our models comes from the quantum mechanics approach to insurance proposed in Tamturk and Utev (2018, 2019), the insurance claim data analysis via quantum tools introduced in Lefèvre et al. (2018) and the non-traditional financial modelling initiated in Ma and Utev (2012) and developed in Karadeniz and Utev (2015, 2018). The quantum modelling approach is partly inspired by Baaquie (2004); Chen (2001, 2004), and Parthasarathy (2012).

    We begin by outlining some standard arguments for a quantum-type modelling (see e.g., Lefèvre et al. (2018); Parthasarathy (2012)). An observable is a linear operator (matrix) on a certain Hilbert space. The quantum product of two independent observables A, B is implemented by the tensor product A \otimes B. So, \ln(A \otimes B) = \ln A \otimes I + I \otimes \ln B acts as a quantum sum of two independent observables; in particular, A \otimes A is the quantum product of two independent identical observables.

    The 1-step quantum geometric random walk is defined as a 2 \times 2 matrix A with eigenvalues e^u, e^d. Thus, A^{\otimes N} (N \geq 0) models the N-step geometric random walk.
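As a quick numerical illustration (a sketch with arbitrary values for u, d and the unitary U, not taken from the paper), the eigenvalues of the 2-step walk A \otimes A are exactly the products e^{s_1 + s_2} with s_i \in \{u, d\}:

```python
import numpy as np

# Sketch: a 1-step observable A with eigenvalues e^u, e^d,
# conjugated by an arbitrary (real) unitary U.
u, d = 0.3, -0.2
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])          # a rotation is unitary
A = U @ np.diag([np.exp(u), np.exp(d)]) @ U.conj().T

# The 2-step walk is the tensor (Kronecker) product A x A; its
# eigenvalues are the products e^{s1 + s2} with s_i in {u, d}.
A2 = np.kron(A, A)
eig = np.sort(np.linalg.eigvals(A2).real)
expected = np.sort([np.exp(s1 + s2) for s1 in (u, d) for s2 in (u, d)])
print(np.allclose(eig, expected))   # True
```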

    The quantum type modification of the classical Cox-Ross-Rubinstein model originated in Chen (2001, 2004). The dynamics per period is defined by two moves: e^u (the share price goes up) and e^d (the share price goes down), with d < 0 < u.

    The quantum binomial model over N periods is then represented by the share price operator

    \begin{equation} H_{S_N} = S_0 H^{\otimes N}, \end{equation} (1)

    where the main 1-step observable H = A is a 2 \times 2 self-adjoint (hermitian) matrix with eigenvalues e^u, e^d and representation

    \begin{equation} A = UDU^* = U \begin{pmatrix} e^u & 0 \\ 0 & e^d \end{pmatrix} U^*, \end{equation} (2)

    where U is a 2×2 unitary matrix.

    In the sequel, the quantum binomial model discussed will be heterogeneous with share price operator

    \begin{equation} H_{S_N} = S_0 H_1 \otimes \cdots \otimes H_N. \end{equation} (3)

    The motivation for this relatively new financial model comes from the quantum mechanics approach to non-life insurance of Lefèvre et al. (2018) and the Lamplighter group approximation to financial modelling suggested in Ma and Utev (2012). Based on the approach to financial data analysis developed in Ma and Utev (2012); Karadeniz and Utev (2015, 2018) (see also references therein), we treat the data as having big jumps, say e^u, small jumps, say e^d, or no jump. Moreover, the financial stock is observed at fixed times k\Delta but the number of jumps occurring during the time period ((k-1)\Delta, k\Delta] is not observed. For simplicity and following Lefèvre et al. (2018), we then assume there are at most two jumps per period.

    In this circumstance, the main observable operator is given, similarly to the insurance case Lefèvre et al. (2018), by

    \begin{equation} H_3 = S_0 \big(P_0 \otimes I_4 + P_1 \otimes (A \otimes I_2) + P_2 \otimes A^{\otimes 2}\big), \end{equation} (4)

    where I_n is an n \times n identity matrix, A is the 2 \times 2 matrix representing the 1-step geometric move operator (2) with eigenvalues e^u, e^d, and P_0, P_1, P_2 are 3 \times 3 matrix projection operators corresponding to the 0, 1, 2 claim occurrence operators, defined by

    \begin{equation} P_i = W D_{i+1|3} W^*, \qquad i = 0, 1, 2, \end{equation} (5)

    where D_{i+1|3} is a 3 \times 3 diagonal matrix which has all its elements equal to 0 except the (i+1, i+1)-th, with value 1, and W is a 3 \times 3 unitary matrix.

    The share price operator over N periods is then defined as

    \begin{equation} H_{S_N} = S_0 H_3^{\otimes N} = S_0 \big(P_0 \otimes I_4 + P_1 \otimes (A \otimes I_2) + P_2 \otimes A^{\otimes 2}\big)^{\otimes N}. \end{equation} (6)

    Furthermore, when constructing the density operator, we consider the following two cases.

    Maxwell-Boltzmann statistics. In this case, the jump sizes are i.i.d. with a two-point distribution. More precisely, there are 0, 1, 2 jumps with probabilities \delta_0, \delta_1, \delta_2 given via a Poisson process, and each jump size has two possible values e^d, e^u with probabilities q, p (see also later in Section 4.1).

    Bose-Einstein statistics. In this case, the claim sizes are dependent but remain independent of the claim occurrences.

    This model makes a bridge between the traditional binomial model and the actuarial type model. In this case, the dynamics per period is defined by three moves: no change, down and up. The trinomial type financial modelling is a well established topic in finance (see e.g., Boyle (1986); Tian (1993); Leisen and Reimer (1996)).

    Now, the 1-step observable operator H_2 = B is the 3 \times 3 self-adjoint matrix with eigenvalues e^u, 1, e^d corresponding to these three moves and is given by

    \begin{equation} H_2 = B = UDU^* = U \begin{pmatrix} e^u & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^d \end{pmatrix} U^*, \end{equation} (7)

    where U is now a 3×3 unitary matrix.

    Then, the share price operator over N periods is defined by

    \begin{equation} H_{S_N} = S_0 H_2^{\otimes N} = S_0 \, U^{\otimes N} D^{\otimes N} (U^*)^{\otimes N}. \end{equation} (8)

    Remarks. Notice that the heterogeneous versions of the trinomial model and the actuarial type model are available with the representation similar to (3). However, they are not treated in this paper because the main purpose for considering these two models is the non-standard data analysis (to be presented in Section 5). Although the data analysis of time dependent models is a fascinating topic, it is out of the scope of this paper.

    As mentioned above, the quantum binomial model (1) originated in Chen (2001, 2004). Our presentation is somewhat different and based on the algebraic tensor product properties. In addition, since heterogeneity is an important topic in option pricing (see e.g., Benninga and Mayshar (2000)), we treat a slightly more general time-dependent or heterogeneous quantum binomial model (3).

    A quantum state ρ is defined as a positive self-adjoint operator with trace tr(ρ)=1. We recall the following properties.

    Lemma 2.1. Let A,B,C,D be self-adjoint operators, U a unitary matrix, f a function of observables and ρ a quantum state. Then,

    \begin{align} & tr(ABC) = tr(CAB), \qquad tr(A \otimes B) = tr(A)\,tr(B), \\ & (A \otimes B)(C \otimes D) = (AC) \otimes (BD), \qquad (A \otimes B)^* = A^* \otimes B^*, \\ & A, B \geq 0 \Rightarrow A \otimes B \geq 0, \qquad A = UDU^* \Rightarrow f(A) = U f(D) U^*, \\ & U^* \rho U = \text{quantum state}. \end{align} (9)
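The tensor-product identities of Lemma 2.1 are easy to check numerically. The following sketch (not from the paper; it uses NumPy and arbitrary random hermitian matrices) verifies the first four properties:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_herm(n):
    # a random self-adjoint (hermitian) matrix
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

A, B, C, D = (rand_herm(2) for _ in range(4))

# tr(ABC) = tr(CAB): the trace is invariant under cyclic permutations
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))

# tr(A x B) = tr(A) tr(B)
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))

# (A x B)(C x D) = (AC) x (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# (A x B)* = A* x B*
assert np.allclose(np.kron(A, B).conj().T, np.kron(A.conj().T, B.conj().T))
print("all identities hold")
```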

    For simplicity, we choose a quantum state ρ as a tensor product, i.e.

    \begin{equation} \rho = \rho_1 \otimes \cdots \otimes \rho_N, \end{equation} (10)

    where each \rho_i is a self-adjoint non-negative 2 \times 2 matrix such that tr(\rho_i) = 1. From Lemma 2.1, we see that \rho is a proper quantum state.

    The risk-neutral world of the quantum Black-Scholes model consists of self-adjoint non-negative 2 \times 2 matrices \rho_i that satisfy

    \begin{equation} tr(\rho_i H_i) = 1 + r_i, \qquad i = 1, \ldots, N, \end{equation} (11)

    where r_i is the risk-free interest rate for the period i.

    Let us define

    \begin{equation} H_i = U_i D_i U_i^*, \quad \text{and let } \tilde{\rho}_i = U_i^* \rho_i U_i, \qquad i = 1, \ldots, N. \end{equation} (12)

    Note that the \tilde{\rho}_i are also quantum states. In addition, \tilde{\rho}_i has non-negative diagonal elements q_u^{(i)}, q_d^{(i)} and it can be shown to have the representation

    \begin{equation} \tilde{\rho}_i = \begin{pmatrix} q_u^{(i)} & x \\ \bar{x} & q_d^{(i)} \end{pmatrix}, \end{equation} (13)

    where

    \begin{equation} q_u^{(i)} = \frac{1 + r_i - e^{d_i}}{e^{u_i} - e^{d_i}}, \qquad q_d^{(i)} = 1 - q_u^{(i)}. \end{equation} (14)

    The transformed operator ˜ρ is then obtained from Lemma 2.1 (third property) as

    \begin{align} \tilde{\rho} & = U^*(\rho_1 \otimes \cdots \otimes \rho_N)U = (U_1 \otimes \cdots \otimes U_N)^* (\rho_1 \otimes \cdots \otimes \rho_N)(U_1 \otimes \cdots \otimes U_N) \\ & = (U_1^* \rho_1 U_1) \otimes \cdots \otimes (U_N^* \rho_N U_N) = \tilde{\rho}_1 \otimes \cdots \otimes \tilde{\rho}_N, \end{align} (15)

    after using (12), and \tilde{\rho} is again a quantum state. In the classical probability case, that is, when all the matrices commute, this transform takes the form of a change of measure.

    Moreover, the quantum no arbitrage condition (11) is satisfied for the transformed density, i.e.

    \begin{equation} tr(\tilde{\rho}_i D_i) = 1 + r_i, \qquad i = 1, \ldots, N. \end{equation} (16)

    Consider a payoff function f and a discount factor

    \begin{equation} d_N = 1/[(1 + r_1) \cdots (1 + r_N)]. \end{equation} (17)

    From (3), the arbitrage-free price OP(f(H_{S_N})) of the general option for the N-step quantum binomial model is then defined by

    \begin{equation} OP(f(H_{S_N})) = d_N \, tr(\rho f(H_{S_N})) = d_N \, tr\big(\rho_1 \otimes \cdots \otimes \rho_N \, f(S_0 H_1 \otimes \cdots \otimes H_N)\big). \end{equation} (18)

    Applying the properties given in Lemma 2.1, we then obtain

    \begin{align} OP(f(H_{S_N})) & = \frac{tr\big(\rho_1 \otimes \cdots \otimes \rho_N \, f(S_0 H_1 \otimes \cdots \otimes H_N)\big)}{(1+r_1)\cdots(1+r_N)} = \frac{tr\big(\rho_1 \otimes \cdots \otimes \rho_N \, U f(S_0 D_1 \otimes \cdots \otimes D_N) U^*\big)}{(1+r_1)\cdots(1+r_N)} \\ & = \frac{tr\big([U^*(\rho_1 \otimes \cdots \otimes \rho_N)U] \, f(S_0 D_1 \otimes \cdots \otimes D_N)\big)}{(1+r_1)\cdots(1+r_N)} = \frac{tr\big(\tilde{\rho}_1 \otimes \cdots \otimes \tilde{\rho}_N \, f(S_0 D_1 \otimes \cdots \otimes D_N)\big)}{(1+r_1)\cdots(1+r_N)}, \end{align} (19)

    after using (15). Since \tilde{\rho} is a tensor product and f(S_0 D_1 \otimes \cdots \otimes D_N) is a diagonal matrix, we deduce the option pricing formula (20) below.

    Theorem 2.2. For the heterogeneous quantum binomial model,

    \begin{equation} OP(f(H_{S_N})) = \frac{\sum_\sigma f(S_0 y_\sigma) \, q_\sigma}{(1+r_1)\cdots(1+r_N)}, \end{equation} (20)

    where the index \sigma denotes any feasible path y_\sigma of the form

    \begin{equation} y_\sigma = e^{\sigma_1} \cdots e^{\sigma_N} \quad \text{with} \quad \sigma_i \in \{u_i, d_i\}, \end{equation} (21)

    which occurs with the probability q_\sigma = q_{\sigma_1} \cdots q_{\sigma_N}, where q_{\sigma_i} \in \{q_u^{(i)}, q_d^{(i)}\}.

    In particular, for the homogeneous case (the quantum Cox-Ross-Rubinstein model) where, for all i, r_i = r, u_i = u, d_i = d with q_u^{(i)} = q_u, q_d^{(i)} = q_d, the formula (20) reduces to

    \begin{equation} OP(f(H_{S_N})) = \frac{1}{(1+r)^N} \sum\limits_{n=0}^{N} \binom{N}{n} f\big(S_0 e^{un} e^{d(N-n)}\big) \, q_u^n q_d^{N-n}. \end{equation} (22)
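The homogeneous pricing formula is straightforward to implement. The sketch below is illustrative, not the paper's code; the numerical values S0, u, d, r, N and the strike K are arbitrary assumptions. It prices a European call f(s) = max(s - K, 0), and pricing the identity payoff recovers S0, as the martingale property requires:

```python
import math

# Hedged sketch of the homogeneous quantum CRR pricing sum, with the
# risk-neutral weights qu = (1 + r - e^d)/(e^u - e^d), qd = 1 - qu.
def quantum_crr_price(f, S0, u, d, r, N):
    qu = (1 + r - math.exp(d)) / (math.exp(u) - math.exp(d))
    qd = 1 - qu
    total = 0.0
    for n in range(N + 1):                      # n up-moves out of N
        path_value = S0 * math.exp(u * n + d * (N - n))
        total += math.comb(N, n) * f(path_value) * qu**n * qd**(N - n)
    return total / (1 + r)**N

K = 100.0
price = quantum_crr_price(lambda s: max(s - K, 0.0), S0=100.0,
                          u=0.05, d=-0.05, r=0.01, N=40)
print(price)

# sanity check: the identity payoff prices back to S0
forward = quantum_crr_price(lambda s: s, S0=100.0, u=0.05, d=-0.05, r=0.01, N=40)
print(abs(forward - 100.0) < 1e-8)   # True
```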

    Non-self-adjoint quantum binomial market. Assume that the H_i are invertible 2 \times 2 matrices with two different eigenvalues e^d and e^u, but no longer self-adjoint, in general. In this case the transformed densities \tilde{\rho}_i are no longer proper states, in general. However, they still have the same diagonal elements q_u^{(i)} and q_d^{(i)}, and so the transformed matrix \tilde{\rho} defined in (15) again has the same diagonal elements as in the self-adjoint case, but is not a proper state, in general. By inspecting the proof, we see that Theorem 2.2 holds true in this case as well.

    In this section, we consider the homogeneous quantum binomial model, and we discuss two examples of convergence to continuous-time markets, namely the Black-Scholes model and the planar Brownian motion.

    Theorem 3.1. Let \mu_N be the measure of the eigenvalues of H^{\otimes N} with respect to the quantum state \rho^{\otimes N}. Suppose that r = \lambda/N and u = -d = \sigma N^{-1/2}. Then, as N \to \infty,

    \begin{equation} \int f(S_0 x) \, d\mu_N(x) \to \int f(S_0 x) \, d\mu(x), \end{equation} (23)

    where \mu is a lognormal distribution (i.e. \mu(x) = P(e^{a + \sigma Z} \leq x) for suitable constants a, \sigma and Z a standard normal variable).

    Proof. We begin with the representation via the spectral measure \mu_N. Observe that

    \begin{equation} E f(H_{S_N}) = tr\big(\rho^{\otimes N} f(S_0 H^{\otimes N})\big) = \int f(S_0 x) \, \mu_N(dx), \end{equation} (24)

    where \mu_N is the measure of the eigenvalues \lambda_\sigma of H^{\otimes N} with respect to the quantum probability tr(\rho^{\otimes N} \, \cdot). Since \lambda_\sigma = e^{\sigma_1} \cdots e^{\sigma_N} with \sigma_i \in \{u, d\}, we get

    \begin{equation} \mu_N(x) = \sum\limits_{\sigma:\, \lambda_\sigma \leq x} q_\sigma = \sum\limits_{\sigma:\, e^{\sigma_1} \cdots e^{\sigma_N} \leq x} q_{\sigma_1} \cdots q_{\sigma_N}, \end{equation} (25)

    where q_{\sigma_i} \in \{q_u, q_d\}, respectively.

    Let us move on to the weak convergence desired. We can write that

    \begin{equation} q_{\sigma_1} \cdots q_{\sigma_N} = P(Y_1 = e^{\sigma_1}) \cdots P(Y_N = e^{\sigma_N}) = P(Y_1 = e^{\sigma_1}, \ldots, Y_N = e^{\sigma_N}), \end{equation} (26)

    where the Y_i are i.i.d. variables with P(Y_i = e^u) = q_u, P(Y_i = e^d) = q_d. Thus,

    \begin{equation} \mu_N(x) = P(Y_1 \cdots Y_N \leq x) = P(\ln Y_1 + \cdots + \ln Y_N \leq \ln x) = P(T_N \leq \ln x), \end{equation} (27)

    with T_N \equiv \ln Y_1 + \cdots + \ln Y_N. As N \to \infty, we obtain from the central limit theorem that

    \begin{equation} \mu_N(x) = P(T_N \leq \ln x) \to P(e^{a + \sigma Z} \leq x) = \mu(x), \end{equation} (28)

    for certain constants a, \sigma and Z a standard normal variable. This gives the limit result (23).

    Remarks. (ⅰ) The limiting measure \mu corresponds to that of the geometric Brownian motion S_t = S_0 e^{(\rho - \sigma^2/2)t + \sigma B_t} when S_0 = 1, t = 1 and \rho - \sigma^2/2 = a.

    (ⅱ) It would be interesting to compare the technique with the semi-classical approximation, such as expanding the action around the classical path (see (48) in Contreras et al. (2010)). Another interesting question is to analyse the connection with arbitrage as discussed in Haven (2002) and Contreras et al. (2010).

    (ⅲ) The Cauchy transform is an alternative approach to deal with quantum probabilities (see Mudakkar and Utev (2013)). In particular, the convergence of spectral measures is reduced to the convergence of the Cauchy transforms. We recall that the Cauchy transform for the measure \mu is defined by

    \begin{equation} S_\mu(z) = \int_{\mathbb{R}} \frac{\mu(dt)}{t - z}, \quad \text{where} \quad \mu((a, b)) = \lim\limits_{\epsilon \to 0} \frac{1}{\pi} \int_a^b \Im S_\mu(x + i\epsilon) \, dx, \end{equation} (29)

    for all z \in \mathbb{C} \setminus \mathbb{R} = \{z \in \mathbb{C}: \Im z \neq 0\} and open intervals (a, b) with \mu(\{a, b\}) = 0. The goal is then to show that S_{\mu_N}(z) \to S_\mu(z). However, the common approach of the moment expansion does not work in this case since

    \begin{equation} \int \frac{d\mu(x)}{z - x} = \sum\limits_{k=0}^{\infty} \frac{1}{z^{k+1}} \int x^k \, d\mu(x) = \sum\limits_{k=0}^{\infty} \frac{1}{z^{k+1}} E\big(e^{k(a + \sigma Z)}\big) = \sum\limits_{k=0}^{\infty} \frac{1}{z^{k+1}} e^{ka + k^2\sigma^2/2} = \infty. \end{equation} (30)

    In this part, by treating the relatively simple bond market, we show that non-commutative markets provide a richer class of financial models (see also Haven (2002); Contreras et al. (2010); Herscovich (2016)).

    Bond market. Assume that returns are non-risky, that is, the outcomes are equal. In the quantum setup, we suppose that the matrix H has a single eigenvalue, e^u say.

    Self-adjoint quantum bond market. Let \rho^{\otimes N} denote a quantum state defined as before. Assume that in addition the operator H is self-adjoint. Then H = e^u I_2 and

    \begin{equation} H_{S_N} = S_0 H^{\otimes N} = S_0 e^{uN} I_2^{\otimes N} \quad \text{and} \quad f(H_{S_N}) = f(S_0 e^{uN}) I_2^{\otimes N}. \end{equation} (31)

    In particular, any option claim f(H_{S_N}) commutes with the state \rho^{\otimes N}, which implies that the self-adjoint quantum bond market is commutative, that is, equivalent to the simple classical probability financial market with two non-risky assets. The no-arbitrage condition becomes global to give e^u = tr(\rho H) = 1 + r since tr(\rho) = 1, which restates that under no-arbitrage the non-risky returns are equal. Moreover, the option price for f(H_{S_N}) is provided by

    \begin{equation} OP(f(H_{S_N})) = \frac{tr\big(\rho^{\otimes N} f(S_0 H^{\otimes N})\big)}{(1+r)^N} = e^{-uN} f(S_0 e^{uN}). \end{equation} (32)

    Non-self-adjoint quantum bond market. Now, assume that H is no longer self-adjoint, for example a product of two noncommutative self-adjoint observables. Thus, the non-self-adjoint quantum bond market is noncommutative, in general. In this case, we show that the limit is sensitive to the density state \rho and the representation of H. For simplicity, we assume that the basic observable H and the state \rho are defined by

    \begin{equation} H_u = \begin{pmatrix} e^u & 1 & 0 \\ 0 & e^u & 1 \\ 0 & 0 & e^u \end{pmatrix}, \qquad \rho = \begin{pmatrix} \rho_{11} & y & \delta \\ \bar{y} & \rho_{22} & 0 \\ \bar{\delta} & 0 & \rho_{33} \end{pmatrix}, \end{equation} (33)

    i.e., H_u is now a 3 \times 3 Jordan matrix. The bond price process is then defined by

    \begin{equation} H_{S_N} = S_0 H_u^{\otimes N}. \end{equation} (34)

    To satisfy the no-arbitrage condition, we ask that

    \begin{equation} tr(\rho H_u) = 1 + r, \quad \text{thus} \quad y = 1 + r - e^u. \end{equation} (35)

    Theorem 3.2. Suppose that r = \lambda/N, u = a/N and \delta = -\Delta/N with \Delta \geq 0. Then, for any positive integer k, as N \to \infty,

    \begin{equation} OP([H_{S_N}]^k) \to e^{-\lambda} S_0^k \, E\big[e^{k(\lambda + \Delta/2 + iB_\Delta)}\big], \end{equation} (36)

    regardless of a, where i is the imaginary unit and BΔ is a Brownian motion at time Δ.

    Proof. Notice that

    \begin{equation} H_u^k = \begin{pmatrix} e^{ku} & k e^{(k-1)u} & \frac{k(k-1)}{2} e^{(k-2)u} \\ 0 & e^{ku} & k e^{(k-1)u} \\ 0 & 0 & e^{ku} \end{pmatrix}. \end{equation} (37)

    From the definition of HSN and ρ, we then get

    \begin{align} OP([H_{S_N}]^k) & = OP\big([S_0 H_u^{\otimes N}]^k\big) = (1+r)^{-N} tr\big(\rho^{\otimes N}[S_0 H_u^{\otimes N}]^k\big) = (1+r)^{-N} tr\big(S_0^k \rho^{\otimes N}(H_u^k)^{\otimes N}\big) \\ & = S_0^k (1+r)^{-N} \big[tr(\rho H_u^k)\big]^N \\ & = S_0^k (1+r)^{-N} \big(\rho_{11} e^{ku} + \rho_{22} e^{ku} + \rho_{33} e^{ku} + y\, e^{(k-1)u} k + \delta\, e^{(k-2)u} k(k-1)/2\big)^N \\ & = S_0^k (1+r)^{-N} \big(e^{ku} + y\, e^{(k-1)u} k + \delta\, e^{(k-2)u} k(k-1)/2\big)^N \\ & = S_0^k (1+r)^{-N} \big[1 + ak/N + (\lambda - a)k/N - \Delta k(k-1)/2N + O(1/N^2)\big]^N \\ & \to S_0^k e^{-\lambda} e^{(\lambda + \Delta/2)k - k^2\Delta/2}, \end{align} (38)

    after using the assumptions made on r, u, \delta. This provides the limit result (36).
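The closed form for the powers of the Jordan block used in the proof can be checked against a direct matrix power; the sketch below (with arbitrary values of u and k, not from the paper) does so with NumPy:

```python
import numpy as np
from math import comb, exp

# Sketch: check the closed form (37) for powers of the 3x3 Jordan block
# H_u = e^u I + J against a direct matrix power.
u, k = 0.2, 5
Hu = np.array([[exp(u), 1.0,    0.0],
               [0.0,    exp(u), 1.0],
               [0.0,    0.0,    exp(u)]])

direct = np.linalg.matrix_power(Hu, k)
closed = np.array([
    [exp(k*u), k*exp((k-1)*u), comb(k, 2)*exp((k-2)*u)],
    [0.0,      exp(k*u),       k*exp((k-1)*u)],
    [0.0,      0.0,            exp(k*u)],
])
print(np.allclose(direct, closed))   # True
```

The agreement follows from the binomial expansion of (e^u I + J)^k, where the nilpotent part J satisfies J^3 = 0.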

    Remarks. (ⅰ) This representation is useful in computing the European call option OP([H_{S_N} - K]_+) via the Fourier techniques combined with the Monte Carlo approximation. For the distribution of this process, we view \mathbb{R}^2 as the complex plane and the planar Brownian motion \tilde{B}_t = (B_1(t), B_2(t)) is then interpreted as a complex-valued Brownian motion.

    (ⅱ) Although observables are traditionally considered to be self-adjoint, non-self-adjoint data can be modelled by considering an observable such as H = AB, i.e. the product of two self-adjoint matrices A, B, where A and B may represent the non-trade and trade time changes, respectively.

    (ⅲ) Similarly to the convergence to the Black-Scholes model, the limit does not depend on the shift parameter a. However, now the limit depends on the mysterious characteristic Δ.

    In this section, we will apply some methods of quantum mechanics to finance. Discrete and continuous-time markets must be treated separately because of different stochastic behaviors. To simplify the presentation, we assume that the interest rate is 0 and that the risky processes for share prices are martingales.

    The use of Dirac-Feynman quantum mechanics techniques for insurance risk modelling was initiated in Tamturk and Utev (2018) and then developed in Lefèvre et al. (2018); Tamturk and Utev (2019). We adapt this approach to the problem of option pricing in finance, in particular for the pricing of path-dependent options.

    In the Dirac formalism, bra-ket notation is a standard way of describing quantum states. Consider a class of n \times n matrices treated as a C^*-algebra. A column vector x corresponds to a ket-vector |x>. The associated bra-vector <x| is a row vector defined as its Hermitian conjugate. The usual inner product is denoted by <x|y>, while the outer product |x><y| is the operator/matrix defined by

    \begin{equation} (|x \gt \lt y|)\,|z \gt \; = \; \lt y|z \gt \, |x \gt \qquad (\text{the } abc = (b \cdot c)\,a \text{ rule}). \end{equation} (39)

    In the Feynman path integral methods, the transition probability P(x_j \to x_{j+1}) is computed as the propagator <x_j|A_{j+1}|x_{j+1}> when A_j is a Markovian operator. Thus, the typical path is written as |x_0> \to |x_1> \to \cdots \to |x_n>, and its probability is given by

    \begin{equation} P(|x_0 \gt \to |x_1 \gt \to \cdots \to |x_n \gt ) = \lt x_0|A_1|x_1 \gt \lt x_1|A_2|x_2 \gt \cdots \lt x_{n-1}|A_n|x_n \gt . \end{equation} (40)

    The main ingredient is then the path calculation formula that calculates the probability P(x0xn) via the sum of the probabilities on all the appropriate paths, i.e.

    \begin{equation} P(x_0 \to x_n) = \lt x_0|A_1 A_2 \cdots A_n|x_n \gt = \sum\limits_{x_1, \ldots, x_{n-1}} \lt x_0|A_1|x_1 \gt \lt x_1|A_2|x_2 \gt \cdots \lt x_{n-1}|A_n|x_n \gt . \end{equation} (41)

    It remains to define a suitable propagator for discrete time. For simplicity, we take the operators A_j all equal to A and the time intervals \Delta t_i all equal to \Delta t. In a similar way to e.g. Baaquie (2004); Tamturk and Utev (2018), we assume that the operator A is defined via a Hamiltonian operator H such that A = e^{-\Delta t H}, where H is a Markovian generator called a Markovian Hamiltonian. Thus, P(x_i \to x_{i+1}) = <x_i|e^{-\Delta t H}|x_{i+1}>, which can be computed by applying the Fourier transform to the momentum space (e.g., Griffiths and Schroeter (2018); Tamturk and Utev (2018)). Specifically, let |p> be a basis in that space, and write <x|p> = e^{ipx} and <p|x> = e^{-ipx}. Then, we get

    \begin{align} \lt x_i|e^{-\Delta t H}|x_{i+1} \gt & = \int_0^{2\pi} \frac{d\alpha}{2\pi} \lt x_i|e^{-\Delta t H}|\alpha \gt \lt \alpha|x_{i+1} \gt = \int_0^{2\pi} \frac{d\alpha}{2\pi} \lt x_i|\alpha \gt \lt \alpha|x_{i+1} \gt e^{-\Delta t K_\alpha} \\ & = \frac{1}{2\pi} \int_0^{2\pi} e^{ix_i\alpha} e^{-ix_{i+1}\alpha} e^{-\Delta t K_\alpha} \, d\alpha, \end{align} (42)

    where {|α>,Kα} is the set of eigenvectors and eigenvalues of the Hamiltonian operator H (i.e., H|α>=Kα|α>). Therefore, we deduce from (41), (42) that

    \begin{align} P(x_0 \to x_n) & = \sum\limits_{x_1, \ldots, x_{n-1}} \lt x_0|e^{-\Delta t H}|x_1 \gt \lt x_1|e^{-\Delta t H}|x_2 \gt \cdots \lt x_{n-1}|e^{-\Delta t H}|x_n \gt \\ & = \sum\limits_{x_1, \ldots, x_{n-1}} \frac{1}{(2\pi)^n} \int_0^{2\pi} \int_0^{2\pi} \cdots \int_0^{2\pi} \Big[\prod\limits_{i=0}^{n-1} \big(e^{ix_i\alpha_i} e^{-ix_{i+1}\alpha_i}\big) e^{-\Delta t K_{\alpha_i}}\Big] \, d\alpha_0 \, d\alpha_1 \cdots d\alpha_{n-1}. \end{align} (43)
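The path-summation identity behind these formulas is just matrix multiplication written entry-wise. The following sketch (arbitrary random Markov-style matrices on a small finite state space, not the paper's code) verifies that the (x_0, x_n) entry of the product A_1 A_2 A_3 equals the sum over all intermediate states of products of 1-step propagators:

```python
import numpy as np

# Sketch of the path-summation identity on a finite state space:
# (A1 A2 A3)[x0, x3] = sum over x1, x2 of A1[x0,x1] A2[x1,x2] A3[x2,x3].
rng = np.random.default_rng(1)
n_states = 4
A1, A2, A3 = (rng.random((n_states, n_states)) for _ in range(3))

x0, x3 = 0, 2
lhs = (A1 @ A2 @ A3)[x0, x3]
rhs = sum(A1[x0, x1] * A2[x1, x2] * A3[x2, x3]
          for x1 in range(n_states) for x2 in range(n_states))
print(np.isclose(lhs, rhs))   # True
```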

    For a more detailed overview of the theory, we refer the reader to the books by Feynman and Hibbs (2010); Griffiths and Schroeter (2018); Parthasarathy (2012) and Plenio (2002).

    Option pricing formula. Consider a claim of the form C=f(S0,S1,...,SN), using the notation of Section 2. From (43), we obtain for the corresponding option price

    \begin{equation} OP(C) = d_N \sum\limits_{S_1, \ldots, S_N} f(S_0, S_1, \ldots, S_N) \lt S_0|A_1|S_1 \gt \cdots \lt S_{N-1}|A_N|S_N \gt . \end{equation} (44)

    The discrete-time approach followed to derive (43) can then be easily applied to the current formula (44).

    A modified Cox-Ross-Rubinstein model. Consider a discrete-time market in which, during each i-th time interval Δt, the share price Si can

    - have 1 jump, giving S_{i+1} = S_i e^u or S_i e^d with probabilities p, q = 1 - p, or

    - have 2 jumps, giving S_{i+1} = S_i e^{u+d}, S_i e^{2u} or S_i e^{2d} with probabilities 2pq, p^2, q^2, or

    - remain the same, giving S_{i+1} = S_i,

    where d, u are integers with d < 0 < u. Furthermore, the possible jumps arrive according to a Poisson process of parameter \lambda so that

    - \delta_1 \equiv P[N(\Delta t) = 1] = e^{-\lambda \Delta t}(\lambda \Delta t),

    - \delta_2 \equiv P[N(\Delta t) = 2] = e^{-\lambda \Delta t}(\lambda \Delta t)^2/2,

    - \delta_0 \equiv 1 - \delta_1 - \delta_2 \; (\geq P[N(\Delta t) = 0]).

    Define x_i = \ln(S_i). The transition probabilities P(S_i \to S_{i+1}) are equivalent to P(x_i \to x_{i+1}). Set \Delta t = 1, say. We can now apply (42) with A = e^{-H}, where the set \{|\alpha>, K_\alpha\} is found by solving the Schrödinger equation e^{-H}|\alpha> = e^{-K_\alpha}|\alpha> in which

    \begin{equation} e^{-H}|\alpha \gt (x) = E_x\big(e^{i\alpha x_{i+1}}\big) = e^{i\alpha x}\big(\delta_0 + e^{i\alpha u} p\,\delta_1 + e^{i\alpha d} q\,\delta_1 + e^{i\alpha 2u} p^2 \delta_2 + e^{i\alpha 2d} q^2 \delta_2 + e^{i\alpha(u+d)} 2pq\,\delta_2\big). \end{equation} (45)

    Notice that the Hamiltonian H is not Markovian, but (42) is still applicable at the discrete times k\Delta t. Moreover, by construction, the martingale probabilities p = q_u and q = q_d for the Cox-Ross-Rubinstein model (without interest) yield a martingale in the present situation too, since we have

    \begin{equation} E(S_{i+1}|S_i) = S_i\big[\delta_0 + \delta_1(q_u e^u + q_d e^d) + \delta_2(e^{2u} q_u^2 + e^{2d} q_d^2 + e^{u+d} 2 q_u q_d)\big] = S_i(\delta_0 + \delta_1 + \delta_2) = S_i. \end{equation} (46)
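The martingale identity can be confirmed numerically. The sketch below is illustrative (the values of u, d, \lambda, \Delta t are arbitrary assumptions, not from the paper); with the zero-interest CRR weights q_u = (1 - e^d)/(e^u - e^d), q_d = 1 - q_u, the one-period growth factor is exactly 1:

```python
from math import exp

# Hedged numerical check of the martingale property: the modified model
# with 0, 1 or 2 Poisson jumps per period still satisfies E(S_{i+1}|S_i) = S_i.
u, d = 0.1, -0.1
lam, dt = 0.5, 1.0
qu = (1 - exp(d)) / (exp(u) - exp(d))   # zero-interest CRR weight
qd = 1 - qu

d1 = exp(-lam * dt) * (lam * dt)            # P[1 jump]
d2 = exp(-lam * dt) * (lam * dt)**2 / 2     # P[2 jumps]
d0 = 1 - d1 - d2                            # remaining mass

one_jump = qu * exp(u) + qd * exp(d)                                   # = 1
two_jumps = qu**2 * exp(2*u) + qd**2 * exp(2*d) + 2*qu*qd*exp(u + d)   # = 1
growth = d0 + d1 * one_jump + d2 * two_jumps
print(abs(growth - 1.0) < 1e-12)   # True
```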

    We have illustrated numerically the option pricing results obtained for the model. The tables and figures are however too large to be included here.

    This short part is mostly a review of the application of the quantum mechanics approach to continuous-time markets and closely follows Baaquie (2004).

    The continuous-time formalism is based on the Fourier transform of tempered distributions on the basis |p> in the momentum space. Consider a risky asset price S_t that evolves according to a Hamiltonian operator H. First, the method is applied to compute the pricing kernel

    \begin{equation} p(x, \tau; x') = \lt x|e^{-\tau H}|x' \gt = \int \frac{dp}{2\pi} \lt x|e^{-\tau H}|p \gt \lt p|x' \gt . \end{equation} (47)

    Then, the option price at time t for the claim Q \equiv Q(S_T), T > t, given S_t = x, is defined by

    \begin{equation} OP(Q(S_T)|S_t = x) = \lt x|e^{-(T-t)H}|Q \gt . \end{equation} (48)

    Black-Scholes model. In this classical approach, the stock price S_t follows a geometric Brownian motion, i.e. S_t = S_0 e^{\tilde{B}_t} where \tilde{B}_t = \mu t + \sigma B_t (\mu is the drift, \sigma the volatility and B_t a standard Brownian motion).

    We apply the quantum mechanics approach. Motivated by Baaquie (2004), we work with \tilde{B}_t rather than with S_t. The corresponding Hamiltonian for the Brownian motion is Hf = -\frac{1}{2}f'' (computed, for example, via the Itô formula). The Brownian motion kernel is then defined from (47) by the normal density function

    \begin{equation} p(u, \tau ;u') = f_{N(\tau \mu, \tau\sigma^2)}(u'-u) = \frac{1}{\sqrt{2 \pi \tau \sigma^2}} \exp[-(1/2\tau\sigma^2)(u'-u -\tau\mu)^2]. \end{equation} (49)

    To obtain the option price (48), we set x = S_0e^u and Q(x) = Q(S_0e^u)\equiv g(u) . Then, the Feynman-Kac formula yields

    \begin{align} OP(Q(S_T)|S_t = x) = \lt u|e^{-(T-t) H}|g \gt & = \int^{\infty}_{-\infty} \lt u|e^{-(T-t)H}|u' \gt g(u')du' \\ & = \int^{\infty}_{-\infty} p(u, T-t ;u') Q(S_0e^{u'})du'. \end{align} (50)

    Via path integrals. To find the pricing kernel p(x, \tau; x') = \lt x|e^{-\tau H}|x' \gt for \tau = T-t , an alternative method consists of using path integral methods. For this, we discretize the time into N intervals of length \Delta and consider the x_i = x(t_i) where t_i = i \Delta . We then proceed as in the discrete-time situation. The pricing kernel for (x, x') = (x_0, x_N) becomes

    \begin{equation} p(x, N\Delta;x') = \int \int \ldots \int dx_1 dx_2 \ldots dx_{N-1} \, \, \prod\limits^N_{i = 1} \lt x_{i-1}|e^{-\Delta H}|x_{i} \gt . \end{equation} (51)
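The composition property behind (51) can be checked on a grid: chaining N short-time Gaussian propagators of the form (49) reproduces the single Gaussian kernel for the total time. The sketch below is illustrative (the drift, volatility, grid and time step are arbitrary assumptions, not from the paper):

```python
import numpy as np

# Sketch: discretized path-integral composition of the Brownian kernel.
# Composing N short-time Gaussian propagators over a grid should
# reproduce the one-step Gaussian of the total time N * Delta.
mu, sigma = 0.05, 0.3
N, Delta = 8, 0.125                      # total time tau = 1.0
x = np.linspace(-6, 6, 601)
dx = x[1] - x[0]

def kernel(tau):
    # p(x, tau; x') as a matrix over the grid (normal density, eq. (49))
    diff = x[None, :] - x[:, None]
    var = tau * sigma**2
    return np.exp(-(diff - tau * mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

step = kernel(Delta) * dx                # one factor <x_{i-1}|e^{-Delta H}|x_i>
composed = np.linalg.matrix_power(step, N) / dx
exact = kernel(N * Delta)

mid = len(x) // 2                        # compare away from the grid boundary
print(np.max(np.abs(composed[mid] - exact[mid])) < 1e-4)   # True
```

The agreement reflects the Chapman-Kolmogorov property of the Gaussian family: convolving N(\tau\mu, \tau\sigma^2) densities adds their times.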

    Applications to non-life insurance. Consider Feynman's modification \tilde{H} of the Brownian motion Hamiltonian H by adding a potential V , i.e. \tilde{H} = H + V . For example, choose a ruin level B and take V(x) = +\infty for x < B . Then, the path calculation formula (43) allows us to compute ruin probabilities when B = 0 (Tamturk and Utev (2018, 2019)) and exotic options with barriers when B > 0 .

    Finally, let us mention that, numerically, the binomial model formula (20) and the path integral approach (51) were found to give results close to the Black-Scholes formula, even for relatively small values of N (of order 40).

    We are going to analyse a generated set of financial data by choosing two different quantum models described in Section 2.1. First, we consider the quantum actuarial-type model (see Sections 2.1.3 and 4.1) with the N-step observable defined in (6). Then, we consider the quantum trinomial model (see Section 2.1.2) with the N-step observable defined in (8). Note that matching 1 step for the actuarial case to two steps for the trinomial model is natural since, in the actuarial case, we allow at most two jumps.

    We apply supervised machine learning methods such as developed e.g. in the books by Bishop (2006); Hastie et al. (2009) and Wittek (2014).

    Overall approach. The dataset is supposed to come from a non-ordered class of randomly perturbed observables. Each data point is the observation of an eigenvalue \lambda of the observable perturbed by i.i.d. error terms. The observable is the 1-step operator for the actuarial-type model and the 2-step operator for the trinomial model. Matching 1 step in the actuarial case to 2 steps in the trinomial case is natural since there are at most two jumps in the actuarial case considered.

    First, the data are classified in classes G_\lambda with respect to the eigenvalues \lambda . Then, the probabilities p_\lambda are estimated by maximum likelihood using Maxwell-Boltzmann or Bose-Einstein statistics. Finally, the \lambda are estimated via the weighted L_1 -norm risk error function.

    Let us explain in more detail for the actuarial-type model, for example.

    Step 1. An initial pair (u, d) = (u_0, d_0) is chosen randomly.

    Step 2. For a given (u, d) , the data is classified and labeled against the eigenvalues of the observable using a nearest neighbor algorithm. This leads to the classes G_{\lambda} .

    Step 3. For the same (u, d) , the estimates \hat{p} and \hat{q} are obtained by maximizing the likelihood function L(p, q).

    Step 4. The (u, d) is updated by minimizing a weighted L_1 -norm risk error function F(u, d) \equiv F(\lambda) defined by

    \begin{equation} F(\lambda) = \|\beta-\lambda\| = \sum\limits_\lambda p_\lambda\sum\limits_{\beta_i\in G_\lambda} |\beta_i-\lambda|. \end{equation} (52)

    Step 5. The loop of Steps 2 to 4 is repeated until the change in the risk becomes smaller than a selected tolerance M , i.e. until

    \begin{equation} |F(u_{i+1}, d_{i+1})-F(u_{i}, d_{i})| \lt M. \end{equation} (53)
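A minimal sketch of Steps 1–5 for the actuarial-type model might look as follows. Two simplifications are our own: a global grid search over (u, d) stands in for the paper's local update rule in Step 4, and the Maxwell-Boltzmann maximum likelihood estimates of Step 3 are taken in closed form from the class counts.

```python
import numpy as np
from itertools import product

def risk_and_estimates(data, S0, u, d):
    """Steps 2-4 for a fixed (u, d): classify, estimate, evaluate the risk (62)."""
    # spectrum of the 1-step observable, eq. (55), scaled to prices
    prices = S0 * np.exp(np.array([0.0, d, u, d + u, 2 * d, 2 * u]))
    # Step 2: nearest-neighbour labelling of the data against the spectrum
    labels = np.argmin(np.abs(data[:, None] - prices[None, :]), axis=1)
    c = np.bincount(labels, minlength=6)
    n = len(data)
    # Step 3: closed-form Maxwell-Boltzmann MLE from the class counts
    delta = np.array([c[0], c[1] + c[2], c[3] + c[4] + c[5]]) / n
    n_u = c[2] + c[3] + 2 * c[5]          # number of 'u' factors observed
    n_d = c[1] + c[3] + 2 * c[4]          # number of 'd' factors observed
    p = n_u / max(n_u + n_d, 1)
    q = 1.0 - p
    w = np.array([delta[0], q * delta[1], p * delta[1],
                  2 * p * q * delta[2], q ** 2 * delta[2], p ** 2 * delta[2]])
    # Step 4 objective: weighted L1 risk, eq. (62)
    F = sum(w[k] * np.abs(data[labels == k] - prices[k]).sum() for k in range(6))
    return F, (p, q)

def fit_ud(data, S0, u0, d0, M=1e-6, grid=np.round(np.arange(0.1, 1.0, 0.1), 1)):
    u, d = u0, d0
    F = risk_and_estimates(data, S0, u, d)[0]
    while True:  # Step 5: stop when the risk improvement falls below M
        u_new, d_new = min(product(grid, -grid),
                           key=lambda ud: risk_and_estimates(data, S0, *ud)[0])
        F_new, pq = risk_and_estimates(data, S0, u_new, d_new)
        if abs(F_new - F) < M:
            return (u_new, d_new), pq, F_new
        (u, d), F = (u_new, d_new), F_new

# run on the dataset (54) with the starting values used in the paper's illustration
V = np.array([86, 63, 35, 52, 41, 8, 24, 12, 19, 24,
              42, 5, 91, 95, 50, 49, 34, 91, 37, 11], float)
(u_est, d_est), (p_est, q_est), F_est = fit_ud(V, S0=70.0, u0=0.8, d0=-0.1)
print((u_est, d_est), (p_est, q_est), F_est)
```

Because the grid search is global rather than local, the numbers need not reproduce Table 1 exactly; the sketch only illustrates the loop structure.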

    k -fold cross-validation. To reduce the risk of error, we use a k -fold cross-validation strategy. The dataset is randomly divided into k subsets of equal size. In turn, one of the subsets is held out as the test set while the others together form the training set, so that each subset serves exactly once as the test set over the k repetitions. At each iteration, Steps 1 to 5 above are applied to the training data and the resulting estimates are then checked on the test data. Finally, the estimates used are an average of those obtained over the k iterations.
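The splitting step can be sketched as follows (the function name and the fixed seed are our own choices for reproducibility):

```python
import numpy as np

def kfold_splits(data, k=4, seed=0):
    """Randomly partition `data` into k folds of (nearly) equal size and
    yield (training, test) pairs, each fold serving once as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield data[train], data[test]

# the dataset (54): 20 observations, so k = 4 gives folds of 5
V = np.array([86, 63, 35, 52, 41, 8, 24, 12, 19, 24,
              42, 5, 91, 95, 50, 49, 34, 91, 37, 11], float)
for train, test in kfold_splits(V, k=4):
    print(len(train), len(test))  # 15 5 on each of the four iterations
```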

    Numerical example. As a simple illustration, we will consider the following dataset

    \begin{equation} V = \{86, 63, 35, 52, 41, 8, 24, 12, 19, 24, 42, 5, 91, 95, 50, 49, 34, 91, 37, 11\}. \end{equation} (54)

    From the assumptions of the model, the 1 -step observable H has the eigenvalues

    \begin{equation} \{\lambda\} = \{1, e^{d}, e^{u}, e^{d+u}, e^{2d}, e^{2u}\}, \end{equation} (55)

    with probabilities respectively given by

    \begin{equation} \{p_\lambda\} = \{\delta_0, q\delta_1, p\delta_1, p_{d+u}\delta_2, p_{2d}\delta_2, p_{2u}\delta_2\}. \end{equation} (56)

    For the probabilities p_{d+u}, p_{2d}, p_{2u} , we consider two possible statistics often used in the analysis of quantum observables.

    Maxwell-Boltzmann independence. This case yields the binomial model since

    \begin{equation} p_{d+u} = 2pq, \, p_{2d} = q^2, \, p_{2u} = p^2. \end{equation} (57)

    Bose-Einstein dependence. This case corresponds to the probabilities

    \begin{equation} p_{d+u} = Cpq, \, p_{2d} = Cq^2, \, p_{2u} = Cp^2, \quad \mbox{where } C(pq+q^2+p^2) = 1. \end{equation} (58)
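A quick sanity check (the values of p and q below are illustrative only): under both statistics the two-jump probabilities sum to one, since 2pq + q^2 + p^2 = (p+q)^2 = 1 in the Maxwell-Boltzmann case, while in the Bose-Einstein case the constant C is defined precisely to normalise the weights.

```python
p, q = 0.3, 0.7  # any pair with p + q = 1

# Maxwell-Boltzmann (binomial) probabilities, eq. (57)
mb = [2 * p * q, q ** 2, p ** 2]

# Bose-Einstein probabilities, eq. (58): C normalises the unweighted products
C = 1.0 / (p * q + q ** 2 + p ** 2)
be = [C * p * q, C * q ** 2, C * p ** 2]

print(sum(mb), sum(be))  # both equal 1 up to floating-point rounding
```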

    As pointed out in Lefèvre et al. (2018), both statistics admit a formal construction, via the proper choice of the density operator \rho for the number of occurrences and the density projection operators \rho_\lambda , such that

    \begin{equation} tr(\rho \otimes \rho_{\lambda}) = p_\lambda. \end{equation} (59)

    Likelihood functions. Denote by \#x the number of x observed in the data set. For the Maxwell-Boltzmann statistics, the likelihood is defined by the probabilities p , q , \delta_0 , \delta_1 and \delta_2 such that

    \begin{equation} L(p, q) = L(p, q, \delta_0, \delta_1, \delta_2) = (\delta_0)^{\#0}(q\delta_1)^{\#d}(p\delta_1)^{\#u}(2pq\delta_2)^{\#u+d}(q^2\delta_2)^{\#2d}(p^2\delta_2)^{\# 2u}. \end{equation} (60)

    For the Bose-Einstein statistics, the likelihood function is modified as

    \begin{equation} L(p, q) = L(p, q, \delta_0, \delta_1, \delta_2) = (\delta_0)^{\#0}(q\delta_1)^{\#d}(p\delta_1)^{\#u}(Cpq\delta_2)^{\#u+d}(Cq^2\delta_2)^{\#2d}(Cp^2\delta_2)^{\# 2u}. \end{equation} (61)
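As a sketch (the class counts below are made up for illustration), the Maxwell-Boltzmann log-likelihood derived from (60) can be maximised numerically over p with q = 1-p , the \delta_j being simply the observed fractions of 0 -, 1 - and 2 -jump data. For this model the optimum also has a closed form, the observed fraction of "u" factors, which the grid search should reproduce.

```python
import numpy as np

# hypothetical class counts #0, #d, #u, #(u+d), #2d, #2u from the labelled data
counts = {'0': 2, 'd': 5, 'u': 4, 'ud': 4, '2d': 3, '2u': 2}
n = sum(counts.values())

# delta_j is the fraction of observations with j jumps
d0 = counts['0'] / n
d1 = (counts['d'] + counts['u']) / n
d2 = (counts['ud'] + counts['2d'] + counts['2u']) / n

def loglik(p):
    # Maxwell-Boltzmann log-likelihood from eq. (60), with q = 1 - p
    q = 1.0 - p
    return (counts['0'] * np.log(d0) + counts['d'] * np.log(q * d1)
            + counts['u'] * np.log(p * d1) + counts['ud'] * np.log(2 * p * q * d2)
            + counts['2d'] * np.log(q ** 2 * d2) + counts['2u'] * np.log(p ** 2 * d2))

grid = np.linspace(0.01, 0.99, 981)
p_hat = grid[np.argmax([loglik(p) for p in grid])]

# closed-form MLE: number of 'u' factors over the total number of jump factors
p_closed = (counts['u'] + counts['ud'] + 2 * counts['2u']) / \
           (counts['d'] + counts['u'] + 2 * (counts['ud'] + counts['2d'] + counts['2u']))
print(p_hat, p_closed)  # the grid optimum agrees with the closed form
```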

    Risk functions. From (55), the scaled share price spectrum of H_{S_1} is given by S_0\{\lambda\} = \{S_0, S_0e^{d}, S_0 e^{u}, S_0e^{d+u}, S_0e^{2d}, S_0e^{2u}\} . Thus, using the risk function (52), we get for the Maxwell-Boltzmann case

    \begin{align} F_1(u, d)& = \delta_0 \sum\limits_{\beta_i \in G_{0}}|\beta_i-S_0|+q\delta_1\sum\limits_{\beta_i \in G_{d}}|\beta_i-S_0e^{d}|+p\delta_1\sum\limits_{\beta_i \in G_{u}}|\beta_i-S_0e^{u}| \\ &+2pq\delta_2\sum\limits_{\beta_i \in G_{u+d}}|\beta_i-S_0e^{u+d}|+q^2\delta_2\sum\limits_{\beta_i \in G_{2d}}|\beta_i-S_0e^{2d}|+p^2\delta_2\sum\limits_{\beta_i \in G_{2u}}|\beta_i-S_0e^{2u}|, \end{align} (62)

    and for the Bose-Einstein case,

    \begin{align} F_2(u, d)& = \delta_0 \sum\limits_{\beta_i \in G_{0}}|\beta_i-S_0|+q\delta_1\sum\limits_{\beta_i \in G_{d}}|\beta_i-S_0e^{d}|+p\delta_1\sum\limits_{\beta_i \in G_{u}}|\beta_i-S_0e^{u}| \\ &+Cpq\delta_2\sum\limits_{\beta_i \in G_{u+d}}|\beta_i-S_0e^{u+d}|+Cq^2\delta_2\sum\limits_{\beta_i \in G_{2d}}|\beta_i-S_0e^{2d}|+Cp^2\delta_2\sum\limits_{\beta_i \in G_{2u}}|\beta_i-S_0e^{2u}|. \end{align} (63)

    Numerical illustration. The data (54) is treated as the eigenvalues of the 1 -step observable H defined in (6) for N = 1 . Let us assume that the Poisson process rate is 1 and the length of time is \Delta t = 1 .

    We estimate the values d, u and the probabilities q, p by applying the algorithm of Section 5.1 and using the Maxwell-Boltzmann statistics. Choose, for instance, S_0 = 70 , (u_0, d_0) = (0.8, -0.1) and M = 0.000001 . The results obtained from (60), (62) are presented in Table 1.

    Table 1.  Estimation for N = 1 in the Maxwell-Boltzmann case ( \Delta F denotes |F(u_i, d_i)-F(u_{i+1}, d_{i+1})| ).

    | Given (u, d) | Maximum likelihood | Optimum (p, q) | Optimum (u, d) | Risk error F(u, d) | \Delta F |
    |---|---|---|---|---|---|
    | (0.8, -0.1) | 4.6063e-14 | (0.01, 0.99) | (0.3, -0.4) | 43.3107 | 43.3107 |
    | (0.3, -0.4) | 2.2681e-18 | (0.16, 0.84) | (0.3, -0.5) | 22.8734 | 20.4373 |
    | (0.3, -0.5) | 5.1730e-19 | (0.23, 0.77) | (0.3, -0.6) | 15.8654 | 7.0080 |
    | (0.3, -0.6) | 1.7631e-18 | (0.24, 0.76) | (0.3, -0.7) | 14.4741 | 1.3913 |
    | (0.3, -0.7) | 1.2479e-19 | (0.29, 0.71) | (0.3, -0.7) | 11.1976 | 3.2765 |
    | (0.3, -0.7) | 1.2479e-19 | (0.29, 0.71) | (0.3, -0.7) | 11.1976 | 0 |


    We then also apply a 4 -fold cross-validation procedure. Let V_i , 1\leq i \leq 4 , be four randomly chosen subsets of V such that \cup_{i = 1}^{4} V_i = V . For each iteration, \{V\setminus V_i\} and V_i are treated as the training set and the test set, respectively. First, the maximum likelihood estimation and the risk error computation are carried out on the training data. Then, the obtained estimates are applied to the test data. This gives the results of Table 2.

    Table 2.  Using a 4 -fold cross-validation strategy.

    | Training & test data | Training: optimum (u, d) | Training: risk error F(u, d) | Test: optimum (u, d) | Test: risk error F(u, d) | Total risk error |
    |---|---|---|---|---|---|
    | \{V\setminus V_1\}, V_1 | (0.2, -0.4) | 9.5462 | (0.2, -0.5) | 0.0842 | 9.6304 |
    | \{V\setminus V_2\}, V_2 | (0.2, -0.3) | 11.1223 | (0.2, -0.1) | 0.1928 | 11.3151 |
    | \{V\setminus V_3\}, V_3 | (0.2, -0.3) | 8.0269 | (0.3, -0.4) | 0.2042 | 8.2311 |
    | \{V\setminus V_4\}, V_4 | (0.2, -0.6) | 9.2466 | (0.4, -0.7) | 0.0629 | 9.3095 |


    We observe that the risk errors are small for the test data but larger for the training data; thus, the total risk errors remain significant. Similar numerical calculations have also been performed under the Bose-Einstein assumption.

    The spectrum of the N -step trinomial observable H^{\otimes N}_2 is given by

    \begin{equation} \{\lambda\} = \{e^{(N-i-j)u+id}, \, 0\leq j \leq N, 0\leq i \leq N-j\}. \end{equation} (64)

    Thus, for the case N = 2 , the set of eigenvalues is exactly the same as in (55). However, the associated probabilities differ from (56) and are equal to

    \begin{equation} \{p_\lambda\} = \{p_2^2, 2p_2p_3, 2p_1p_2, 2p_1p_3, p_3^2, p_1^2\}. \end{equation} (65)
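The claimed coincidence with (55) is easy to check by enumerating the exponents (N-i-j)u + id of (64) as coefficient pairs (a, b) of au + bd :

```python
# Enumerate the exponents of eq. (64) for N = 2 and check they reproduce
# the 1-step actuarial spectrum {0, d, u, d+u, 2d, 2u} of eq. (55).
N = 2
exponents = {(N - i - j, i)            # (coefficient of u, coefficient of d)
             for j in range(N + 1)
             for i in range(N - j + 1)}
expected = {(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0)}
print(exponents == expected)  # True
```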

    Likelihood and risk functions. Using the Maxwell-Boltzmann statistics, the likelihood is defined by the probabilities p_1 , p_2 , p_3 as

    \begin{equation} L(p_1, p_2, p_3) = (p_2^2) ^{\#0}(2p_2p_3) ^{\#d}(2p_1p_2) ^{\#u}(2p_1p_3) ^{\#d+u}(p_3^2) ^{\#2d}(p_1^2) ^{\#2u}. \end{equation} (66)

    Note that for the Cox-Ross-Rubinstein model, p_2 = 0 so that the likelihood is simplified to L(p_1, p_3) = (2p_1p_3) ^{\#d+u}(p_3^2) ^{\#2d}(p_1^2) ^{\#2u} .

    The corresponding risk function is then defined by

    \begin{align} F_3(u, d)& = p_2^2\sum\limits_{\beta_i \in G_{0}}|\beta_i-S_0| + 2p_2p_3\sum\limits_{\beta_i \in G_{d}}|\beta_i-S_0e^{d}|+ 2p_1p_2\sum\limits_{\beta_i \in G_{u}}|\beta_i-S_0e^{u}| \\ &+ 2p_1p_3\sum\limits_{\beta_i \in G_{d+u}}|\beta_i-S_0e^{u+d}| + p_3^2\sum\limits_{\beta_i \in G_{2d}}|\beta_i-S_0e^{2d}| + p_1^2\sum\limits_{\beta_i \in G_{2u}}|\beta_i-S_0e^{2u}|. \end{align} (67)

    Numerical illustration. We process the data (54) again with S_0 = 70 , (u_0, d_0) = (0.8, -0.1) and M = 0.000001 . The results obtained using the algorithm of Section 5.1 with the functions (66), (67) are given in Table 3.

    Table 3.  Estimation for N = 2 in the Maxwell-Boltzmann case ( \Delta F denotes |F(u_i, d_i)-F(u_{i+1}, d_{i+1})| ).

    | Given (u, d) | Maximum likelihood | Optimum (p_1, p_2, p_3) | Optimum (u, d) | Risk error F(u, d) | \Delta F |
    |---|---|---|---|---|---|
    | (0.8, -0.1) | 7.3122e-10 | (0.01, 0.22, 0.77) | (0.5, -0.4) | 0.0321 | 0.0321 |
    | (0.5, -0.4) | 2.2847e-11 | (0.1, 0.18, 0.72) | (0.8, -0.5) | 0.3296 | 0.2975 |
    | (0.8, -0.5) | 7.4285e-12 | (0.1, 0.23, 0.67) | (0.8, -0.5) | 0.3302 | 0.0006 |
    | (0.8, -0.5) | 7.4285e-12 | (0.1, 0.23, 0.67) | (0.8, -0.5) | 0.3302 | 0 |


    Then, we apply a 4 -fold cross-validation procedure as previously done. The obtained results are shown in Table 4. Note that this method helps to reduce the risk errors.

    Table 4.  Using a 4 -fold cross-validation strategy.

    | Training & test data | Training: optimum (u, d) | Training: risk error F(u, d) | Test: optimum (u, d) | Test: risk error F(u, d) | Total risk error |
    |---|---|---|---|---|---|
    | \{V\setminus V_1\}, V_1 | (0.3, -0.1) | 0.1093 | (0.3, -0.2) | 0.0860 | 0.1953 |
    | \{V\setminus V_2\}, V_2 | (0.3, -0.1) | 0.1843 | (0.7, -0.6) | 0.0057 | 0.1900 |
    | \{V\setminus V_3\}, V_3 | (0.8, -0.5) | 0.1591 | (0.8, -0.6) | 0.0138 | 0.1729 |
    | \{V\setminus V_4\}, V_4 | (0.7, -0.4) | 0.2575 | (0.8, -0.6) | 0.0721 | 0.3296 |


    The observable operators of the quantum actuarial-type and trinomial models give us different results as expected. Which model to choose? One possible approach might be to consider a mixture of quantum models via the mixture of Hamiltonians (see, e.g., Wittek (2014)).

    Several quantum-type financial models have been constructed that benefit from the physical interpretation of unpredictable stock market behaviour and of the associated dependences. The models provide a general physical framework for the pricing of derivatives and a possibility to construct quantum trading strategies. Moreover, it is shown that certain quantum-type models apply in both actuarial and financial sciences.

    C. Lefèvre received support from the Chair DIALog sponsored by CNP Assurances. S. Utev benefited from a Scientific Mission granted by the Belgian FNRS. We thank the referees for the careful reading of the paper which greatly helped to improve the presentation.

    All authors declare no conflicts of interest in this paper.



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)