
The primary objective of this study was to explore the behavior of an n-coupled system of generalized Sturm-Liouville (GSL) and Langevin equations under a modified ABC fractional derivative. We aimed to analyze the dynamics of the system and gain insights into how this operator influences the conditions for the existence and uniqueness of solutions. We established the existence and uniqueness of solutions by employing the Banach contraction principle and Leray-Schauder's alternative fixed-point theorem. We also investigated the Hyers-Ulam stability of the system. This analysis allows us to understand the stability properties of the solutions and evaluate their sensitivity to perturbations. Furthermore, we employed Lagrange's interpolation polynomials to produce a numerical scheme for the influenza epidemic model. By combining theoretical analysis, mathematical principles, and numerical simulations, this study contributes to enriching our understanding of the behavior of the system and offers insights into its dynamics and practical applications in epidemiology.
Citation: Elkhateeb S. Aly, Mohammed A. Almalahi, Khaled A. Aldwoah, Kamal Shah. Criteria of existence and stability of an n-coupled system of generalized Sturm-Liouville equations with a modified ABC fractional derivative and an application to the SEIR influenza epidemic model[J]. AIMS Mathematics, 2024, 9(6): 14228-14252. doi: 10.3934/math.2024691
The Sigma-Pi-Sigma neural network (SPSNN) is a feed-forward neural network composed of Sigma-Pi units, which can realize the static mapping of multi-layer neural networks [1,2,3]. In [4], a new model that can be regarded as a subclass of networks built on Sigma-Pi units was considered, and the authors showed how the Kronecker product representation originates from the classical Sigma-Pi units. In [5], a Sigma-Pi network and a new learning algorithm were proposed by which the output representation self-organizes into a topographic map; its main contribution was to solve frame-of-reference transformation problems through unsupervised learning. Furthermore, in [6,7,8], the approximation, convergence, and generalization abilities of sparse Sigma-Pi network functions were studied, and the new algorithms were shown to be more efficient than those in the existing literature.
It is widely recognized that neural network optimization has become a highly significant research topic in recent years. Research on neural network optimization covers two main topics. The first is weight optimization: for a given network structure, an appropriate learning method is selected to seek optimal weights such that both the training error and the generalization error are sufficiently small [9,10]. The second is structural optimization, namely the selection of an appropriate activation function, the number of network layers, the connection mode, and so on [11,12,13]. However, research on structure optimization is far less developed than that on weight optimization. Moreover, there is no evidence in the literature that more hidden-layer neurons yield better generalization ability.
When it comes to neural networks, the number of neurons is a crucial factor. There are two common methods for determining the size of networks. The first method is the growing method, where the network starts with a smaller size and new hidden neurons are added during the training process [14]. Another method is the pruning method [15,16,17,18,19], which begins with a larger network and eventually removes redundant nodes.
Such algorithms separate weight learning from weight training, which is inefficient. There are also more complex algorithms that introduce additional mechanisms, such as particle swarm optimization [20], genetic algorithms [21], eigenvalue analysis [22], statistical analysis, and synthetic minority over-sampling techniques [23,24], to enhance the efficiency of sparsification. The disadvantage of these algorithms is that their implementations are complex and computationally expensive.
An appropriately sized network structure is instrumental in enhancing efficiency. Overfitting poses a significant challenge during network training and is particularly problematic for deep neural network learning [25]. Consequently, researchers have extensively explored various forms of sparse regularization techniques, highlighting their indispensability.
In recent years, Lp regularization has been widely used for variable selection and parameter estimation problems in machine learning. This regularization method takes the form
$$E(W)=\hat{E}(W)+\tau\|W\|_p^p,$$
where $\hat{E}$ is the ordinary error function, $\|W\|_p=\left(\sum_i |W_i|^p\right)^{1/p}$ denotes the $p$-norm, $\tau$ is the penalty coefficient, and $\|\cdot\|$ represents the Euclidean norm. This regularization term is also called the penalty term.
In general, there are several common forms of regularization: weight elimination, weight decay, and approximate smoothing [26,27,28,29,30,31,32]. Among them, weight elimination is widely used as the penalty term for pruning feedforward neural networks, mainly to remove unnecessary connections or to optimize the network weights [33]. A more detailed introduction to the different penalty terms is given below.
For $p=2$, adding the L2 norm to the standard error function makes it easier to optimize [34,35,36,37]. This practice is called L2 regularization, and it takes the following form:
$$E(W)=\hat{E}(W)+\tau\|W\|_2^2,$$
where $\|W\|_2^2$ denotes the penalty term. The L2-norm solution is popular because of its special relationship with the normal distribution, and it acts as a brute-force way to avoid excessively large weights. Unfortunately, it is not sparse: during the training process, L2 regularization cannot drive unnecessary weights to zero.
The L0 regularization term is a standard tool for feature extraction and variable selection. By constraining the number of nonzero coefficients, the L0 regularizer produces the sparsest solutions, but these are difficult to compute because the resulting problem is a combinatorial optimization problem [38,39].
As an alternative to the L0 regularization term, the increasingly important L1 regularization term (Lasso) has become popular because it only requires solving a quadratic programming problem [40]. In [41], the L1 norm was combined with the capped L1 norm to measure the amount of information collected by each filter and to control the regularization. The L1-regularized penalty function is generally written as follows:
$$E(W)=\hat{E}(W)+\tau\|W\|_1.$$
It was shown in [42,43] that, although these algorithms can generate an alternative neural network structure, they do not offer a unified neural network framework for solving a class of problems.
To explore more appropriate neural network structures and overcome the obstacles posed by suboptimal models, we propose a new SPSNN algorithm based on both the L1 penalty and the L2 penalty, which handles complex and varied tasks within a unified framework and improves the robustness and generality of the model. In addition to the L1 norm, an L2 norm is introduced into the SPSNN to promote a group sparsity effect that selects the relevant population of hidden nodes. Our proposed variant therefore benefits from ridge regression as well as from the tendency of the L1 penalty toward sparse solutions, which yields a more suitable network structure than using either regular term alone; the elastic net algorithm can also be used to solve such hybrid penalties [44].
In this paper, the penalty method is applied to weight selection: both the L1 and L2 penalties are used in the same minimization problem, which not only overcomes the shortcomings of each norm alone but also provides good generalization ability and sparsity. The error function with this hybrid penalty can be expressed as follows:
$$E(W)=\hat{E}(W)+\tau_1\|W\|_1+\tau_2\|W\|_2^2,$$
where the tuning constants $\tau_1$ and $\tau_2$ are fixed and non-negative. The term weighted by $\tau_1$ performs variable selection through a sparse weight vector, while the term weighted by $\tau_2$ ensures a unique solution and leads to a grouping effect. The added penalty is thus a combination of the L1 norm $\|W\|_1$ and the squared L2 norm $\|W\|_2^2$ of the parameter vector $W$.
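As a minimal illustration of this hybrid penalty (the function name and the flattened weight vector are assumptions of this sketch, not notation from the paper), it can be evaluated as:

```python
import numpy as np

def hybrid_penalty(W, tau1, tau2):
    """L1 plus squared-L2 penalty added to the error function (illustrative sketch)."""
    return tau1 * np.sum(np.abs(W)) + tau2 * np.sum(W ** 2)

# The L1 part favors sparse weight vectors, while the L2 part shrinks all entries.
W = np.array([0.0, -0.3, 1.2])
print(hybrid_penalty(W, tau1=0.001, tau2=0.001))
```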
During training, the usual mixed regularization terms are not differentiable at the origin, which typically gives rise to an oscillation phenomenon. We therefore propose a new smoothing algorithm to overcome this difficulty: in a neighborhood of the origin, the absolute value of the weight in the error function is replaced by a smooth approximation. The main contributions and novelty of this article are as follows:
(1) To obtain an optimal architecture with good generalization performance, we eliminate hidden-layer weights by using the idea of elastic net regularization. By incorporating the L1 and L2 regularization terms, the proposed algorithm not only removes unnecessary weights but also prunes the network structure in the hidden layers; this pruning reduces the network size and leads to an effective optimization of the network structure.
(2) We propose a batch gradient method for SPSNNs based on elastic net regularization to obtain an optimal architecture with good generalization performance. By means of a smoothing technique, we effectively remove the oscillation phenomenon during network learning.
(3) To test the accuracy and robustness, we apply the proposed method to a large number of regression and classification tasks and compare it with other algorithms that have good sparsity and generalization capabilities.
The rest of the article is organized as follows. Section 2 presents the SPSNN. Section 3 describes the batch gradient algorithm for this model and details the novel pruning algorithm based on the L1 plus L2 penalty term. Numerical experiments are provided in Section 4, and conclusions are drawn at the end.
Next, we describe the structure of SPSNNs, which are composed of an input layer, a hidden layer of summing nodes ($\Sigma_1$), a hidden layer of product nodes ($\Pi$), and an output layer ($\Sigma_2$). Here $P$, $N$, $Q$, and $1$ denote the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively (see Figure 1).
Let $W_0=(w_{01},w_{02},\cdots,w_{0Q})^T\in\mathbb{R}^Q$ denote the weight vector connecting the $\Pi$ layer to the $\Sigma_2$ layer, and let the weights connecting the $\Sigma_1$ layer to the $\Pi$ layer be fixed at 1. Writing $W_n=(w_{n1},w_{n2},\cdots,w_{nP})^T$ for the weight vector connecting the input layer to the $n$th node of the $\Sigma_1$ layer, we set
$$W=(W_0^T,W_1^T,\cdots,W_N^T)^T\in\mathbb{R}^{Q+NP}.$$
Let $X=(x_1,x_2,\cdots,x_P)\in\mathbb{R}^P$ denote the input of the network, and assume that $g:\mathbb{R}\rightarrow\mathbb{R}$ is a given sigmoid activation function for the $\Sigma_1$ layer. We denote the output vector $\xi\in\mathbb{R}^N$ of the $\Sigma_1$ layer as
$$\xi=\big(g(W_1\cdot X),g(W_2\cdot X),\cdots,g(W_N\cdot X)\big)^T,$$
where ⋅ denotes the inner product between vectors.
Similarly, we take $\delta=(\delta_1,\delta_2,\cdots,\delta_Q)^T$ to denote the output vector of the $\Pi$ layer. The nodes of the network can be connected in two different ways: the $\Sigma_1$ layer and the $\Pi$ layer are either fully connected or sparsely connected. The difference lies in the number of product nodes: the former has $2^N$ product nodes, while the latter has fewer than $2^N$. The structure of the SPSNNs can be seen clearly in Figure 1.
Here, $H_q$ ($1\le q\le Q$) denotes the set of nodes in the $\Sigma_1$ layer connected to the $q$th product node, while $F_n$ ($1\le n\le N$) denotes the set of all product nodes connected to the $n$th node of the $\Sigma_1$ layer. For an arbitrary set $a$, let $\varphi(a)$ denote the number of elements of $a$; then we have
$$\sum_{q=1}^{Q}\varphi(H_q)=\sum_{n=1}^{N}\varphi(F_n),$$
which will be used later.
We can compute the output vector $\delta\in\mathbb{R}^Q$ of the $\Pi$ layer by
$$\delta_q=\prod_{i\in H_q}\xi_i,\qquad 1\le q\le Q.$$
By convention, we set $\prod_{i\in H_q}\xi_i=1$ when $H_q=\emptyset$. The final output of the SPSNN can then be written as
$$y=g(W_0\cdot\delta).$$
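To make the forward computation concrete, the following Python sketch evaluates one forward pass of an SPSNN; all names, the toy dimensions, and the particular index sets $H_q$ are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spsnn_forward(X, W_sigma1, w0, H):
    """One forward pass of a Sigma-Pi-Sigma network.

    X        : input vector of length P
    W_sigma1 : (N, P) weight matrix of the Sigma_1 layer
    w0       : length-Q weight vector from the Pi layer to the output node
    H        : list of Q index sets; H[q] lists the Sigma_1 nodes feeding
               product node q (an empty set gives the constant product 1)
    """
    xi = sigmoid(W_sigma1 @ X)                     # Sigma_1 outputs, xi in R^N
    delta = np.array([np.prod(xi[list(Hq)]) if Hq else 1.0 for Hq in H])  # Pi layer
    return sigmoid(w0 @ delta)                     # final output y = g(W_0 . delta)

# Toy call with P = 3 inputs, N = 2 summing nodes, and the fully connected case,
# where the Q = 2^N = 4 product nodes use all subsets of the Sigma_1 nodes.
rng = np.random.default_rng(0)
H = [set(), {0}, {1}, {0, 1}]
y = spsnn_forward(rng.uniform(-0.5, 0.5, 3),
                  rng.uniform(-0.5, 0.5, (2, 3)),
                  rng.uniform(-0.5, 0.5, 4), H)
```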
We now introduce the batch gradient algorithm for SPSNNs. Let $\{X^l,O^l\}_{l=1}^{L}\subset\mathbb{R}^P\times\mathbb{R}$ be the given set of training samples, where $X^l$ denotes the $l$th input sample and $O^l$ the corresponding ideal output. Let $y^l\in\mathbb{R}$ be the actual output for each input $X^l$. The conventional square error function is given by
$$\hat{E}(W)=\frac{1}{2}\sum_{l=1}^{L}\big(g(W_0\cdot\delta^l)-O^l\big)^2=\sum_{l=1}^{L}\frac{1}{2}\big(g(W_0\cdot\delta^l)-O^l\big)^2=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l),$$
where $g_l(z)=\frac{1}{2}\big(g(z)-O^l\big)^2$, $z\in\mathbb{R}$, $1\le l\le L$.
For convenience, we need to provide the following forms:
$$\delta^l=(\delta_1^l,\delta_2^l,\cdots,\delta_Q^l)=\left(\prod_{i\in H_1}\xi_i,\prod_{i\in H_2}\xi_i,\cdots,\prod_{i\in H_Q}\xi_i\right)=\left(\prod_{i\in H_1}g(W_i\cdot X^l),\prod_{i\in H_2}g(W_i\cdot X^l),\cdots,\prod_{i\in H_Q}g(W_i\cdot X^l)\right), \qquad (3.1)$$
and thus, by virtue of
$$\hat{E}(W)=\sum_{l=1}^{L}g_l\left(\sum_{q=1}^{Q}w_{0q}\prod_{i\in H_q}g(W_i\cdot X^l)\right) \qquad (3.2)$$
and some calculation, we obtain
$$\hat{E}_{W_0}(W)=\sum_{l=1}^{L}g_l'(W_0\cdot\delta^l)\,\delta^l.$$
It follows from $\delta_q^l=\prod_{i\in H_q}g(W_i\cdot X^l)$ that
$$\frac{\partial\delta_q^l}{\partial W_n}=\begin{cases}\left(\prod_{i\in H_q\setminus n}\xi_i\right)g'(W_n\cdot X^l)\,X^l, & \text{if } q\neq 1 \text{ and } n\in H_q,\\[4pt] 0, & \text{if } q=1 \text{ or } n\notin H_q.\end{cases}$$
Then, from the above equality and (3.2), we get
$$\hat{E}_{W_n}(W)=\sum_{l=1}^{L}\left[g_l'(W_0\cdot\delta^l)\sum_{q=1}^{Q}\left(w_{0q}\frac{\partial\delta_q^l}{\partial W_n}\right)\right].$$
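The case split above translates directly into code. The following sketch (function and variable names are our own assumptions) evaluates $\partial\delta_q^l/\partial W_n$ for one training sample:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def d_delta_q_d_Wn(q, n, xi, H, W_sigma1, Xl):
    """Partial derivative of delta_q = prod_{i in H[q]} xi_i with respect to W_n:
    zero unless Sigma_1 node n feeds product node q."""
    if n not in H[q]:
        return np.zeros_like(Xl)
    rest = np.prod([xi[i] for i in H[q] if i != n])  # product over the remaining Sigma_1 outputs
    return rest * d_sigmoid(W_sigma1[n] @ Xl) * Xl

# toy check: Sigma_1 nodes 0 and 1 both feed product node 0
H = [{0, 1}]
Xl = np.array([0.2, -0.4, 0.1])
W_sigma1 = np.array([[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]])
xi = sigmoid(W_sigma1 @ Xl)
print(d_delta_q_d_Wn(0, 1, xi, H, W_sigma1, Xl))
```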
To simplify the network structure, we add elastic net regularization to optimize the network at the group level. The corresponding error function then takes the form
$$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau\left(\alpha\left(\|W_0\|_1+\sum_{n=1}^{N}\|W_n\|_1\right)+(1-\alpha)\left(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\right)\right), \qquad (3.3)$$
where $\tau$ and $\alpha$ are tuning parameters that control the strength of the penalty term. As $\alpha$ increases, the L1 regularization term dominates and the error function approaches that of Lasso regression; as $\alpha$ decreases, the L2 regularization term dominates and the error function approaches that of ridge regression. In particular, the error function is equivalent to that of ridge regression at $\alpha=0$ and to that of Lasso regression at $\alpha=1$. Letting $\tau_1=\tau\alpha$ and $\tau_2=2\tau(1-\alpha)$, we have
$$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau_1\left(\|W_0\|_1+\sum_{n=1}^{N}\|W_n\|_1\right)+\frac{\tau_2}{2}\left(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\right). \qquad (3.4)$$
The gradients of the error function with respect to $W_0$ and $W_n$ are, respectively, given as
$$E_{W_0}(W)=\sum_{l=1}^{L}g_l'(W_0\cdot\delta^l)\,\delta^l+\tau_1\frac{W_0}{\|W_0\|}+\tau_2 W_0 \qquad (3.5)$$
and
$$E_{W_n}(W)=\sum_{l=1}^{L}\left[g_l'(W_0\cdot\delta^l)\sum_{q\in F_n}\left[w_{0q}\left(\prod_{i\in H_q\setminus n}\xi_i^l\right)g'(W_n\cdot X^l)\,X^l\right]\right]+\tau_1\frac{W_n}{\|W_n\|}+\tau_2 W_n. \qquad (3.6)$$
We note that the elastic net regularization in (3.4) combines L1 norm regularization with L2 norm regularization. Clearly, (3.4) is not differentiable at the origin, which causes the oscillation phenomenon, so we propose a smoothing approximation method to overcome the problem caused by this non-smoothness. For any finite-dimensional vector $u$ and a fixed constant $\gamma>0$, we define a smoothing function of $\|u\|$ as follows:
$$h(u,\gamma)=\begin{cases}\|u\|, & \text{if } \|u\|>\gamma,\\[4pt] \dfrac{\|u\|^2}{2\gamma}+\dfrac{\gamma}{2}, & \text{if } \|u\|\le\gamma.\end{cases} \qquad (3.7)$$
We use (3.7) to approximate the elastic net regularization in (3.4). Furthermore, the gradient of h(u,γ) with respect to the vector u is given as follows:
$$\nabla_u h(u,\gamma)=\begin{cases}\dfrac{u}{\|u\|}, & \text{if } \|u\|>\gamma,\\[4pt] \dfrac{u}{\gamma}, & \text{if } \|u\|\le\gamma.\end{cases}$$
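The smoothing function (3.7) and its gradient admit a direct Python transcription; this is only a sketch, and the function names are ours:

```python
import numpy as np

def smooth_norm(u, gamma):
    """Smoothing approximation h(u, gamma) of the Euclidean norm ||u|| near the origin."""
    r = np.linalg.norm(u)
    return r if r > gamma else r ** 2 / (2.0 * gamma) + gamma / 2.0

def smooth_norm_grad(u, gamma):
    """Gradient of h(u, gamma) with respect to u; bounded even when u is close to zero."""
    r = np.linalg.norm(u)
    return u / r if r > gamma else u / gamma
```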
Accordingly, the error function (3.4) can be rewritten as
$$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau_1\left[h(W_0,\gamma)+\sum_{n=1}^{N}h(W_n,\gamma)\right]+\frac{\tau_2}{2}\left(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\right). \qquad (3.8)$$
According to (3.8), the gradients for the smoothing elastic net Sigma-Pi-Sigma neural network become
$$E_{W_0}(W)=\sum_{l=1}^{L}g_l'(W_0\cdot\delta^l)\,\delta^l+\tau_1\nabla_{W_0}h(W_0,\gamma)+\tau_2 W_0, \qquad (3.9)$$
$$E_{W_n}(W)=\sum_{l=1}^{L}g_l'(W_0\cdot\delta^l)\sum_{q\in F_n}w_{0q}\left(\prod_{i\in H_q\setminus n}\xi_i^l\right)g'(W_n\cdot X^l)\,X^l+\tau_1\nabla_{W_n}h(W_n,\gamma)+\tau_2 W_n, \qquad (3.10)$$
where l=1,2,⋯,L.
Beginning with an arbitrary initial weight vector $W^0$, we define the weight sequence by the following iterative formulas:
$$W^{k+1}=W^k+\Delta W^k, \qquad (3.11)$$
$$\Delta W_0^k=-\eta E_{W_0}(W^k), \qquad (3.12)$$
$$\Delta W_n^k=-\eta E_{W_n}(W^k), \qquad (3.13)$$
where η represents the learning rate.
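The iteration (3.11)–(3.13) is plain batch gradient descent on the smoothed objective (3.8). The toy sketch below is an illustrative assumption: it uses a central-difference gradient of a stand-in objective instead of the closed-form gradients (3.9) and (3.10), merely to show the update rule in runnable form.

```python
import numpy as np

def numerical_grad(E, W, eps=1e-6):
    """Central-difference gradient of E at W; stands in for (3.9)-(3.10) in this sketch."""
    g = np.zeros_like(W)
    for i in range(W.size):
        d = np.zeros_like(W)
        d[i] = eps
        g[i] = (E(W + d) - E(W - d)) / (2.0 * eps)
    return g

def train(E, W, eta=0.0028, epochs=1000):
    """Batch gradient iteration W^{k+1} = W^k - eta * grad E(W^k)."""
    for _ in range(epochs):
        W = W - eta * numerical_grad(E, W)
    return W

# exercise the iteration on a toy quadratic objective
W_final = train(lambda W: np.sum((W - 1.0) ** 2), np.zeros(5))
```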
In this section, the performance of the models with no regularizer, the L2 regularizer, the original L1/2 regularizer (OL1/2), the smoothing L1/2 regularizer (SL1/2), and the original group Lasso regularizer (OGL) is compared with that of the smoothing elastic net regularizer algorithm (SGL) on four examples: a classification problem, a parity problem, a function approximation problem, and a prediction problem.
In this example, we choose 8 benchmark data sets from the UCI machine learning repository to test the performance of the new algorithm (SGL), and compare it with the no regularizer, the L2 regularizer, the OL1/2, the SL1/2, and the OGL algorithms.
Table 1 presents the main characteristics of the data sets, including the data set size, the numbers of attributes and classes, and the sizes of the training and testing sets; each data set is randomly partitioned into two subsets, 70% for training and 30% for testing.
Dataset | Dataset Size | Training set | Testing set | Attributes | Classes |
Ecoli | 336 | 224 | 112 | 8 | 7 |
Olitos | 120 | 80 | 40 | 25 | 4 |
Seeds | 210 | 147 | 63 | 7 | 3 |
Iris | 150 | 105 | 45 | 4 | 3 |
Wine | 178 | 120 | 58 | 13 | 3 |
Liver | 345 | 240 | 105 | 7 | 2 |
Sonar | 208 | 138 | 70 | 60 | 2 |
Diabetes | 768 | 526 | 242 | 8 | 2 |
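A minimal sketch of the random 70%/30% partition described above (the function name and seed handling are assumptions of this sketch):

```python
import numpy as np

def split_70_30(X, y, seed=0):
    """Random 70% / 30% train-test partition of a data set (illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.7 * len(X))
    return (X[idx[:cut]], y[idx[:cut]]), (X[idx[cut:]], y[idx[cut:]])
```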
As described above, we use the SPSNN structure shown in Figure 1 and select $P=13$, $N=4$, $Q=16$, and $1$ for the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively. For each learning algorithm, the initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0028, and the regularization factor $\tau$ is 0.001; we conduct 20 trials for every data set to compare the performance of the different algorithms.
To assess the performance of the smoothing elastic net regularizer on each data set, we compare the number of remaining hidden neurons after pruning (RNN), the training accuracy, the testing accuracy, and the training time for each algorithm; all experimental results are recorded in Table 2. From the table, it can be observed that the training accuracy improves slightly, while the testing accuracy increases by approximately 1% to 3%. We therefore find that our proposed smoothing elastic net regularizer is superior to the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer.
Dataset | Algorithm | RNN | Training accuracy | Testing accuracy | Training time(s) |
Ecoli | NoPenalty | 13.30 | 0.9761 | 0.9082 | 17.2756 |
L2 | 13.00 | 0.9747 | 0.8980 | 16.7845 | |
OL1/2 | 12.50 | 0.9749 | 0.9191 | 18.1372 | |
SL1/2 | 11.80 | 0.9771 | 0.9304 | 16.8050 | |
OGL | 12.20 | 0.9749 | 0.9191 | 18.3792 | |
SGL | 11.50 | 0.9780 | 0.9314 | 17.3763 | |
Olitos | NoPenalty | 13.00 | 0.9573 | 0.8914 | 29.9742 |
L2 | 12.33 | 0.9592 | 0.9053 | 29.3393 | |
OL1/2 | 12.00 | 0.9604 | 0.9160 | 30.6111 | |
SL1/2 | 11.67 | 0.9589 | 0.9245 | 30.3067 | |
OGL | 12.00 | 0.9617 | 0.9264 | 33.0595 | |
SGL | 11.00 | 0.9628 | 0.9355 | 30.5098 | |
Seeds | NoPenalty | 13.33 | 0.9737 | 0.9522 | 13.4081 |
L2 | 12.67 | 0.9761 | 0.9554 | 13.5387 | |
OL1/2 | 12.33 | 0.9791 | 0.9582 | 13.8205 | |
SL1/2 | 11.67 | 0.9797 | 0.9676 | 13.2730 | |
OGL | 12.33 | 0.9792 | 0.9629 | 14.7718 | |
SGL | 11.33 | 0.9813 | 0.9749 | 12.7701 | |
Iris | NoPenalty | 13.67 | 0.9715 | 0.9296 | 13.4447 |
L2 | 13.00 | 0.9719 | 0.9390 | 13.4420 | |
OL1/2 | 12.67 | 0.9723 | 0.9458 | 14.3637 | |
SL1/2 | 12.33 | 0.9743 | 0.9554 | 13.5318 | |
OGL | 12.33 | 0.9748 | 0.9522 | 16.1023 | |
SGL | 11.00 | 0.9791 | 0.9629 | 14.2630 | |
Wine | NoPenalty | 12.67 | 0.9872 | 0.9729 | 20.4322 |
L2 | 12.67 | 0.9892 | 0.9753 | 20.4173 | |
OL1/2 | 12.00 | 0.9896 | 0.9770 | 21.3012 | |
SL1/2 | 11.50 | 0.9911 | 0.9814 | 20.5545 | |
OGL | 11.67 | 0.9906 | 0.9798 | 21.1104 | |
SGL | 10.50 | 0.9915 | 0.9833 | 20.1607 | |
Liver | NoPenalty | 13.33 | 0.9937 | 0.9823 | 15.5081 |
L2 | 12.67 | 0.9943 | 0.9838 | 15.4588 | |
OL1/2 | 12.33 | 0.9947 | 0.9858 | 16.4440 | |
SL1/2 | 11.33 | 0.9951 | 0.9861 | 15.9798 | |
OGL | 11.00 | 0.9948 | 0.9868 | 17.4374 | |
SGL | 10.33 | 0.9962 | 0.9902 | 16.1645 | |
Sonar | NoPenalty | 12.67 | 0.9825 | 0.9756 | 12.6535 |
L2 | 12.67 | 0.9831 | 0.9787 | 12.1906 | |
OL1/2 | 12.33 | 0.9860 | 0.9830 | 12.9125 | |
SL1/2 | 12.00 | 0.9909 | 0.9852 | 12.5690 | |
OGL | 12.33 | 0.9892 | 0.9849 | 13.7648 | |
SGL | 11.67 | 0.9918 | 0.9860 | 11.7338 | |
Diabetes | NoPenalty | 13.00 | 0.9925 | 0.9937 | 17.7933 |
L2 | 12.50 | 0.9936 | 0.9947 | 17.1156 | |
OL1/2 | 11.67 | 0.9954 | 0.9950 | 17.4822 | |
SL1/2 | 11.33 | 0.9967 | 0.9955 | 17.1170 | |
OGL | 11.67 | 0.9961 | 0.9953 | 18.4761 | |
SGL | 11.33 | 0.9978 | 0.9961 | 17.3682 |
In addition, we have also compared our approach with other existing methods. In [45], the authors considered the group Lasso regularization method on the Sigma-Pi-Sigma neural network. In [46], the authors applied the group L1/2 regularization term to high-order neural networks. Our proposed elastic net regularization method is on par with these approaches.
For the parity problem, the input set consists of $2^n$ samples in $n$-dimensional space, and every sample is an $n$-bit binary vector. We consider the 5-bit parity problem, whose input set contains $2^5$ samples in 5-dimensional space; the ideal output equals 1 if the number of ones in a sample is odd, and 0 otherwise. Using the setup above, we test the performance of our proposed smoothing elastic net regularizer.
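For instance, the 5-bit parity training set can be generated as in this short sketch (variable names are ours):

```python
import itertools
import numpy as np

# All 2^5 = 32 binary input vectors of the 5-bit parity problem; the ideal output
# is 1 when the number of ones in a sample is odd and 0 otherwise.
X = np.array(list(itertools.product([0, 1], repeat=5)), dtype=float)
O = (X.sum(axis=1) % 2).astype(float)
print(X.shape, O[:8])   # (32, 5)
```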
Similarly, we use the SPSNN structure and select $P=13$, $N=4$, $Q=16$, and $1$ for the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively. The initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0045, and the regularization factor $\tau$ is 0.001. For each learning algorithm we carry out 20 experiments, training for up to 40,000 steps or stopping once the error is less than $10^{-4}$. To assess the sparsity and convergence of the smoothing elastic net regularizer, we compare the average error (AVE) and the number of remaining hidden neurons after pruning (RNN) of the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, the original group Lasso regularizer, and the smoothing elastic net regularizer, as listed in Table 3.
Learning Methods | AVE | RNN |
NoPenalty | 0.004433 | 17.00 |
L2 | 0.003967 | 17.00 |
OL1/2 | 0.003929 | 17.14 |
SL1/2 | 0.004033 | 17.33 |
OGL | 0.003925 | 17.00 |
SGL | 0.003471 | 16.71 |
The results show that the proposed smooth elastic net regularizer outperforms the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer.
Figure 2(a) shows the error performance of the original group Lasso regularizer and the smoothing elastic net regularizer on the 5-bit parity problem. Figure 2(b) shows that the norm of the gradient of the error function approaches a small positive constant on the same problem. This indicates that the smoothing elastic net regularizer removes the oscillation that occurs with the original group Lasso regularizer during the learning process.
In this example, we study the multi-dimensional Gabor function to compare the approximation performance of the above algorithms.
$$k(x,y)=\frac{1}{2\pi(0.5)^2}\exp\left(-\frac{x^2+y^2}{2(0.5)^2}\right)\cos\big(2\pi(x+y)\big).$$
As before, we use the SPSNN structure and select $P=13$, $N=4$, $Q=16$, and $1$ for the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively.
In this experiment, we select 169 training samples from an evenly spaced grid on the square $-0.5\le x\le 0.5$, $-0.5\le y\le 0.5$. The initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0028, and the regularization factor $\tau$ is 0.001. For each learning algorithm we carry out 20 experiments, training for up to 40,000 iterations or stopping once the error is less than $10^{-4}$.
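The target function and a sample grid can be set up as in the following sketch; the 13 points per axis are an assumption chosen so that the grid contains the 169 samples reported above.

```python
import numpy as np

def gabor(x, y, sigma=0.5):
    """Two-dimensional Gabor function used as the approximation target."""
    return (1.0 / (2.0 * np.pi * sigma ** 2)
            * np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
            * np.cos(2.0 * np.pi * (x + y)))

# evenly spaced grid over [-0.5, 0.5]^2; 13 x 13 = 169 sample points (assumed grid size)
xs = np.linspace(-0.5, 0.5, 13)
xx, yy = np.meshgrid(xs, xs)
targets = gabor(xx, yy)
```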
To assess the sparsity and convergence of the smoothing elastic net regularizer, we compare the average error (AVE) and the number of remaining hidden neurons after pruning (RNN) of the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, the original group Lasso regularizer, and the smoothing elastic net regularizer, as shown in Table 4. We can see that our proposed smoothing elastic net regularizer is superior to the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer.
Learning Methods | AVE | RNN |
NoPenalty | 0.003940 | 18.00 |
L2 | 0.003560 | 18.00 |
OL1/2 | 0.003783 | 17.67 |
SL1/2 | 0.004486 | 17.37 |
OGL | 0.003523 | 16.87 |
SGL | 0.003286 | 16.14 |
For each learning algorithm, we show the error function and the norm of the gradient for one of the 20 experiments after 40,000 epochs in Figures 3–5. Figure 3(a) shows the oscillation phenomenon of the no regularizer, the L2 regularizer, the original L1/2 regularizer, and the original group Lasso regularizer. Figure 3(b) shows the error curves of the smoothing L1/2 regularizer and the smoothing elastic net regularizer. Figure 4(a) shows the norm of the gradient for the no regularizer, the L2 regularizer, the original L1/2 regularizer, and the original group Lasso regularizer, while Figure 4(b) shows the norm of the gradient for the smoothing L1/2 regularizer and the smoothing elastic net regularizer, which clearly approaches a small positive constant. Figure 5 shows a typical result for one of the 20 experiments; compared with the other algorithms, the proposed method achieves a good approximation. With the same parameters, the learning method with the smoothing elastic net regularizer converges faster than the other learning methods and overcomes the numerical oscillation phenomenon. The error function curves decrease monotonically during the iterative process and converge to zero, which also validates our theoretical results.
To further verify the effectiveness of the algorithm, this part takes an interval shield tunneling project of Metro Line 9 in Zhengzhou City, China, as an example. To monitor the impact of the subway shield tunneling process on surface buildings and structures, settlement observation points are set up in advance at 10-meter intervals along each shield tunneling section, and the surrounding structures and buildings are monitored. JGC1, the settlement observation point of the signal tower closest to the right line of the shield, is selected as the object of study. A Leica DNA03 level is used to collect data once a day, starting 10 days before the shield is excavated to the point closest to JGC1, for a total of 40 days. In this experiment, we use the data of the first 30 days as the training set and the data of the last 10 days as the test set (see Table 5).
Time | JGC1 (mm) | Time | JGC1 (mm) |
2015.3.18 | +0.21000 | 2015.3.31 | 0.003286 |
2015.3.19 | -0.13000 | 2015.4.2 | +0.10000 |
2015.3.20 | +0.11000 | 2015.4.3 | -0.02000 |
2015.3.21 | -0.06000 | 2015.4.4 | +0.08000 |
2015.3.22 | -0.23000 | 2015.4.5 | -0.16000 |
2015.3.23 | -0.23000 | 2015.4.6 | +0.06000 |
2015.3.24 | -0.15000 | 2015.4.7 | -0.18000 |
2015.3.25 | -0.03000 | 2015.4.8 | +0.28000 |
2015.3.26 | 0.003286 | 2015.4.9 | -0.07000 |
2015.3.27 | 0.003286 | 2015.4.10 | +0.07000 |
Time | JGC1 (mm) | Time | JGC1 (mm) |
2015.4.11 | +0.01000 | 2015.4.22 | +0.01000 |
2015.4.12 | -0.01000 | 2015.4.23 | 0.00000 |
2015.4.13 | -0.15000 | 2015.4.24 | +0.19000 |
2015.4.14 | +0.08000 | 2015.4.25 | -0.21000 |
2015.4.15 | -0.09000 | 2015.4.26 | +0.10000 |
2015.4.16 | +0.14000 | 2015.4.27 | -0.12000 |
2015.4.17 | -0.02000 | 2015.4.28 | +0.05000 |
2015.4.18 | -0.26000 | 2015.4.29 | +0.15000 |
2015.4.19 | +0.42000 | 2015.4.30 | -0.17000 |
2015.4.20 | -0.22000 | 2015.5.1 | -0.07000 |
In this experiment, we use the SPSNN structure and select $P=5$, $N=4$, $Q=16$, and $1$ for the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively. We use the sigmoid activation function at the $\Sigma_1$ and output layers, and the stopping criterion in this experiment is an error of less than $1\times 10^{-5}$ or 5000 iterations.
Figure 6 shows the error curve of the training set over 5000 iterations, where red denotes the error curve without the regularization term and blue denotes the curve with the smoothing elastic net regularization. The error of the network with the smoothing elastic net regularization decreases faster, and after 500 iterations it is smaller than that without the regularization term, which verifies the theoretical results of this paper and the effectiveness of the proposed algorithm.
In this paper, a new batch gradient algorithm for SPSNNs with an L1 plus L2 regularization term is proposed as an effective weight-pruning technique. It can handle multi-output regression and multi-class classification problems within a unified framework. The algorithm combines the advantages of Lasso and ridge regression, penalizing the weights by driving redundant weight vectors toward zero, which is more efficient than various other pruning strategies. Moreover, the theoretical results and the advantages of the algorithm are illustrated by numerical experiments.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Conceptualization, methodology, original draft preparation, J. Jiao; software, editing, K. Su.
All authors declare there is no conflict of interest.