
The primary objective of this study was to explore the behavior of an n-coupled system of generalized Sturm-Liouville (GSL) and Langevin equations under a modified ABC fractional derivative. We aimed to analyze the dynamics of the system and gain insights into how this operator influences the conditions for the existence and uniqueness of solutions. We established the existence and uniqueness of solutions by employing the Banach contraction principle and Leray-Schauder's alternative fixed-point theorem. We also investigated the Hyers-Ulam stability of the system. This analysis allows us to understand the stability properties of the solutions and evaluate their sensitivity to perturbations. Furthermore, we employed Lagrange's interpolation polynomials to produce a numerical scheme for the influenza epidemic model. By combining theoretical analysis, mathematical principles, and numerical simulations, this study contributes to enriching our understanding of the behavior of the system and offers insights into its dynamics and practical applications in epidemiology.
Citation: Elkhateeb S. Aly, Mohammed A. Almalahi, Khaled A. Aldwoah, Kamal Shah. Criteria of existence and stability of an n-coupled system of generalized Sturm-Liouville equations with a modified ABC fractional derivative and an application to the SEIR influenza epidemic model[J]. AIMS Mathematics, 2024, 9(6): 14228-14252. doi: 10.3934/math.2024691
The Sigma-Pi-Sigma neural network (SPSNN) is a feed-forward neural network composed of Sigma-Pi units, which can realize the static mapping of multi-layer neural networks [1,2,3]. In [4], a new model that can be regarded as a subclass of networks built on Sigma-Pi units was considered, and the authors showed how the Kronecker product representation originates from the classical Sigma-Pi units. In [5], a Sigma-Pi network and a new learning algorithm were proposed by which the output representation self-organizes into a topographic map; its main contribution was to solve frame-of-reference transformation problems through unsupervised learning. Furthermore, in [6,7,8], the approximation, convergence, and generalization abilities of sparse Sigma-Pi network functions were studied, and the new algorithms were shown to be more efficient than those in the existing literature.
It is widely recognized that neural network optimization has become a highly significant research topic in recent years. Research on neural network optimization covers two main topics. The first is weight optimization: for a given network structure, an appropriate learning method is selected to seek optimal weights such that both the training error and the generalization error are sufficiently small [9,10]. The second is structural optimization, namely the selection of an appropriate activation function, the number of network layers, the connection mode, and so on [11,12,13]. However, research on structure optimization is far less developed than that on weight optimization. Moreover, there is no evidence in the literature that more hidden-layer neurons yield better generalization ability.
When it comes to neural networks, the number of neurons is a crucial factor. There are two common methods for determining the size of networks. The first method is the growing method, where the network starts with a smaller size and new hidden neurons are added during the training process [14]. Another method is the pruning method [15,16,17,18,19], which begins with a larger network and eventually removes redundant nodes.
Such algorithms separate weight learning from weight training, which is inefficient. There are also more complex algorithms that introduce additional mechanisms, such as particle swarm optimization [20], genetic algorithms [21], eigenvalue analysis [22], statistical analysis, and synthetic minority over-sampling techniques [23,24], to enhance the efficiency of sparsification. The disadvantage of these algorithms is that their implementations are complex and computationally expensive.
An appropriately sized network structure is instrumental in enhancing efficiency. Overfitting poses a significant challenge during network training and is particularly problematic for deep neural network learning [25]. Consequently, researchers have extensively explored various forms of sparse regularization techniques, highlighting their indispensability.
In recent years, Lp regularization has been widely used for variable selection and parameter estimation problems in machine learning. This regularization method takes the form
$$E(W)=\hat{E}(W)+\tau\|W\|_p^p,$$
where $\hat{E}$ is the ordinary error function, $\|W\|_p=\left(\sum_i |W_i|^p\right)^{1/p}$ denotes the $p$-norm, $\tau$ is the penalty coefficient, and $\|\cdot\|$ represents the Euclidean norm. This regularization term is also called the penalty term.
In general, there are several common forms of regularization: weight elimination, weight decay, and approximate smoothing [26,27,28,29,30,31,32]. Among them, weight elimination is widely used as the penalty term for pruning feedforward neural networks, mainly to remove unnecessary connections or to optimize the network weights [33]. A more detailed introduction to the different penalty terms is given below.
For $p=2$, adding the L2 norm to the standard error function makes it easier to optimize [34,35,36,37]. This practice is called L2 regularization, and it takes the following form:
$$E(W)=\hat{E}(W)+\tau\|W\|_2^2,$$
where $\|W\|_2^2$ denotes the penalty term. The L2-norm solution is popular because of its special relationship with the normal distribution, and it acts as a brute-force way to avoid excessively large weights. Unfortunately, it is not sparse: during the training process, L2 regularization cannot drive unnecessary weights to zero.
The L0 regularization term is a standard tool for feature extraction and variable selection. By constraining the number of nonzero coefficients, the L0 regularizer produces the sparsest solutions, but these are difficult to compute because the resulting problem is a combinatorial optimization problem [38,39].
As an alternative to the L0 regularization term, the increasingly important L1 regularization term (Lasso) has become popular because it only requires solving a quadratic programming problem [40]. In [41], the L1 norm was combined with the capped L1 norm to measure the amount of information collected by each filter and to control the regularization. The L1-regularized penalty function is generally written as follows:
$$E(W)=\hat{E}(W)+\tau\|W\|_1.$$
It was shown in [42,43] that, although these algorithms can generate an alternative neural network structure, they do not offer a unified neural network framework for solving a class of problems.
To explore more appropriate neural network structures and overcome the obstacles posed by suboptimal models, we propose a new SPSNN algorithm based on both the L1 penalty and the L2 penalty, which handles complex and varied tasks within a unified framework and improves the robustness and generality of the model. In addition to the L1 norm, an L2 norm is introduced into the SPSNN to promote a group sparsity effect that selects the relevant population of hidden nodes. Our proposed variant therefore benefits from ridge regression as well as from the tendency of the L1 penalty toward sparse solutions, which yields a more suitable network structure than using either regular term alone; the elastic net algorithm can also be used to solve such hybrid penalties [44].
In this paper, the penalty method is applied to weight selection: both the L1 and L2 penalties are used in the same minimization problem, which not only overcomes the shortcomings of each norm alone but also provides good generalization ability and sparsity. The error function with this hybrid penalty can be expressed as follows:
$$E(W)=\hat{E}(W)+\tau_1\|W\|_1+\tau_2\|W\|_2^2,$$
where the tuning constants $\tau_1$ and $\tau_2$ are fixed and non-negative. The term weighted by $\tau_1$ performs variable selection through a sparse weight vector, while the term weighted by $\tau_2$ ensures a unique solution and leads to a grouping effect. The added penalty is thus a combination of the L1 norm $\|W\|_1$ and the squared L2 norm $\|W\|_2^2$ of the parameter vector $W$.
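As a minimal illustration of this hybrid penalty (the function name and the flattened weight vector are assumptions of this sketch, not notation from the paper), it can be evaluated as:

```python
import numpy as np

def hybrid_penalty(W, tau1, tau2):
    """L1 plus squared-L2 penalty added to the error function (illustrative sketch)."""
    return tau1 * np.sum(np.abs(W)) + tau2 * np.sum(W ** 2)

# The L1 part favors sparse weight vectors, while the L2 part shrinks all entries.
W = np.array([0.0, -0.3, 1.2])
print(hybrid_penalty(W, tau1=0.001, tau2=0.001))
```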
During training, the usual mixed regularization terms are not differentiable at the origin, which typically gives rise to an oscillation phenomenon. We therefore propose a new smoothing algorithm to overcome this difficulty: in a neighborhood of the origin, the absolute value of the weight in the error function is replaced by a smooth approximation. The main contributions and novelty of this article are as follows:
(1) To obtain an optimal architecture with good generalization performance, we eliminate hidden-layer weights by using the idea of elastic net regularization. By incorporating the L1 and L2 regularization terms, the proposed algorithm not only removes unnecessary weights but also prunes the network structure in the hidden layers; this pruning reduces the network size and leads to an effective optimization of the network structure.
(2) We propose a batch gradient method for SPSNNs based on elastic net regularization to obtain an optimal architecture with good generalization performance. By means of a smoothing technique, we effectively remove the oscillation phenomenon during network learning.
(3) To test the accuracy and robustness, we apply the proposed method to a large number of regression and classification tasks and compare it with other algorithms that have good sparsity and generalization capabilities.
The rest of the article is organized as follows. Section 2 presents the SPSNN. Section 3 describes the batch gradient algorithm for this model and details the novel pruning algorithm based on the L1 plus L2 penalty term. Numerical experiments are provided in Section 4, and conclusions are drawn at the end.
Next, we describe the structure of SPSNNs, which are composed of an input layer, a hidden layer of summing nodes ($\Sigma_1$), a hidden layer of product nodes ($\Pi$), and an output layer ($\Sigma_2$). Here $P$, $N$, $Q$, and $1$ denote the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively (see Figure 1).
Let $W_0=(w_{01},w_{02},\cdots,w_{0Q})^T\in\mathbb{R}^Q$ denote the weight vector connecting the $\Pi$ layer to the $\Sigma_2$ layer, and let the weights connecting the $\Sigma_1$ layer to the $\Pi$ layer be fixed at 1. Writing $W_n=(w_{n1},w_{n2},\cdots,w_{nP})^T$ for the weight vector connecting the input layer to the $n$th node of the $\Sigma_1$ layer, we set
$$W=(W_0^T,W_1^T,\cdots,W_N^T)^T\in\mathbb{R}^{Q+NP}.$$
Let $X=(x_1,x_2,\cdots,x_P)\in\mathbb{R}^P$ denote the input of the network, and assume that $g:\mathbb{R}\rightarrow\mathbb{R}$ is a given sigmoid activation function for the $\Sigma_1$ layer. We denote the output vector $\xi\in\mathbb{R}^N$ of the $\Sigma_1$ layer as
$$\xi=\big(g(W_1\cdot X),g(W_2\cdot X),\cdots,g(W_N\cdot X)\big)^T,$$
where ⋅ denotes the inner product between vectors.
Similarly, we take $\delta=(\delta_1,\delta_2,\cdots,\delta_Q)^T$ to denote the output vector of the $\Pi$ layer. The nodes of the network can be connected in two different ways: the $\Sigma_1$ layer and the $\Pi$ layer are either fully connected or sparsely connected. The difference lies in the number of product nodes: the former has $2^N$ product nodes, while the latter has fewer than $2^N$. The structure of the SPSNNs can be seen clearly in Figure 1.
Here, $H_q$ ($1\le q\le Q$) denotes the set of nodes in the $\Sigma_1$ layer connected to the $q$th product node, while $F_n$ ($1\le n\le N$) denotes the set of all product nodes connected to the $n$th node of the $\Sigma_1$ layer. For an arbitrary set $a$, let $\varphi(a)$ denote the number of elements of $a$; then we have
$$\sum_{q=1}^{Q}\varphi(H_q)=\sum_{n=1}^{N}\varphi(F_n),$$
which will be used later.
We can compute the output vector $\delta\in\mathbb{R}^Q$ of the $\Pi$ layer by
$$\delta_q=\prod_{i\in H_q}\xi_i,\qquad 1\le q\le Q.$$
By convention, we set $\prod_{i\in H_q}\xi_i=1$ when $H_q=\emptyset$. The final output of the SPSNN can then be written as
$$y=g(W_0\cdot\delta).$$
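To make the forward computation concrete, the following Python sketch evaluates one forward pass of an SPSNN; all names, the toy dimensions, and the particular index sets $H_q$ are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spsnn_forward(X, W_sigma1, w0, H):
    """One forward pass of a Sigma-Pi-Sigma network.

    X        : input vector of length P
    W_sigma1 : (N, P) weight matrix of the Sigma_1 layer
    w0       : length-Q weight vector from the Pi layer to the output node
    H        : list of Q index sets; H[q] lists the Sigma_1 nodes feeding
               product node q (an empty set gives the constant product 1)
    """
    xi = sigmoid(W_sigma1 @ X)                     # Sigma_1 outputs, xi in R^N
    delta = np.array([np.prod(xi[list(Hq)]) if Hq else 1.0 for Hq in H])  # Pi layer
    return sigmoid(w0 @ delta)                     # final output y = g(W_0 . delta)

# Toy call with P = 3 inputs, N = 2 summing nodes, and the fully connected case,
# where the Q = 2^N = 4 product nodes use all subsets of the Sigma_1 nodes.
rng = np.random.default_rng(0)
H = [set(), {0}, {1}, {0, 1}]
y = spsnn_forward(rng.uniform(-0.5, 0.5, 3),
                  rng.uniform(-0.5, 0.5, (2, 3)),
                  rng.uniform(-0.5, 0.5, 4), H)
```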
We now introduce the batch gradient algorithm for SPSNNs. Let $\{X^l,O^l\}_{l=1}^{L}\subset\mathbb{R}^P\times\mathbb{R}$ be the given set of training samples, where $X^l$ denotes the $l$th input sample and $O^l$ the corresponding ideal output. Let $y^l\in\mathbb{R}$ be the actual output for each input $X^l$. The conventional square error function is given by
$$\hat{E}(W)=\frac{1}{2}\sum_{l=1}^{L}\big(g(W_0\cdot\delta^l)-O^l\big)^2=\sum_{l=1}^{L}\frac{1}{2}\big(g(W_0\cdot\delta^l)-O^l\big)^2=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l),$$
where $g_l(z)=\frac{1}{2}\big(g(z)-O^l\big)^2$, $z\in\mathbb{R}$, $1\le l\le L$.
For convenience, we need to provide the following forms:
$$\delta^l=(\delta_1^l,\delta_2^l,\cdots,\delta_Q^l)=\left(\prod_{i\in H_1}\xi_i,\prod_{i\in H_2}\xi_i,\cdots,\prod_{i\in H_Q}\xi_i\right)=\left(\prod_{i\in H_1}g(W_i\cdot X^l),\prod_{i\in H_2}g(W_i\cdot X^l),\cdots,\prod_{i\in H_Q}g(W_i\cdot X^l)\right), \qquad (3.1)$$
and thus, by virtue of
$$\hat{E}(W)=\sum_{l=1}^{L}g_l\left(\sum_{q=1}^{Q}w_{0q}\prod_{i\in H_q}g(W_i\cdot X^l)\right) \qquad (3.2)$$
and some calculation, we obtain
$$\hat{E}_{W_0}(W)=\sum_{l=1}^{L}g_l'(W_0\cdot\delta^l)\,\delta^l.$$
It follows from $\delta_q^l=\prod_{i\in H_q}g(W_i\cdot X^l)$ that
$$\frac{\partial\delta_q^l}{\partial W_n}=\begin{cases}\left(\prod_{i\in H_q\setminus n}\xi_i\right)g'(W_n\cdot X^l)\,X^l, & \text{if } q\neq 1 \text{ and } n\in H_q,\\[4pt] 0, & \text{if } q=1 \text{ or } n\notin H_q.\end{cases}$$
Then, from the above equality and (3.2), we get
$$\hat{E}_{W_n}(W)=\sum_{l=1}^{L}\left[g_l'(W_0\cdot\delta^l)\sum_{q=1}^{Q}\left(w_{0q}\frac{\partial\delta_q^l}{\partial W_n}\right)\right].$$
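The case split above translates directly into code. The following sketch (function and variable names are our own assumptions) evaluates $\partial\delta_q^l/\partial W_n$ for one training sample:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def d_delta_q_d_Wn(q, n, xi, H, W_sigma1, Xl):
    """Partial derivative of delta_q = prod_{i in H[q]} xi_i with respect to W_n:
    zero unless Sigma_1 node n feeds product node q."""
    if n not in H[q]:
        return np.zeros_like(Xl)
    rest = np.prod([xi[i] for i in H[q] if i != n])  # product over the remaining Sigma_1 outputs
    return rest * d_sigmoid(W_sigma1[n] @ Xl) * Xl

# toy check: Sigma_1 nodes 0 and 1 both feed product node 0
H = [{0, 1}]
Xl = np.array([0.2, -0.4, 0.1])
W_sigma1 = np.array([[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]])
xi = sigmoid(W_sigma1 @ Xl)
print(d_delta_q_d_Wn(0, 1, xi, H, W_sigma1, Xl))
```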
To simplify the network structure, we add elastic net regularization to optimize the network at the group level. The corresponding error function then takes the form
$$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau\left(\alpha\left(\|W_0\|_1+\sum_{n=1}^{N}\|W_n\|_1\right)+(1-\alpha)\left(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\right)\right), \qquad (3.3)$$
where $\tau$ and $\alpha$ are tuning parameters that control the strength of the penalty term. As $\alpha$ increases, the L1 regularization term dominates and the error function approaches that of Lasso regression; as $\alpha$ decreases, the L2 regularization term dominates and the error function approaches that of ridge regression. In particular, the error function is equivalent to that of ridge regression at $\alpha=0$ and to that of Lasso regression at $\alpha=1$. Letting $\tau_1=\tau\alpha$ and $\tau_2=2\tau(1-\alpha)$, we have
$$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau_1\left(\|W_0\|_1+\sum_{n=1}^{N}\|W_n\|_1\right)+\frac{\tau_2}{2}\left(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\right). \qquad (3.4)$$
The gradients of the error function with respect to $W_0$ and $W_n$ are, respectively, given as
$$E_{W_0}(W)=\sum_{l=1}^{L}g_l'(W_0\cdot\delta^l)\,\delta^l+\tau_1\frac{W_0}{\|W_0\|}+\tau_2 W_0 \qquad (3.5)$$
and
$$E_{W_n}(W)=\sum_{l=1}^{L}\left[g_l'(W_0\cdot\delta^l)\sum_{q\in F_n}\left[w_{0q}\left(\prod_{i\in H_q\setminus n}\xi_i^l\right)g'(W_n\cdot X^l)\,X^l\right]\right]+\tau_1\frac{W_n}{\|W_n\|}+\tau_2 W_n. \qquad (3.6)$$
We note that the elastic net regularization in (3.4) combines L1 norm regularization with L2 norm regularization. Clearly, (3.4) is not differentiable at the origin, which causes the oscillation phenomenon, so we propose a smoothing approximation method to overcome the problem caused by this non-smoothness. For any finite-dimensional vector $u$ and a fixed constant $\gamma>0$, we define a smoothing function of $\|u\|$ as follows:
$$h(u,\gamma)=\begin{cases}\|u\|, & \text{if } \|u\|>\gamma,\\[4pt] \dfrac{\|u\|^2}{2\gamma}+\dfrac{\gamma}{2}, & \text{if } \|u\|\le\gamma.\end{cases} \qquad (3.7)$$
We use (3.7) to approximate the elastic net regularization in (3.4). Furthermore, the gradient of h(u,γ) with respect to the vector u is given as follows:
$$\nabla_u h(u,\gamma)=\begin{cases}\dfrac{u}{\|u\|}, & \text{if } \|u\|>\gamma,\\[4pt] \dfrac{u}{\gamma}, & \text{if } \|u\|\le\gamma.\end{cases}$$
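The smoothing function (3.7) and its gradient admit a direct Python transcription; this is only a sketch, and the function names are ours:

```python
import numpy as np

def smooth_norm(u, gamma):
    """Smoothing approximation h(u, gamma) of the Euclidean norm ||u|| near the origin."""
    r = np.linalg.norm(u)
    return r if r > gamma else r ** 2 / (2.0 * gamma) + gamma / 2.0

def smooth_norm_grad(u, gamma):
    """Gradient of h(u, gamma) with respect to u; bounded even when u is close to zero."""
    r = np.linalg.norm(u)
    return u / r if r > gamma else u / gamma
```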
Accordingly, the error function (3.4) can be rewritten as
$$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau_1\left[h(W_0,\gamma)+\sum_{n=1}^{N}h(W_n,\gamma)\right]+\frac{\tau_2}{2}\left(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\right). \qquad (3.8)$$
According to (3.8), the gradients for the smoothing elastic net Sigma-Pi-Sigma neural network become
$$E_{W_0}(W)=\sum_{l=1}^{L}g_l'(W_0\cdot\delta^l)\,\delta^l+\tau_1\nabla_{W_0}h(W_0,\gamma)+\tau_2 W_0, \qquad (3.9)$$
$$E_{W_n}(W)=\sum_{l=1}^{L}g_l'(W_0\cdot\delta^l)\sum_{q\in F_n}w_{0q}\left(\prod_{i\in H_q\setminus n}\xi_i^l\right)g'(W_n\cdot X^l)\,X^l+\tau_1\nabla_{W_n}h(W_n,\gamma)+\tau_2 W_n, \qquad (3.10)$$
where l=1,2,⋯,L.
Beginning with an arbitrary initial weight vector $W^0$, we define the weight sequence by the following iterative formulas:
$$W^{k+1}=W^k+\Delta W^k, \qquad (3.11)$$
$$\Delta W_0^k=-\eta E_{W_0}(W^k), \qquad (3.12)$$
$$\Delta W_n^k=-\eta E_{W_n}(W^k), \qquad (3.13)$$
where η represents the learning rate.
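The iteration (3.11)–(3.13) is plain batch gradient descent on the smoothed objective (3.8). The toy sketch below is an illustrative assumption: it uses a central-difference gradient of a stand-in objective instead of the closed-form gradients (3.9) and (3.10), merely to show the update rule in runnable form.

```python
import numpy as np

def numerical_grad(E, W, eps=1e-6):
    """Central-difference gradient of E at W; stands in for (3.9)-(3.10) in this sketch."""
    g = np.zeros_like(W)
    for i in range(W.size):
        d = np.zeros_like(W)
        d[i] = eps
        g[i] = (E(W + d) - E(W - d)) / (2.0 * eps)
    return g

def train(E, W, eta=0.0028, epochs=1000):
    """Batch gradient iteration W^{k+1} = W^k - eta * grad E(W^k)."""
    for _ in range(epochs):
        W = W - eta * numerical_grad(E, W)
    return W

# exercise the iteration on a toy quadratic objective
W_final = train(lambda W: np.sum((W - 1.0) ** 2), np.zeros(5))
```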
In this section, the performance of the models with no regularizer, the L2 regularizer, the original L1/2 regularizer (OL1/2), the smoothing L1/2 regularizer (SL1/2), and the original group Lasso regularizer (OGL) is compared with that of the smoothing elastic net regularizer algorithm (SGL) on four examples: a classification problem, a parity problem, a function approximation problem, and a prediction problem.
In this example, we choose 8 benchmark data sets from the UCI machine learning repository to test the performance of the new algorithm (SGL), and compare it with the no regularizer, the L2 regularizer, the OL1/2, the SL1/2, and the OGL algorithms.
Table 1 presents the main characteristics of the data sets, including the data set size, the numbers of attributes and classes, and the sizes of the training and testing sets; each data set is randomly partitioned into two subsets, 70% for training and 30% for testing.
Dataset | Dataset Size | Training set | Testing set | Attributes | Classes |
Ecoli | 336 | 224 | 112 | 8 | 7 |
Olitos | 120 | 80 | 40 | 25 | 4 |
Seeds | 210 | 147 | 63 | 7 | 3 |
Iris | 150 | 105 | 45 | 4 | 3 |
Wine | 178 | 120 | 58 | 13 | 3 |
Liver | 345 | 240 | 105 | 7 | 2 |
Sonar | 208 | 138 | 70 | 60 | 2 |
Diabetes | 768 | 526 | 242 | 8 | 2 |
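A minimal sketch of the random 70%/30% partition described above (the function name and seed handling are assumptions of this sketch):

```python
import numpy as np

def split_70_30(X, y, seed=0):
    """Random 70% / 30% train-test partition of a data set (illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.7 * len(X))
    return (X[idx[:cut]], y[idx[:cut]]), (X[idx[cut:]], y[idx[cut:]])
```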
As described above, we use the SPSNN structure shown in Figure 1 and select $P=13$, $N=4$, $Q=16$, and $1$ for the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively. For each learning algorithm, the initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0028, and the regularization factor $\tau$ is 0.001; we conduct 20 trials for every data set to compare the performance of the different algorithms.
To assess the performance of the smoothing elastic net regularizer on each data set, we compare the number of remaining hidden neurons after pruning (RNN), the training accuracy, the testing accuracy, and the training time for each algorithm; all experimental results are recorded in Table 2. From the table, it can be observed that the training accuracy improves slightly, while the testing accuracy increases by approximately 1% to 3%. We therefore find that our proposed smoothing elastic net regularizer is superior to the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer.
Dataset | Algorithm | RNN | Training accuracy | Testing accuracy | Training time(s) |
Ecoli | NoPenalty | 13.30 | 0.9761 | 0.9082 | 17.2756 |
L2 | 13.00 | 0.9747 | 0.8980 | 16.7845 | |
OL1/2 | 12.50 | 0.9749 | 0.9191 | 18.1372 | |
SL1/2 | 11.80 | 0.9771 | 0.9304 | 16.8050 | |
OGL | 12.20 | 0.9749 | 0.9191 | 18.3792 | |
SGL | 11.50 | 0.9780 | 0.9314 | 17.3763 | |
Olitos | NoPenalty | 13.00 | 0.9573 | 0.8914 | 29.9742 |
L2 | 12.33 | 0.9592 | 0.9053 | 29.3393 | |
OL1/2 | 12.00 | 0.9604 | 0.9160 | 30.6111 | |
SL1/2 | 11.67 | 0.9589 | 0.9245 | 30.3067 | |
OGL | 12.00 | 0.9617 | 0.9264 | 33.0595 | |
SGL | 11.00 | 0.9628 | 0.9355 | 30.5098 | |
Seeds | NoPenalty | 13.33 | 0.9737 | 0.9522 | 13.4081 |
L2 | 12.67 | 0.9761 | 0.9554 | 13.5387 | |
OL1/2 | 12.33 | 0.9791 | 0.9582 | 13.8205 | |
SL1/2 | 11.67 | 0.9797 | 0.9676 | 13.2730 | |
OGL | 12.33 | 0.9792 | 0.9629 | 14.7718 | |
SGL | 11.33 | 0.9813 | 0.9749 | 12.7701 | |
Iris | NoPenalty | 13.67 | 0.9715 | 0.9296 | 13.4447 |
L2 | 13.00 | 0.9719 | 0.9390 | 13.4420 | |
OL1/2 | 12.67 | 0.9723 | 0.9458 | 14.3637 | |
SL1/2 | 12.33 | 0.9743 | 0.9554 | 13.5318 | |
OGL | 12.33 | 0.9748 | 0.9522 | 16.1023 | |
SGL | 11.00 | 0.9791 | 0.9629 | 14.2630 | |
Wine | NoPenalty | 12.67 | 0.9872 | 0.9729 | 20.4322 |
L2 | 12.67 | 0.9892 | 0.9753 | 20.4173 | |
OL1/2 | 12.00 | 0.9896 | 0.9770 | 21.3012 | |
SL1/2 | 11.50 | 0.9911 | 0.9814 | 20.5545 | |
OGL | 11.67 | 0.9906 | 0.9798 | 21.1104 | |
SGL | 10.50 | 0.9915 | 0.9833 | 20.1607 | |
Liver | NoPenalty | 13.33 | 0.9937 | 0.9823 | 15.5081 |
L2 | 12.67 | 0.9943 | 0.9838 | 15.4588 | |
OL1/2 | 12.33 | 0.9947 | 0.9858 | 16.4440 | |
SL1/2 | 11.33 | 0.9951 | 0.9861 | 15.9798 | |
OGL | 11.00 | 0.9948 | 0.9868 | 17.4374 | |
SGL | 10.33 | 0.9962 | 0.9902 | 16.1645 | |
Sonar | NoPenalty | 12.67 | 0.9825 | 0.9756 | 12.6535 |
L2 | 12.67 | 0.9831 | 0.9787 | 12.1906 | |
OL1/2 | 12.33 | 0.9860 | 0.9830 | 12.9125 | |
SL1/2 | 12.00 | 0.9909 | 0.9852 | 12.5690 | |
OGL | 12.33 | 0.9892 | 0.9849 | 13.7648 | |
SGL | 11.67 | 0.9918 | 0.9860 | 11.7338 | |
Diabetes | NoPenalty | 13.00 | 0.9925 | 0.9937 | 17.7933 |
L2 | 12.50 | 0.9936 | 0.9947 | 17.1156 | |
OL1/2 | 11.67 | 0.9954 | 0.9950 | 17.4822 | |
SL1/2 | 11.33 | 0.9967 | 0.9955 | 17.1170 | |
OGL | 11.67 | 0.9961 | 0.9953 | 18.4761 | |
SGL | 11.33 | 0.9978 | 0.9961 | 17.3682 |
In addition, we have also compared our approach with other existing methods. In [45], the authors considered the group Lasso regularization method on the Sigma-Pi-Sigma neural network. In [46], the authors applied the group L1/2 regularization term to high-order neural networks. Our proposed elastic net regularization method is on par with these approaches.
For the parity problem, the input set consists of $2^n$ samples in $n$-dimensional space, and every sample is an $n$-bit binary vector. We consider the 5-bit parity problem, whose input set contains $2^5$ samples in 5-dimensional space; the ideal output equals 1 if the number of ones in a sample is odd, and 0 otherwise. Using the setup above, we test the performance of our proposed smoothing elastic net regularizer.
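For instance, the 5-bit parity training set can be generated as in this short sketch (variable names are ours):

```python
import itertools
import numpy as np

# All 2^5 = 32 binary input vectors of the 5-bit parity problem; the ideal output
# is 1 when the number of ones in a sample is odd and 0 otherwise.
X = np.array(list(itertools.product([0, 1], repeat=5)), dtype=float)
O = (X.sum(axis=1) % 2).astype(float)
print(X.shape, O[:8])   # (32, 5)
```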
Similarly, we use the SPSNN structure and select $P=13$, $N=4$, $Q=16$, and $1$ for the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively. The initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0045, and the regularization factor $\tau$ is 0.001. For each learning algorithm we carry out 20 experiments, training for up to 40,000 steps or stopping once the error is less than $10^{-4}$. To assess the sparsity and convergence of the smoothing elastic net regularizer, we compare the average error (AVE) and the number of remaining hidden neurons after pruning (RNN) of the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, the original group Lasso regularizer, and the smoothing elastic net regularizer, as listed in Table 3.
Learning Methods | AVE | RNN |
NoPenalty | 0.004433 | 17.00 |
L2 | 0.003967 | 17.00 |
OL1/2 | 0.003929 | 17.14 |
SL1/2 | 0.004033 | 17.33 |
OGL | 0.003925 | 17.00 |
SGL | 0.003471 | 16.71 |
The results show that the proposed smooth elastic net regularizer outperforms the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer.
Figure 2(a) shows the error performance of the original group Lasso regularizer and the smoothing elastic net regularizer on the 5-bit parity problem. Figure 2(b) shows that the norm of the gradient of the error function approaches a small positive constant on the same problem. This indicates that the smoothing elastic net regularizer removes the oscillation that occurs with the original group Lasso regularizer during the learning process.
In this example, we study the multi-dimensional Gabor function to compare the approximation performance of the above algorithms.
$$k(x,y)=\frac{1}{2\pi(0.5)^2}\exp\left(-\frac{x^2+y^2}{2(0.5)^2}\right)\cos\big(2\pi(x+y)\big).$$
As before, we use the SPSNN structure and select $P=13$, $N=4$, $Q=16$, and $1$ for the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively.
In this experiment, we select 169 training samples from an evenly spaced grid on the square $-0.5\le x\le 0.5$, $-0.5\le y\le 0.5$. The initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0028, and the regularization factor $\tau$ is 0.001. For each learning algorithm we carry out 20 experiments, training for up to 40,000 iterations or stopping once the error is less than $10^{-4}$.
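The target function and a sample grid can be set up as in the following sketch; the 13 points per axis are an assumption chosen so that the grid contains the 169 samples reported above.

```python
import numpy as np

def gabor(x, y, sigma=0.5):
    """Two-dimensional Gabor function used as the approximation target."""
    return (1.0 / (2.0 * np.pi * sigma ** 2)
            * np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
            * np.cos(2.0 * np.pi * (x + y)))

# evenly spaced grid over [-0.5, 0.5]^2; 13 x 13 = 169 sample points (assumed grid size)
xs = np.linspace(-0.5, 0.5, 13)
xx, yy = np.meshgrid(xs, xs)
targets = gabor(xx, yy)
```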
To assess the sparsity and convergence of the smoothing elastic net regularizer, we compare the average error (AVE) and the number of remaining hidden neurons after pruning (RNN) of the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, the original group Lasso regularizer, and the smoothing elastic net regularizer, as shown in Table 4. We can see that our proposed smoothing elastic net regularizer is superior to the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer.
Learning Methods | AVE | RNN |
NoPenalty | 0.003940 | 18.00 |
L2 | 0.003560 | 18.00 |
OL1/2 | 0.003783 | 17.67 |
SL1/2 | 0.004486 | 17.37 |
OGL | 0.003523 | 16.87 |
SGL | 0.003286 | 16.14 |
For each learning algorithm, we show the error function and the norm of the gradient for one of the 20 experiments after 40,000 epochs in Figures 3–5. Figure 3(a) shows the oscillation phenomenon of the no regularizer, the L2 regularizer, the original L1/2 regularizer, and the original group Lasso regularizer. Figure 3(b) shows the error curves of the smoothing L1/2 regularizer and the smoothing elastic net regularizer. Figure 4(a) shows the norm of the gradient for the no regularizer, the L2 regularizer, the original L1/2 regularizer, and the original group Lasso regularizer, while Figure 4(b) shows the norm of the gradient for the smoothing L1/2 regularizer and the smoothing elastic net regularizer, which clearly approaches a small positive constant. Figure 5 shows a typical result for one of the 20 experiments; compared with the other algorithms, the proposed method achieves a good approximation. With the same parameters, the learning method with the smoothing elastic net regularizer converges faster than the other learning methods and overcomes the numerical oscillation phenomenon. The error function curves decrease monotonically during the iterative process and converge to zero, which also validates our theoretical results.
To further verify the effectiveness of the algorithm, this part takes an interval shield tunneling project of Metro Line 9 in Zhengzhou City, China, as an example. To monitor the impact of the subway shield tunneling process on surface buildings and structures, settlement observation points are set up in advance at 10-meter intervals along each shield tunneling section, and the surrounding structures and buildings are monitored. JGC1, the settlement observation point of the signal tower closest to the right line of the shield, is selected as the object of study. A Leica DNA03 level is used to collect data once a day, starting 10 days before the shield is excavated to the point closest to JGC1, for a total of 40 days. In this experiment, we use the data of the first 30 days as the training set and the data of the last 10 days as the test set (see Table 5).
Time | JGC1 (mm) | Time | JGC1 (mm) |
2015.3.18 | +0.21000 | 2015.3.31 | 0.003286 |
2015.3.19 | -0.13000 | 2015.4.2 | +0.10000 |
2015.3.20 | +0.11000 | 2015.4.3 | -0.02000 |
2015.3.21 | -0.06000 | 2015.4.4 | +0.08000 |
2015.3.22 | -0.23000 | 2015.4.5 | -0.16000 |
2015.3.23 | -0.23000 | 2015.4.6 | +0.06000 |
2015.3.24 | -0.15000 | 2015.4.7 | -0.18000 |
2015.3.25 | -0.03000 | 2015.4.8 | +0.28000 |
2015.3.26 | 0.003286 | 2015.4.9 | -0.07000 |
2015.3.27 | 0.003286 | 2015.4.10 | +0.07000 |
Time | JGC1 (mm) | Time | JGC1 (mm) |
2015.4.11 | +0.01000 | 2015.4.22 | +0.01000 |
2015.4.12 | -0.01000 | 2015.4.23 | 0.00000 |
2015.4.13 | -0.15000 | 2015.4.24 | +0.19000 |
2015.4.14 | +0.08000 | 2015.4.25 | -0.21000 |
2015.4.15 | -0.09000 | 2015.4.26 | +0.10000 |
2015.4.16 | +0.14000 | 2015.4.27 | -0.12000 |
2015.4.17 | -0.02000 | 2015.4.28 | +0.05000 |
2015.4.18 | -0.26000 | 2015.4.29 | +0.15000 |
2015.4.19 | +0.42000 | 2015.4.30 | -0.17000 |
2015.4.20 | -0.22000 | 2015.5.1 | -0.07000 |
In this experiment, we use the SPSNN structure and select $P=5$, $N=4$, $Q=16$, and $1$ for the numbers of nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively. We use the sigmoid activation function at the $\Sigma_1$ and output layers, and the stopping criterion in this experiment is an error of less than $1\times 10^{-5}$ or 5000 iterations.
Figure 6 shows the error curve of the training set over 5000 iterations, where red denotes the error curve without the regularization term and blue denotes the curve with the smoothing elastic net regularization. The error of the network with the smoothing elastic net regularization decreases faster, and after 500 iterations it is smaller than that without the regularization term, which verifies the theoretical results of this paper and the effectiveness of the proposed algorithm.
In this paper, a new batch gradient algorithm for SPSNNs with an L1 plus L2 regularization term is proposed as an effective weight-pruning technique. It can handle multi-output regression and multi-class classification problems within a unified framework. The algorithm combines the advantages of Lasso and ridge regression, penalizing the weights by driving redundant weight vectors toward zero, which is more efficient than various other pruning strategies. Moreover, the theoretical results and the advantages of the algorithm are illustrated by numerical experiments.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Conceptualization, methodology, original draft preparation, J. Jiao; software, editing, K. Su.
All authors declare there is no conflict of interest.