Research article

Sufficiency for singular trajectories in the calculus of variations

  • For calculus of variations problems of Bolza with variable end-points, nonlinear inequality and equality isoperimetric constraints, and nonlinear inequality and equality mixed pointwise constraints, sufficient conditions for strong minima are derived. The main novelty of the sufficiency results presented in this article is their applicability to cases in which the derivatives of the extremals proposed as optimal solutions are not necessarily continuous or piecewise continuous but only essentially bounded, and in which the extremals need not satisfy the standard strengthened Legendre condition but only the corresponding necessary condition.

    Citation: Gerardo Sánchez Licea. Sufficiency for singular trajectories in the calculus of variations[J]. AIMS Mathematics, 2020, 5(1): 111-139. doi: 10.3934/math.2020008



    The Sigma-Pi-Sigma neural network (SPSNN) is a feed-forward neural network composed of Sigma-Pi units, which can be used to achieve a static mapping of multi-layer neural networks [1,2,3]. In [4], a new model which can be regarded as a subclass of networks built on Sigma-Pi units was considered, and the authors showed the origin of the Kronecker product representation from the classical Sigma-Pi units. In [5], a Sigma-Pi network and a new learning scheme were proposed, by which the output representation self-organizes to form a topographic map; the main contribution was to solve frame-of-reference transformation issues through unsupervised learning. Furthermore, in [6,7,8], the approximation, convergence performance, and generalization ability of sparse Sigma-Pi network functions were studied, and it was shown that the new algorithms were more efficient than those in the existing literature.

    Neural network optimization has become a highly significant research topic in recent years. Research in this area comprises two main topics. The first is weight optimization: for a given network structure, select an appropriate learning method to seek weights that make the training error and the generalization error small enough [9,10]. The second is structural optimization, namely the selection of an appropriate activation function, number of layers, connection pattern, and so on [11,12,13]. However, research on structure optimization is far less developed than research on weight optimization, and there is no evidence in the literature that more hidden-layer neurons yield better generalization ability.

    When it comes to neural networks, the number of neurons is a crucial factor. There are two common methods for determining the size of networks. The first method is the growing method, where the network starts with a smaller size and new hidden neurons are added during the training process [14]. Another method is the pruning method [15,16,17,18,19], which begins with a larger network and eventually removes redundant nodes.

    These kinds of algorithms separate weight learning from structure learning, which is inefficient. There are also more sophisticated algorithms that introduce additional mechanisms such as particle swarm optimization [20], genetic algorithms [21], eigenvalue analysis [22], statistical analysis, and synthetic minority over-sampling techniques [23,24] to enhance sparsification efficiency. Their disadvantage is that the implementations are complex and computationally expensive.

    An appropriately sized network structure is instrumental in enhancing efficiency. Overfitting poses a significant challenge during network training and is particularly problematic for deep neural network learning [25]. Consequently, researchers have extensively explored various forms of sparse regularization techniques, highlighting their indispensability.

    In recent years, Lp regularization has been widely used for variable selection and parameter estimation in machine learning. This regularization method takes the form

    $$E(W)=\hat{E}(W)+\tau\|W\|_p^p,$$

    where $\hat{E}$ is the usual error function, $\|W\|_p=\big(\sum_i |W_i|^p\big)^{1/p}$ denotes the $p$-norm, $\tau$ is the penalty coefficient, and $\|\cdot\|$ (without a subscript) represents the Euclidean norm. This regularization term is also called the penalty term.

    In general, there are several common forms of regularization: weight elimination, weight decay, and approximate smoothing [26,27,28,29,30,31,32]. Among them, weight elimination is widely used as a penalty term for pruning feedforward neural networks, mainly to remove unnecessary connections or optimize the network weights [33]. A more detailed introduction to the different penalty terms follows.

    Taking $p=2$, adding the squared L2 norm to the standard error function makes it easier to optimize [34,35,36,37]. This practice is called L2 regularization, and its form is as follows:

    $$E(W)=\hat{E}(W)+\tau\|W\|_2^2,$$

    where $\|W\|_2^2$ denotes the penalty term. The L2-norm solution is popular because of its special relationship with the normal distribution, and it acts as a brute-force deterrent against excessively large weights. Unfortunately, it is not sparse: during training, L2 regularization cannot drive unnecessary weights to zero.

    The L0 regularization term is the standard approach for feature extraction and variable selection. By constraining the number of nonzero coefficients, the L0 regularizer produces the sparsest solutions, but these are difficult to compute because the resulting problem is a combinatorial optimization problem [38,39].

    As an alternative to the L0 regularization term, the increasingly important L1 regularization term (Lasso) has become popular since it only requires solving a quadratic programming problem [40]. In [41], the L1 norm was combined with a capped L1 norm to measure the amount of information collected by each filter and to control the regularization. The L1-regularized error function is generally written as follows:

    $$E(W)=\hat{E}(W)+\tau\|W\|_1.$$

    It was shown in [42,43] that, although these algorithms can generate alternative neural network structures, they do not offer a unified framework for solving a whole class of problems.

    In order to explore more appropriate neural network structures and overcome the obstacles posed by suboptimal models, we propose a new SPSNN algorithm based on a combined L1 and L2 penalty, which handles complex and varied tasks within a unified framework and improves the robustness and generality of the model. On top of the L1 norm, an L2 norm is added in the SPSNN to promote a group sparsity effect that selects the relevant hidden-node populations. Our variant therefore benefits both from ridge regression and from the tendency of the L1 penalty toward sparse solutions, which generates a more suitable network structure than using either regularization term alone; the elastic net algorithm can also be used to solve such hybrid penalties [44].

    In this paper, the penalty method is applied to the selection of the weights; it not only overcomes the shortcomings of the individual L1 and L2 norms (both penalties appear in the same minimization problem) but also yields good generalization ability and sparsity. The error function with this hybrid penalty can be expressed as follows:

    $$E(W)=\hat{E}(W)+\tau_1\|W\|_1+\tau_2\|W\|_2^2,$$

    where the tuning constants $\tau_1$ and $\tau_2$ are fixed and non-negative. The $\tau_1$ term performs variable selection by producing a sparse weight vector, while the $\tau_2$ term ensures a unique solution and leads to a grouping effect. The added penalty is thus a convex combination of the L1 norm $\|W\|_1$ and the squared L2 norm $\|W\|_2^2$ of the parameter $W$; a minimal numerical sketch of this penalty is given below.
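    As a minimal sketch (not part of the original paper; the function name and numerical values are illustrative only), the hybrid penalty above can be evaluated for a weight vector as follows:

```python
import numpy as np

def elastic_net_penalty(W, tau1, tau2):
    # tau1 * ||W||_1 promotes sparsity; tau2 * ||W||_2^2 keeps weights small and stabilizes the solution
    return tau1 * np.sum(np.abs(W)) + tau2 * np.sum(W ** 2)

# illustrative example
W = np.array([0.5, -0.2, 0.0, 1.3])
print(elastic_net_penalty(W, tau1=0.01, tau2=0.005))   # 0.01*2.0 + 0.005*1.98 = 0.0299
```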

    During the training process, such mixed regularization terms are not differentiable at the origin, which usually gives rise to an oscillation phenomenon. We therefore propose a new smoothing algorithm to overcome this difficulty: in a neighborhood of the origin, the absolute value (or norm) of the weights in the error function is replaced by a smooth approximation. The main contributions and novelty of this article are as follows:

    (1) In order to obtain an optimal architecture with good generalization performance, we eliminate weights in the hidden layer by using the idea of elastic net regularization. That is, by incorporating the L1 and L2 regularization terms, the new algorithm not only eliminates unnecessary weights but also prunes the hidden-layer structure of the network. This pruning reduces the size of the network, leading to an effective optimization of the network structure.

    (2) We propose an SPSNN trained by a batch gradient method with elastic net regularization to obtain an optimal architecture with good generalization performance. By means of a smoothing technique, we effectively overcome the oscillation phenomenon that arises during network learning.

    (3) To test the accuracy and robustness, we apply the proposed method to a large number of regression and classification tasks and compare it with other algorithms that have good sparsity and generalization capabilities.

    The rest of the article is arranged as follows. Section 2 presents the SPSNN. Section 3 describes the batch gradient algorithm for this model and gives the details of the novel pruning algorithm based on the combined L1 and L2 penalty. Numerical experiments are provided in Section 4, and conclusions are drawn at the end.

    Next, we describe the structure of SPSNNs, which are composed of an input layer, a hidden layer of summing nodes (the $\Sigma_1$ layer), a hidden layer of product nodes (the $\Pi$ layer), and an output layer (the $\Sigma_2$ layer). $P$, $N$, $Q$, and $1$ denote the numbers of nodes of the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively (see Figure 1).

    Figure 1.  Structure of an SPSNN.

    We use $W_0=(w_{01},w_{02},\ldots,w_{0Q})^T\in\mathbb{R}^Q$ to denote the weight vector connecting the $\Sigma_2$ layer and the $\Pi$ layer. The weights connecting the $\Pi$ layer and the $\Sigma_1$ layer are fixed to 1. Writing $W_n=(w_{n1},w_{n2},\ldots,w_{nP})^T$ for the weight vector connecting the input layer and the $n$th node of the $\Sigma_1$ layer, we set

    $$W=(W_0^T,W_1^T,\ldots,W_N^T)^T\in\mathbb{R}^{Q+NP}.$$

    Let $X=(x_1,x_2,\ldots,x_P)\in\mathbb{R}^P$ denote the input of the network, and let $g:\mathbb{R}\to\mathbb{R}$ be a given sigmoid activation function for the $\Sigma_1$ layer. The output vector $\xi\in\mathbb{R}^N$ of the $\Sigma_1$ layer is

    $$\xi=\big(g(W_1\cdot X),\,g(W_2\cdot X),\ldots,g(W_N\cdot X)\big)^T,$$

    where $\cdot$ denotes the inner product between vectors.

    Similarly, we write $\delta=(\delta_1,\delta_2,\ldots,\delta_Q)^T$ for the output vector of the $\Pi$ layer. The nodes of the network can be connected in two different ways: one is full connection between the $\Sigma_1$ layer and the $\Pi$ layer, and the other is sparse connection. The difference between them lies in the number of product nodes: the former has $2^N$ product nodes, while the latter has fewer than $2^N$. The structure of the SPSNN can be seen clearly in Figure 1.

    Here, $H_q\ (1\le q\le Q)$ denotes the set of nodes in the $\Sigma_1$ layer connected with the $q$-th product node, while $F_n\ (1\le n\le N)$ denotes the set of all product nodes connected with the $n$-th node of the $\Sigma_1$ layer. For an arbitrary finite set $a$, let $\varphi(a)$ denote the number of its elements; then

    $$\sum_{q=1}^{Q}\varphi(H_q)=\sum_{n=1}^{N}\varphi(F_n),$$

    which will be used later.

    We can compute the output vector $\delta\in\mathbb{R}^Q$ of the $\Pi$ layer by

    $$\delta_q=\prod_{i\in H_q}\xi_i,\qquad 1\le q\le Q.$$

    By convention, we set $\prod_{i\in H_q}\xi_i=1$ when $H_q=\emptyset$. In the SPSNN, the final output can be written as

    $$y=g(W_0\cdot\delta).$$
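    To make the mapping from input to output concrete, the following NumPy sketch implements the forward pass just described ($\Sigma_1$ summing nodes, $\Pi$ product nodes over the index sets $H_q$, and the $\Sigma_2$ output node). The function and variable names are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spsnn_forward(X, W_hidden, W0, H):
    """Forward pass of an SPSNN.

    X        : input vector, shape (P,)
    W_hidden : Sigma_1 weights, shape (N, P); row n is W_n
    W0       : weights from the Pi layer to the Sigma_2 output node, shape (Q,)
    H        : list of Q index sets; H[q] lists the Sigma_1 nodes feeding product node q
    """
    xi = sigmoid(W_hidden @ X)                             # Sigma_1 outputs: xi_n = g(W_n . X)
    delta = np.array([np.prod(xi[list(Hq)]) for Hq in H])  # Pi outputs; empty H_q gives the product 1
    y = sigmoid(W0 @ delta)                                # Sigma_2 output: y = g(W_0 . delta)
    return y, xi, delta

# small usage example: P = 2 inputs, N = 2 summing nodes, Q = 3 product nodes
H = [[0], [1], [0, 1]]
y, xi, delta = spsnn_forward(np.array([0.3, -0.1]),
                             np.random.uniform(-0.5, 0.5, (2, 2)),
                             np.random.uniform(-0.5, 0.5, 3), H)
```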

    We now introduce the batch gradient algorithm for SPSNNs. Let $\{X^l,O^l\}_{l=1}^{L}\subset\mathbb{R}^P\times\mathbb{R}$ be the given set of training samples, where $X^l$ denotes the $l$th input sample and $O^l$ the corresponding ideal output, and let $y^l\in\mathbb{R}$ be the actual output for the input $X^l$. The conventional square error function is given by

    $$\hat{E}(W)=\frac12\sum_{l=1}^{L}\big(g(W_0\cdot\delta^l)-O^l\big)^2=\sum_{l=1}^{L}\frac12\big(g(W_0\cdot\delta^l)-O^l\big)^2=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l),$$

    where $g_l(z)=\frac12\big(g(z)-O^l\big)^2$, $z\in\mathbb{R}$, $1\le l\le L$.

    For convenience, we need to provide the following forms:

    $$\delta^l=(\delta^l_1,\delta^l_2,\ldots,\delta^l_Q)=\Big(\prod_{i\in H_1}\xi_i,\ \prod_{i\in H_2}\xi_i,\ \ldots,\ \prod_{i\in H_Q}\xi_i\Big)=\Big(\prod_{i\in H_1}g(W_i\cdot X^l),\ \prod_{i\in H_2}g(W_i\cdot X^l),\ \ldots,\ \prod_{i\in H_Q}g(W_i\cdot X^l)\Big),\qquad(3.1)$$

    and thus, by virtue of

    $$\hat{E}(W)=\sum_{l=1}^{L}g_l\Big(\sum_{q=1}^{Q}w_{0q}\prod_{i\in H_q}g(W_i\cdot X^l)\Big)\qquad(3.2)$$

    and some calculation, we obtain

    $$\hat{E}_{W_0}(W)=\sum_{l=1}^{L}g'_l(W_0\cdot\delta^l)\,\delta^l.$$

    It follows from $\delta^l_q=\prod_{i\in H_q}g(W_i\cdot X^l)$ that

    $$\frac{\partial\delta^l_q}{\partial W_n}=\begin{cases}\Big(\displaystyle\prod_{i\in H_q,\,i\neq n}\xi_i\Big)\,g'(W_n\cdot X^l)\,X^l, & \text{if } \varphi(H_q)\ge 1 \text{ and } n\in H_q,\\[2mm] 0, & \text{if } \varphi(H_q)<1 \text{ or } n\notin H_q.\end{cases}$$

    Then, from the above equality and (3.2), we get

    $$\hat{E}_{W_n}(W)=\sum_{l=1}^{L}\Big[g'_l(W_0\cdot\delta^l)\sum_{q=1}^{Q}\Big(w_{0q}\,\frac{\partial\delta^l_q}{\partial W_n}\Big)\Big].$$
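    These two formulas translate directly into code. The sketch below is a plain batch computation of $\hat{E}_{W_0}$ and $\hat{E}_{W_n}$ under the assumption that $g$ is the sigmoid (so $g'(z)=g(z)(1-g(z))$); all names are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def data_gradients(samples, W_hidden, W0, H, F):
    """Batch gradients of the plain error E_hat with respect to W0 and each W_n.

    samples : list of (X, O) pairs
    H       : H[q] lists the Sigma_1 nodes feeding product node q
    F       : F[n] lists the product nodes fed by Sigma_1 node n
    """
    N, _ = W_hidden.shape
    grad_W0 = np.zeros_like(W0)
    grad_Wn = np.zeros_like(W_hidden)
    for X, O in samples:
        xi = sigmoid(W_hidden @ X)                             # Sigma_1 outputs
        delta = np.array([np.prod(xi[list(Hq)]) for Hq in H])  # Pi outputs
        y = sigmoid(W0 @ delta)                                # network output g(W0 . delta)
        gl_prime = (y - O) * y * (1.0 - y)                     # g_l'(W0 . delta) for the sigmoid
        grad_W0 += gl_prime * delta
        for n in range(N):
            g_prime_n = xi[n] * (1.0 - xi[n])                  # g'(W_n . X) for the sigmoid
            coeff = sum(W0[q] * np.prod(xi[[i for i in H[q] if i != n]]) for q in F[n])
            grad_Wn[n] += gl_prime * coeff * g_prime_n * X
    return grad_W0, grad_Wn
```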

    To simplify the network structure, we add elastic net regularization to optimize the network at the group level, obtaining the following form of the error function:

    $$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau\Big(\alpha\Big(\|W_0\|_1+\sum_{n=1}^{N}\|W_n\|_1\Big)+(1-\alpha)\Big(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\Big)\Big),\qquad(3.3)$$

    where $\tau$ and $\alpha$ are tuning parameters that control the effect of the penalty term. As $\alpha$ increases, the L1 regularization term dominates and the error function approaches that of Lasso regression; as $\alpha$ decreases, the L2 regularization term dominates and the error function approaches that of ridge regression. In particular, the error function reduces to that of ridge regression at $\alpha=0$ and to that of Lasso regression at $\alpha=1$. Let $\tau_1=\tau\alpha$ and $\tau_2=2\tau(1-\alpha)$. Then we have

    $$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau_1\Big(\|W_0\|_1+\sum_{n=1}^{N}\|W_n\|_1\Big)+\frac{\tau_2}{2}\Big(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\Big).\qquad(3.4)$$

    The gradients of the error function with respect to $W_0$ and $W_n$ are given, respectively, by

    $$E_{W_0}(W)=\sum_{l=1}^{L}g'_l(W_0\cdot\delta^l)\,\delta^l+\tau_1\frac{W_0}{\|W_0\|}+\tau_2 W_0\qquad(3.5)$$

    and

    $$E_{W_n}(W)=\sum_{l=1}^{L}\Big[g'_l(W_0\cdot\delta^l)\sum_{q\in F_n}\Big[w_{0q}\Big(\prod_{i\in H_q,\,i\neq n}\xi^l_i\Big)g'(W_n\cdot X^l)\,X^l\Big]\Big]+\tau_1\frac{W_n}{\|W_n\|}+\tau_2 W_n.\qquad(3.6)$$

    We note that the elastic net regularization in (3.4) combines L1-norm and L2-norm regularization. Clearly, (3.4) is not differentiable at the origin, which yields the oscillation phenomenon, so we propose a smoothing approximation to overcome the problem caused by the non-smoothness. For any finite-dimensional vector $u$ and a fixed constant $\gamma>0$, we define a smoothing function of $u$ as follows:

    $$h(u,\gamma)=\begin{cases}\|u\|, & \text{if } \|u\|>\gamma,\\[1mm] \dfrac{\|u\|^2}{2\gamma}+\dfrac{\gamma}{2}, & \text{if } \|u\|\le\gamma.\end{cases}\qquad(3.7)$$

    We use (3.7) to approximate the non-smooth part of the elastic net regularization in (3.4). The gradient of $h(u,\gamma)$ with respect to the vector $u$ is given by

    $$\nabla_u h(u,\gamma)=\begin{cases}\dfrac{u}{\|u\|}, & \text{if } \|u\|>\gamma,\\[1mm] \dfrac{u}{\gamma}, & \text{if } \|u\|\le\gamma.\end{cases}$$
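    A direct transcription of (3.7) and its gradient into NumPy (illustrative names only) is:

```python
import numpy as np

def h(u, gamma):
    """Smoothing approximation (3.7) of the Euclidean norm ||u|| near the origin."""
    norm = np.linalg.norm(u)
    return norm if norm > gamma else norm ** 2 / (2.0 * gamma) + gamma / 2.0

def grad_h(u, gamma):
    """Gradient of h(u, gamma) with respect to u; well defined at u = 0."""
    norm = np.linalg.norm(u)
    return u / norm if norm > gamma else u / gamma
```

    Note that the two branches meet with matching value ($\gamma$) and matching gradient ($u/\gamma=u/\|u\|$) at $\|u\|=\gamma$, so the approximation is continuously differentiable.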

    Accordingly, the error function (3.4) can be rewritten as

    $$E(W)=\sum_{l=1}^{L}g_l(W_0\cdot\delta^l)+\tau_1\Big[h(W_0,\gamma)+\sum_{n=1}^{N}h(W_n,\gamma)\Big]+\frac{\tau_2}{2}\Big(\|W_0\|_2^2+\sum_{n=1}^{N}\|W_n\|_2^2\Big).\qquad(3.8)$$

    According to (3.8), the gradients of the smoothed elastic net error function for the Sigma-Pi-Sigma neural network are

    $$E_{W_0}(W)=\sum_{l=1}^{L}g'_l(W_0\cdot\delta^l)\,\delta^l+\tau_1\nabla_{W_0}h(W_0,\gamma)+\tau_2 W_0,\qquad(3.9)$$
    $$E_{W_n}(W)=\sum_{l=1}^{L}g'_l(W_0\cdot\delta^l)\sum_{q\in F_n}w_{0q}\Big(\prod_{i\in H_q,\,i\neq n}\xi^l_i\Big)g'(W_n\cdot X^l)\,X^l+\tau_1\nabla_{W_n}h(W_n,\gamma)+\tau_2 W_n,\qquad(3.10)$$

    where $l=1,2,\ldots,L$.

    Beginning with an arbitrary initial weight vector $W^0$, we define the weight sequence $\{W^k\}$ by the following iterative formulas:

    $$W^{k+1}=W^k+\Delta W^k,\qquad(3.11)$$
    $$\Delta W_0^k=-\eta\,E_{W_0}(W^k),\qquad(3.12)$$
    $$\Delta W_n^k=-\eta\,E_{W_n}(W^k),\qquad(3.13)$$

    where η represents the learning rate.
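    A minimal sketch of one iteration of (3.11)–(3.13) follows. The data-term gradients (the sample sums in (3.9) and (3.10)) are assumed to be precomputed, for instance by a batch computation like the one sketched earlier; the function and parameter names are illustrative assumptions.

```python
import numpy as np

def gradient_step(W0, W_hidden, data_grad_W0, data_grad_Wn,
                  eta=0.0028, tau1=1e-3, tau2=1e-3, gamma=1e-3):
    """One batch-gradient update following (3.9)-(3.13)."""
    def grad_h(u):
        norm = np.linalg.norm(u)                    # gradient of the smoothed norm h(u, gamma)
        return u / norm if norm > gamma else u / gamma

    # E_{W_0}(W^k) = data gradient + tau1 * grad h(W_0) + tau2 * W_0, then W_0 <- W_0 - eta * E_{W_0}
    W0_new = W0 - eta * (data_grad_W0 + tau1 * grad_h(W0) + tau2 * W0)

    # the same update for every hidden weight vector W_n (stored here as rows of W_hidden)
    W_hidden_new = np.empty_like(W_hidden)
    for n in range(W_hidden.shape[0]):
        g_n = data_grad_Wn[n] + tau1 * grad_h(W_hidden[n]) + tau2 * W_hidden[n]
        W_hidden_new[n] = W_hidden[n] - eta * g_n
    return W0_new, W_hidden_new
```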

    In this section, the performance of the models with no regularizer, the L2 regularizer, the original L1/2 regularizer (OL1/2), the smoothing L1/2 regularizer (SL1/2), and the original group Lasso regularizer (OGL) is compared with that of the smoothing elastic net regularizer (SGL) on four examples: a classification problem, a parity problem, a function approximation problem, and a prediction problem.

    In this example, we choose 8 benchmark data sets from the UCI machine learning repository to test the performance of the new algorithm (SGL) and compare it with the no regularizer, L2, OL1/2, SL1/2, and OGL algorithms.

    Table 1 presents the main characteristics of the data sets, including their sizes, numbers of attributes and classes, and the sizes of the training and testing sets; each data set is randomly partitioned into two subsets, 70% for training and 30% for testing.

    Table 1.  Detailed description of the classification data sets.
    Dataset Dataset Size Training set Testing set Attributes Classes
    Ecoli 336 224 112 8 7
    Olitos 120 80 40 25 4
    Seeds 210 147 63 7 3
    Iris 150 105 45 4 3
    Wine 178 120 58 13 3
    Liver 345 240 105 7 2
    Sonar 208 138 70 60 2
    Diabetes 768 526 242 8 2


    As described at the beginning of this paper, we use the SPSNN structure of Figure 1 and select $P=13$, $N=4$, $Q=16$, and $1$ for the numbers of nodes of the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively. For each learning algorithm, the initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0028, and the regularization factor $\tau$ is 0.001; we conduct 20 trials for every data set to compare the performance of the different algorithms.

    To assess the performance of the smoothing elastic net regularizer on each data set, we compare the number of remaining hidden neurons after pruning (RNN), the training accuracy, the testing accuracy, and the training time for each algorithm; all experimental results are recorded in Table 2. From the table, it can be observed that the training accuracy improves slightly, while the testing accuracy increases by approximately 1% to 3%. The proposed smoothing elastic net regularizer is thus superior to the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer.

    Table 2.  Performance comparison for classification problems.
    Dataset Algorithm RNN Training accuracy Testing accuracy Training time(s)
    Ecoli NoPenalty 13.30 0.9761 0.9082 17.2756
    L2 13.00 0.9747 0.8980 16.7845
    OL1/2 12.50 0.9749 0.9191 18.1372
    SL1/2 11.80 0.9771 0.9304 16.8050
    OGL 12.20 0.9749 0.9191 18.3792
    SGL 11.50 0.9780 0.9314 17.3763
    Olitos NoPenalty 13.00 0.9573 0.8914 29.9742
    L2 12.33 0.9592 0.9053 29.3393
    OL1/2 12.00 0.9604 0.9160 30.6111
    SL1/2 11.67 0.9589 0.9245 30.3067
    OGL 12.00 0.9617 0.9264 33.0595
    SGL 11.00 0.9628 0.9355 30.5098
    Seeds NoPenalty 13.33 0.9737 0.9522 13.4081
    L2 12.67 0.9761 0.9554 13.5387
    OL1/2 12.33 0.9791 0.9582 13.8205
    SL1/2 11.67 0.9797 0.9676 13.2730
    OGL 12.33 0.9792 0.9629 14.7718
    SGL 11.33 0.9813 0.9749 12.7701
    Iris NoPenalty 13.67 0.9715 0.9296 13.4447
    L2 13.00 0.9719 0.9390 13.4420
    OL1/2 12.67 0.9723 0.9458 14.3637
    SL1/2 12.33 0.9743 0.9554 13.5318
    OGL 12.33 0.9748 0.9522 16.1023
    SGL 11.00 0.9791 0.9629 14.2630
    Wine NoPenalty 12.67 0.9872 0.9729 20.4322
    L2 12.67 0.9892 0.9753 20.4173
    OL1/2 12.00 0.9896 0.9770 21.3012
    SL1/2 11.50 0.9911 0.9814 20.5545
    OGL 11.67 0.9906 0.9798 21.1104
    SGL 10.50 0.9915 0.9833 20.1607
    Liver NoPenalty 13.33 0.9937 0.9823 15.5081
    L2 12.67 0.9943 0.9838 15.4588
    OL1/2 12.33 0.9947 0.9858 16.4440
    SL1/2 11.33 0.9951 0.9861 15.9798
    OGL 11.00 0.9948 0.9868 17.4374
    SGL 10.33 0.9962 0.9902 16.1645
    Sonar NoPenalty 12.67 0.9825 0.9756 12.6535
    L2 12.67 0.9831 0.9787 12.1906
    OL1/2 12.33 0.9860 0.9830 12.9125
    SL1/2 12.00 0.9909 0.9852 12.5690
    OGL 12.33 0.9892 0.9849 13.7648
    SGL 11.67 0.9918 0.9860 11.7338
    Diabetes NoPenalty 13.00 0.9925 0.9937 17.7933
    L2 12.50 0.9936 0.9947 17.1156
    OL1/2 11.67 0.9954 0.9950 17.4822
    SL1/2 11.33 0.9967 0.9955 17.1170
    OGL 11.67 0.9961 0.9953 18.4761
    SGL 11.33 0.9978 0.9961 17.3682


    In addition, we have also compared our approach with other existing methods. In [45], the authors considered the group Lasso regularization method on the Sigma-Pi-Sigma neural network. In [46], the authors applied the group L1/2 regularization term on high-order neural networks. Our proposed elastic net regularization method is on par with these approaches.

    For the parity problem, the input set consists of $2^n$ samples in $n$-dimensional space, and every sample is an $n$-bit binary vector. We consider the 5-bit parity problem, whose input set contains $2^5=32$ samples in 5-dimensional space; the ideal output equals 1 if the number of ones in the sample is odd, and 0 otherwise. Using the setup described above, we test the performance of the proposed smoothing elastic net regularizer; a short sketch of the data set construction is given below.
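    For completeness, the parity data set can be generated as in the following short sketch (illustrative names, not from the paper):

```python
import itertools
import numpy as np

def parity_dataset(n_bits=5):
    """All 2^n binary inputs, with target 1 if the number of ones is odd and 0 otherwise."""
    X = np.array(list(itertools.product([0, 1], repeat=n_bits)), dtype=float)
    O = (X.sum(axis=1) % 2).astype(float)
    return X, O

X, O = parity_dataset(5)   # 32 samples in 5-dimensional space
```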

    Similarly, we use the SPSNN structure with $P=13$, $N=4$, $Q=16$, and $1$ nodes in the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively. The initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0045, and the regularization factor $\tau$ is 0.001. For each learning algorithm we carry out 20 experiments, training for up to 40,000 steps or stopping once the error is less than $10^{-4}$. To assess the sparsity and convergence of the smoothing elastic net regularizer, we compare its average error (AVE) and number of remaining hidden neurons after pruning (RNN) with those of the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer; the results are listed in Table 3.

    Table 3.  Emulation results for 5-bit parity problem.
    Learning Methods AVE RNN
    NoPenalty 0.004433 17.00
    L2 0.003967 17.00
    OL1/2 0.003929 17.14
    SL1/2 0.004033 17.33
    OGL 0.003925 17.00
    SGL 0.003471 16.71


    The results show that the proposed smooth elastic net regularizer outperforms the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer.

    Figure 2(a) shows the error performance of the original group Lasso regularizer and of the smoothing elastic net regularizer on the 5-bit parity problem. Figure 2(b) shows that the norm of the gradient of the error function approaches a small positive constant on this problem. This indicates that the smoothing elastic net regularizer removes the oscillation that occurs with the original group Lasso regularizer during learning.

    Figure 2.  The performance results based on 5-bit parity problem with OGL and SGL.

    In this example, we use the two-dimensional Gabor function to compare the approximation performance of the above algorithms:

    $$k(x,y)=\frac{1}{2\pi(0.5)^2}\exp\Big(-\frac{x^2+y^2}{2(0.5)^2}\Big)\cos\big(2\pi(x+y)\big).$$
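    The target function and a set of evenly spaced training points can be generated as in the sketch below; the 13 × 13 grid is an assumption made only for this sketch, chosen because 13 × 13 = 169 matches the number of training samples reported in the experiment, and all names are illustrative.

```python
import numpy as np

def gabor(x, y, sigma=0.5):
    """Two-dimensional Gabor function used as the approximation target."""
    return (np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
            / (2.0 * np.pi * sigma ** 2)
            * np.cos(2.0 * np.pi * (x + y)))

# 169 evenly spaced training points on the square [-0.5, 0.5] x [-0.5, 0.5]
grid = np.linspace(-0.5, 0.5, 13)
xx, yy = np.meshgrid(grid, grid)
targets = gabor(xx, yy)
```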

    As described at the beginning of this paper, we use the SPSNN structure and select $P=13$, $N=4$, $Q=16$, and $1$ for the numbers of nodes of the input, $\Sigma_1$, $\Pi$, and $\Sigma_2$ layers, respectively.

    In this experiment, we select 169 training samples from an evenly spaced grid on the square $-0.5\le x\le 0.5$, $-0.5\le y\le 0.5$. The initial weights are randomly selected in the interval $[-0.5,0.5]$, the learning rate $\eta$ is 0.0028, and the regularization factor $\tau$ is 0.001. For each learning algorithm we carry out 20 experiments, training for up to 40,000 epochs or stopping once the error is less than $10^{-4}$.

    To assess the sparsity and convergence of the smoothing elastic net regularizer, we compare its average error (AVE) and number of remaining hidden neurons after pruning (RNN) with those of the no regularizer, the L2 regularizer, the original L1/2 regularizer, the smoothing L1/2 regularizer, and the original group Lasso regularizer; the results are shown in Table 4. Our proposed smoothing elastic net regularizer is again superior to the other five algorithms.

    Table 4.  Emulation results for identifying the Gabor function.
    Learning Methods AVE RNN
    NoPenalty 0.003940 18.00
    L2 0.003560 18.00
    OL1/2 0.003783 17.67
    SL1/2 0.004486 17.37
    OGL 0.003523 16.87
    SGL 0.003286 16.14


    For each learning algorithm, we show the error function and the gradient norm for one of the 20 experiments after 40,000 epochs in Figures 3–5. Figure 3(a) shows the oscillation phenomenon of the no regularizer, the L2 regularizer, the original L1/2 regularizer, and the original group Lasso regularizer, while Figure 3(b) shows the error curves of the smoothing L1/2 regularizer and the smoothing elastic net regularizer. Figure 4(a) shows the gradient-norm curves of the no regularizer, the L2 regularizer, the original L1/2 regularizer, and the original group Lasso regularizer, and Figure 4(b) shows those of the smoothing L1/2 regularizer and the smoothing elastic net regularizer; the latter clearly approach a small positive constant. Figure 5 shows a typical result for one of the 20 experiments, demonstrating a good approximation effect compared with the other algorithms. With the same parameters for each learning algorithm, the method with the smoothing elastic net regularizer converges faster than the other methods and overcomes the numerical oscillation phenomenon. The error curves decrease monotonically during the iterative process and converge toward zero, which also validates our theoretical results.

    Figure 3.  The performance results of the error function for the above algorithms.
    Figure 4.  The performance results of the gradient norm for the above algorithms.
    Figure 5.  The approximation result of the smoothing elastic net regularizer.

    To further verify the effectiveness of the algorithm, this part takes an interval shield tunneling project of Metro Line 9 in Zhengzhou City, China, as an example. To monitor the impact of the subway shield tunneling process on surface buildings and structures, settlement observation points were set up in advance at 10-meter intervals in each shield tunneling section, and the surrounding structures and buildings were monitored. JGC1, the settlement observation point of the signal tower closest to the right line of the shield, is selected as the object of study. A Leica DNA03 level was used to collect data starting 10 days before the shield was excavated to the point closest to JGC1, for a total of 40 days, with one observation per day. In this experiment, we use the data of the first 30 days as the training set and the data of the last 10 days as the test set (see Table 5).

    Table 5.  Measured settlement of metro shield at point JGC1.
    Time JGC1 (mm) Time JGC1 (mm)
    2015.3.18 +0.21000 2015.3.31 0.003286
    2015.3.19 -0.13000 2015.4.2 +0.10000
    2015.3.20 +0.11000 2015.4.3 -0.02000
    2015.3.21 -0.06000 2015.4.4 +0.08000
    2015.3.22 -0.23000 2015.4.5 -0.16000
    2015.3.23 -0.23000 2015.4.6 +0.06000
    2015.3.24 -0.15000 2015.4.7 -0.18000
    2015.3.25 -0.03000 2015.4.8 +0.28000
    2015.3.26 0.003286 2015.4.9 -0.07000
    2015.3.27 0.003286 2015.4.10 +0.07000
    Time JGC1 (mm) Time JGC1 (mm)
    2015.4.11 +0.01000 2015.4.22 +0.01000
    2015.4.12 -0.01000 2015.4.23 0.00000
    2015.4.13 -0.15000 2015.4.24 +0.19000
    2015.4.14 +0.08000 2015.4.25 -0.21000
    2015.4.15 -0.09000 2015.4.26 +0.10000
    2015.4.16 +0.14000 2015.4.27 -0.12000
    2015.4.17 -0.02000 2015.4.28 +0.05000
    2015.4.18 -0.26000 2015.4.29 +0.15000
    2015.4.19 +0.42000 2015.4.30 -0.17000
    2015.4.20 -0.22000 2015.5.1 -0.07000


    In this experiment, we use the SPSNN structure and select $P=5$, $N=4$, $Q=16$, and $1$ for the numbers of nodes of the input, $\Sigma_1$, $\Pi$, and output ($\Sigma_2$) layers, respectively. We use the sigmoid activation function at the $\Sigma_1$ and output layers, and the stopping criterion in this experiment is an error of less than $10^{-5}$ or 5000 iterations.

    Figure 6 shows the error curve on the training set over 5000 iterations, where red is the error curve without the regularization term and blue is the error curve with the smoothing elastic net regularization. The error of the network with the smoothing elastic net regularization decreases faster, and after 500 iterations it is smaller than that of the network without regularization, which verifies the theoretical results of this paper and the effectiveness of the proposed algorithm.

    Figure 6.  The curve of the error function for no penalty and for the smoothing elastic net.

    In this paper, a new batch gradient algorithm for SPSNNs with a combined L1 and L2 regularization term is proposed as an effective weight pruning technique. It can handle multi-output regression and multi-class classification problems within a unified framework. The algorithm inherits the benefits of both Lasso and ridge regression, penalizing the weights and driving redundant weight vectors toward zero, and it is more efficient than various other pruning strategies. The theoretical results and the advantages of the algorithm are also illustrated by numerical experiments.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    Conceptualization, methodology, original draft preparation, J. Jiao; software, editing, K. Su.

    All authors declare there is no conflict of interest.



    [1] G. A. Bliss, Lectures on the Calculus of Variations, Chicago: University of Chicago Press, 1946.
    [2] O. Bolza, Lectures on the Calculus of Variations, New York: Chelse Press, 1961.
    [3] U. Brechtken-Manderscheid, Introduction to the Calculus of Variations, London: Chapman & Hall, 1983.
    [4] L. Cesari, Optimization-Theory and Applications, Problems with Ordinary Differential Equations, New York: Springer-Verlag, 1983.
    [5] F. H. Clarke, Functional Analysis, Calculus of Variations and Optimal Control, New York: Springer-Verlag, 2013.
    [6] G. M. Ewing, Calculus of Variations with Applications, New York: Dover, 1985.
    [7] I. M. Gelfand, S. V. Fomin, Calculus of Variations, New Jersey: Prentice-Hall, 1963.
    [8] M. Giaquinta, S. Hildebrant, Calculus of Variations I, New York: Springer-Verlag, 2004.
    [9] M. Giaquinta, S. Hildebrant, Calculus of Variations II, New York: Springer-Verlag, 2004.
    [10] M. R. Hestenes, Calculus of Variations and Optimal Control Theory, New York: John Wiley & Sons, 1966.
    [11] G. Leitmann, The Calculus of Variations and Optimal Control, New York: Plenum Press, 1981.
    [12] P. D. Loewen, Second-order sufficiency criteria and local convexity for equivalent problems in the calculus of variations, J. Math. Anal. Appl., 146 (1990), 512-522.
    [13] A. A. Milyutin, N. P. Osmolovskii, Calculus of Variations and Optimal Control, Rhode Island: American Mathematical Society, 1998.
    [14] M. Morse, Variational Analysis: Critical Extremals and Sturmian Extensions, New York: John Wiley & Sons, 1973.
    [15] F. Rindler, Calculus of Variations, Coventry: Springer, 2018.
    [16] J. L. Troutman, Variational Calculus with Elementary Convexity, New York: Springer-Verlag, 1983.
    [17] F. Y. M. Wan, Introduction to the Calculus of Variations and its Applications, New York: Chapman & Hall, 1995.
  • This article has been cited by:

    1. Khidir Shaib Mohamed, Ibrhim M. A. Suliman, Mahmoud I. Alfeel, Abdalilah Alhalangy, Faiza A. Almostafa, Ekram Adam, A Modified High-Order Neural Network with Smoothing L1 Regularization and Momentum Terms, 2025, 19, 1863-1703, 10.1007/s11760-025-03973-4
    2. Khidir Shaib Mohamed, Suhail Abdullah Alsaqer, Tahir Bashir, Ibrhim. M. A. Suliman, Convergence analysis of gradient descent based on smoothing L0 regularization and momentum terms, 2025, 1598-5865, 10.1007/s12190-024-02353-4
  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
