Research article

Neural architecture search via standard machine learning methodologies

  • Received: 24 March 2021; Revised: 19 October 2021; Accepted: 13 January 2022; Published: 11 February 2022
  • In the context of deep learning, the most expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation with the execution of a significant number of experimental trials. The aim of the paper is to investigate how to choose the hyperparameters related both to the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and to the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the potential adoption of variance reduction techniques. The main contribution of the paper consists in introducing an automatic Machine Learning technique to set these hyperparameters in such a way that a measure of the CNN performance can be optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN, based on its behavior after only a few steps of the training process. To achieve this goal, we generate a dataset whose input samples are provided by a limited number of hyperparameter configurations, together with the corresponding CNN measures of performance obtained after only a few steps of the CNN training process, while the label of each input sample is the performance corresponding to a complete training of the CNN. Such a dataset is used as the training set for Support Vector Machines for Regression and/or Random Forest techniques, which predict the performance of the considered learning methodology given its performance at the initial iterations of its learning process. Furthermore, by a probabilistic exploration of the hyperparameter space, we are able to find, at a quite low cost, the setting of the CNN hyperparameters which provides the optimal performance. The results of an extensive numerical experimentation carried out on CNNs, together with the use of our performance predictor with NAS-Bench-101, highlight how the proposed methodology for the hyperparameter setting appears very promising.

    Citation: Giorgia Franchini, Valeria Ruggiero, Federica Porta, Luca Zanni. Neural architecture search via standard machine learning methodologies[J]. Mathematics in Engineering, 2023, 5(1): 1-21. doi: 10.3934/mine.2023012
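
    The following is a minimal, self-contained sketch (assumed details, not the authors' code) of the predictor described in the abstract: each input sample collects a hyperparameter configuration together with the accuracy measured after only a few training steps, and the label is the accuracy obtained after a complete training. A Random Forest and an SVR regressor from scikit-learn stand in for the Random Forest / Support Vector Machines for Regression techniques mentioned above, and the data below are synthetic placeholders.

```python
# Minimal sketch of the performance predictor (assumed details; synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cfg = 200
# Features per configuration: number of filters, kernel size, log10(steplength),
# mini-batch size, and the accuracy reached after only a few training steps.
X = np.column_stack([
    rng.integers(8, 129, n_cfg),
    rng.choice([3, 5, 7], n_cfg),
    rng.uniform(-4, -1, n_cfg),
    rng.choice([32, 64, 128], n_cfg),
    rng.uniform(0.1, 0.6, n_cfg),
])
# Label: accuracy after full training (synthetic stand-in for real measurements).
y = np.clip(X[:, -1] + rng.normal(0.25, 0.05, n_cfg), 0.0, 1.0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (RandomForestRegressor(random_state=0), SVR(C=10.0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, model.score(X_te, y_te))  # R^2 of the predictor
```

    Once trained, such a predictor can score new hyperparameter configurations from a few training steps only, which is what enables the low-cost probabilistic exploration of the hyperparameter space described in the abstract.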




    Since neural networks (NNs) are effective in modeling and describing nonlinear dynamics, there has been a remarkable surge in the utilization of NNs, which have contributed substantially to the fields of signal processing, image processing, combinatorial optimization, pattern recognition, and other scientific and engineering domains [1,2]. Theoretical advancements have laid a solid foundation for this progress, thereby facilitating the successful establishment of NNs as a powerful tool for a diverse range of applications. Consequently, the stability analysis of delayed neural networks (DNNs) has attracted many scholars [3,4].

    It should be noted that time delays are invariably encountered within NNs as a consequence of intrinsic factors, such as the finite velocity of information processing [5], and they can lead to instability and degraded performance in numerous real-world applications of NNs. Such delays can severely impair the predictive capacity and resilience of a neural network, undermining its efficacy and dependability. Hence, the stability of such systems must be assessed carefully, using rigorous evaluation criteria that provide a reliable guarantee for the system's operation. Thus, in recent years, researchers have been dedicated to analyzing the stability of DNNs and have invested significant effort into reducing the conservatism of stability criteria [6,7,8]. With regard to a stability criterion, tolerating a wider range of delays corresponds to a less conservative estimate; consequently, the upper limit of the delay range is a key measure of the degree of conservativeness. Since time delays in actual NNs are usually time-varying, the stability of NNs with variable delays has been widely studied using the Lyapunov-Krasovskii functional (LKF) technique.

    In order to analyze and solve the stability problems of DNNs, common approaches such as the Lyapunov stability method and linear matrix inequalities (LMIs) are often adopted. Among these, the Lyapunov stability method is widely applied, while LMIs provide a useful descriptive framework for these systems. This method aims to construct suitable LKFs to derive less conservative stability conditions, ensuring the stability of the studied DNN even with delays varying within the largest possible closed interval. Various types of augmented LKFs were introduced in [9,10,11,12,13,14,15,16,17,18,19,20,21,22] to investigate delay-dependent asymptotic or exponential stability of DNNs with time-varying delays, and numerous improved stability criteria were established on the basis of the augmented LKF approach. Moreover, commonly used techniques such as integral inequalities, the free-weighting matrix method, and reciprocally convex combination have been employed to obtain the stability criteria. To reduce the conservatism of stability criteria, recent works [10,13,14,15] have employed delayed state derivative terms as augmented vector elements to estimate the time derivative. Although the existing literature indicates that the resulting stability criteria are less conservative, it is worth noting that the dimensions of the criteria in the LMI formulation grow substantially. Thus, enhanced LKFs generally increase the difficulty and complexity, resulting in a corresponding increase in the solver's computational burden and time.

    The works mentioned above still require symmetry in the construction of the LKF. In [23], two novel delay-dependent stability criteria for time-delay systems were presented using LMIs. Both were established via asymmetric augmented LKFs, which ensure positivity without requiring all matrices involved in the LKFs to be symmetric and positive definite. In [24], the authors used the same method to study the stability of Takagi-Sugeno fuzzy systems.

    In this paper, the primary contribution can be outlined as follows:

    (1) An improved asymmetric LKF is proposed, which can be positive definite without requiring all matrix variables to be symmetric or positive definite.

    (2) A new stability criterion is formulated by utilizing linear matrix inequalities incorporating integral inequality and reciprocally convex combination techniques.

    (3) Compared to traditional methods, this new approach has less conservatism and complexity, which enables it to more accurately characterize neural network stability issues.

    Ultimately, the efficacy of this novel approach is demonstrated through a commonly employed numerical example, corroborating its robustness and its advantages over existing methodologies and providing a feasible solution for practical engineering applications.

    Notations: $\mathbb{R}^n$ denotes the $n$-dimensional Euclidean vector space. $\mathbb{R}^{n\times m}$ is the set of all $n\times m$ real matrices. $\mathbb{D}^n_{+}$ represents the set of positive-definite diagonal matrices in $\mathbb{R}^{n\times n}$. $\mathrm{diag}\{\cdot\}$ denotes a block-diagonal matrix. $\mathrm{He}(M)=M+M^T$. $\mathbb{N}$ stands for the set of nonnegative integers.

    Lemma 2.1. (Jensen's inequality [25]) Given $Q>0$, for any continuous function $\zeta(\theta):[\alpha_2,\alpha_1]\to\mathbb{R}^n$ and any differentiable function $x:[\alpha_2,\alpha_1]\to\mathbb{R}^n$, the following inequalities hold:

    $(\alpha_1-\alpha_2)\displaystyle\int_{\alpha_2}^{\alpha_1}\zeta^T(\theta)Q\zeta(\theta)\,d\theta \ \ge\ \Big(\int_{\alpha_2}^{\alpha_1}\zeta(\theta)\,d\theta\Big)^T Q\,\Big(\int_{\alpha_2}^{\alpha_1}\zeta(\theta)\,d\theta\Big),$
    $\dfrac{(\alpha_1-\alpha_2)^2}{2}\displaystyle\int_{\alpha_2}^{\alpha_1}\!\int_{s}^{\alpha_1}\dot{x}^T(u)Q\dot{x}(u)\,du\,ds \ \ge\ \Big(\int_{\alpha_2}^{\alpha_1}\!\int_{s}^{\alpha_1}\dot{x}(u)\,du\,ds\Big)^T Q\,\Big(\int_{\alpha_2}^{\alpha_1}\!\int_{s}^{\alpha_1}\dot{x}(u)\,du\,ds\Big).$
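
    As a quick numerical sanity check (not part of the paper), the first inequality of Lemma 2.1 can be verified with simple quadrature; the interval, the weight matrix $Q$ and the test function $\zeta$ below are arbitrary illustrative choices.

```python
# Numerical check of Jensen's inequality (illustrative values only).
import numpy as np

a2, a1 = 0.0, 2.0                        # alpha_2 < alpha_1
theta = np.linspace(a2, a1, 2001)
Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
zeta = np.stack([np.sin(theta), theta])  # zeta(theta) in R^2

# (alpha_1 - alpha_2) * integral of zeta^T Q zeta
lhs = (a1 - a2) * np.trapz(np.einsum('it,ij,jt->t', zeta, Q, zeta), theta)
# (integral of zeta)^T Q (integral of zeta)
z_int = np.trapz(zeta, theta, axis=1)
rhs = z_int @ Q @ z_int
print(lhs >= rhs)                        # expected: True
```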

    Lemma 2.2. (Wirtinger-based integral inequality [26]) Given $R>0$, for any continuous function $\zeta(\eta):[\delta_1,\delta_2]\to\mathbb{R}^n$, the following inequality holds:

    $\displaystyle\int_{\delta_1}^{\delta_2}\zeta^T(\eta)R\zeta(\eta)\,d\eta \ \ge\ \frac{1}{\delta_2-\delta_1}\Big(\int_{\delta_1}^{\delta_2}\zeta(\eta)\,d\eta\Big)^T R\Big(\int_{\delta_1}^{\delta_2}\zeta(\eta)\,d\eta\Big)+\frac{3}{\delta_2-\delta_1}\Phi^T R\,\Phi,$

    where

    $\Phi=\displaystyle\int_{\delta_1}^{\delta_2}\zeta(\eta)\,d\eta-\frac{2}{\delta_2-\delta_1}\int_{\delta_1}^{\delta_2}\!\int_{\delta_1}^{\theta}\zeta(\eta)\,d\eta\,d\theta, \qquad \int_{\delta_1}^{\delta_2}\dot{\zeta}^T(\eta)R\dot{\zeta}(\eta)\,d\eta \ \ge\ \frac{\pi^2}{(\delta_2-\delta_1)^2}\int_{\delta_1}^{\delta_2}\zeta^T(\eta)R\zeta(\eta)\,d\eta.$
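
    Similarly, the first inequality of Lemma 2.2 can be checked numerically (again with arbitrary illustrative choices for the interval, $R$ and $\zeta$); the double integral defining $\Phi$ is computed with a cumulative trapezoidal rule.

```python
# Numerical check of the first Wirtinger-based inequality (illustrative values only).
import numpy as np
from scipy.integrate import cumulative_trapezoid

d1, d2 = 0.0, 1.5
eta = np.linspace(d1, d2, 2001)
R = np.array([[1.0, 0.2], [0.2, 3.0]])
zeta = np.stack([np.cos(2.0 * eta), eta ** 2])

lhs = np.trapz(np.einsum('it,ij,jt->t', zeta, R, zeta), eta)

z_int = np.trapz(zeta, eta, axis=1)                          # int zeta d(eta)
inner = cumulative_trapezoid(zeta, eta, axis=1, initial=0)   # int_{d1}^{theta} zeta d(eta)
double = np.trapz(inner, eta, axis=1)                        # outer integral over theta
phi = z_int - 2.0 / (d2 - d1) * double

rhs = (z_int @ R @ z_int + 3.0 * phi @ R @ phi) / (d2 - d1)
print(lhs >= rhs)                                            # expected: True
```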

    Lemma 2.3. (B-L inequality [27]) Given $R>0$, for any continuously differentiable function $x(t):[\psi_1,\psi_2]\to\mathbb{R}^n$, the inequality

    $\displaystyle\int_{\psi_1}^{\psi_2}\dot{x}^T(s)R\dot{x}(s)\,ds \ \ge\ \frac{1}{\psi_2-\psi_1}\,\vartheta_N^T\Big[\sum_{k=0}^{N}(2k+1)\,\Gamma_N^T(k)\,R\,\Gamma_N(k)\Big]\vartheta_N$

    holds for any $N\in\mathbb{N}$, where

    $\vartheta_N:=\begin{cases}\mathrm{col}\{x(\psi_2),\,x(\psi_1)\}, & N=0,\\[4pt] \mathrm{col}\Big\{x(\psi_2),\,x(\psi_1),\,\tfrac{1}{\psi_2-\psi_1}\Omega_0,\ldots,\tfrac{1}{\psi_2-\psi_1}\Omega_{N-1}\Big\}, & N\ge 1,\end{cases}$
    $\Gamma_N(k):=\begin{cases}[I\ \ -I], & N=0,\\[4pt] [I\ \ (-1)^{k+1}I\ \ \gamma^{0}_{Nk}I\ \ \cdots\ \ \gamma^{N-1}_{Nk}I], & N\ge 1,\end{cases} \qquad \gamma^{j}_{Nk}:=\begin{cases}-(2j+1)\big(1-(-1)^{k+j}\big), & j\le k-1,\\[4pt] 0, & j\ge k,\end{cases}$
    and $\Omega_k=\displaystyle\int_{\psi_1}^{\psi_2}L_k(u)\,x(u)\,du$, with $L_k$ the Legendre polynomials over $[\psi_1,\psi_2]$.

    Consider the following DNNs:

    $\dot{x}(t)=-Ax(t)+W_0 f(x(t))+W_1 f(x(t-h(t))),$ (3.1)

    where

    $x(t)=[x_1(t),x_2(t),\ldots,x_n(t)]^T\in\mathbb{R}^n$

    represents the neuron state vector, the activation functions are given by

    $f(x(t))=[f_1(x_1(t)),f_2(x_2(t)),\ldots,f_n(x_n(t))]^T$, $A=\mathrm{diag}\{a_1,a_2,\ldots,a_n\}>0$, and $W_i$ $(i=0,1)$

    are the connection weight matrices. $h_t$ is an abbreviation of $h(t)$ and denotes a time-varying delay, a differentiable function satisfying:

    $0\le h_t\le h_m,\ \ \nu_1\le \dot{h}_t\le \nu_2,$ (3.2)

    where $h_m$, $\nu_1$ and $\nu_2$ are real constants.

    The activation function $f(\cdot)$ is continuous and satisfies

    $f_i(0)=0$

    and

    $k_{1i}\le \dfrac{f_i(u_1)-f_i(u_2)}{u_1-u_2}\le k_{2i},\ \ u_1\ne u_2,$ (3.3)

    where $k_{1i}$ and $k_{2i}$ are known real constants, $i=1,2,\ldots,m$. For convenience, we define

    $K_1=\mathrm{diag}\{k_{11},k_{12},\ldots,k_{1m}\}$

    and

    $K_2=\mathrm{diag}\{k_{21},k_{22},\ldots,k_{2m}\}.$
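
    The sector condition (3.3) is easy to inspect numerically for a concrete activation; the sketch below uses $f_i(u)=k_{2i}\tanh(u)$ with illustrative bounds $k_{1i}=0$, $k_{2i}=0.8$ (an assumed example, not an activation prescribed by the paper).

```python
# Numerical illustration of the sector condition (3.3) for an assumed activation.
import numpy as np

k1, k2 = 0.0, 0.8
f = lambda u: k2 * np.tanh(u)            # monotone, slope between 0 and k2

rng = np.random.default_rng(0)
u1 = rng.normal(size=10000)
u2 = rng.normal(size=10000)
mask = u1 != u2
q = (f(u1[mask]) - f(u2[mask])) / (u1[mask] - u2[mask])
print(q.min() >= k1, q.max() <= k2)      # expected: True True
```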

    For simplicity, we define the following notations:

    $h_t=h(t)$, $f_w(t)=f(x(t))$, $e_i^T=[0_{n\times(i-1)n}\ \ I_{n\times n}\ \ 0_{n\times(10-i)n}]$, $i=1,2,\ldots,10$,
    $\mu_1=\mathrm{col}\{x(t),\,x(t-h_m)\}$, $\mu_2=\mathrm{col}\{x(t-h_t),\,f(x(t))\}$,
    $\mu_3=\mathrm{col}\Big\{f(x(t-h(t))),\,\displaystyle\int_{t-h_m}^{t}\dot{x}(s)\,ds\Big\}$,
    $\mu_4=\mathrm{col}\Big\{\displaystyle\int_{t-h_m}^{t}x(u)\,du,\,\int_{s}^{t}x(u)\,du\Big\}$,
    $\mu_5=\mathrm{col}\Big\{\displaystyle\int_{t-h_m}^{t}\!\int_{s}^{t}x(u)\,du\,ds,\,\int_{t-h_m}^{t}\!\int_{t-h_m}^{\theta}\!\int_{s}^{t}x(u)\,du\,ds\,d\theta\Big\}$,
    $\zeta(t)=\mathrm{col}\{\mu_1,\mu_2,\mu_3,\mu_4,\mu_5\}.$
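
    For concreteness, the block selector matrices $e_i^T$ used with $\zeta(t)$ can be built explicitly as follows (an illustrative construction, not code from the paper): $e_i^T$ extracts the $i$-th $n$-dimensional block of the $10n$-dimensional augmented vector.

```python
# Illustrative construction of the selector matrices e_i^T = [0 ... I_n ... 0].
import numpy as np

def selector(i, n, blocks=10):
    """Return e_i^T of shape (n, blocks*n), selecting the i-th block (1-indexed)."""
    E = np.zeros((n, blocks * n))
    E[:, (i - 1) * n:i * n] = np.eye(n)
    return E

n = 2
zeta = np.arange(10 * n, dtype=float)    # stand-in for the augmented vector zeta(t)
print(selector(1, n) @ zeta)             # first block, i.e., the entries playing the role of x(t)
```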

    Theorem 3.1. Given $h_m>0$, system (3.1) is asymptotically stable if there exist matrices

    $Q=[Q_1\ \ Q_2]$

    with

    $Q_1=Q_1^T$, $P=P^T=\begin{bmatrix}P_{11}&P_{12}\\ *&P_{22}\end{bmatrix}$, $P_{11},P_{12},P_{22}\in\mathbb{R}^{n\times n}$, $Q_i\in\mathbb{R}^{n\times n}\ (i=1,2)$, $T_i\in\mathbb{R}^{n\times n}$, $T_i>0\ (i=1,2)$, $F_i\in\mathbb{R}^{n\times n}$, $F_i>0\ (i=1,2)$, $N_i\in\mathbb{D}^n_{+}\ (i=1,2)$,

    and any matrices $S_1\in\mathbb{R}^{n\times n}$, $G\in\mathbb{R}^{n\times n}$, such that

    $R_7=\begin{bmatrix}T_1&G\\ *&T_1\end{bmatrix}>0,$ (3.4)
    $R_8=\begin{bmatrix}Q_1+T_1&\tfrac{1}{2}Q_2\\ *&T_2\end{bmatrix}>0,$ (3.5)
    $R_9=\begin{bmatrix}\Theta_{11}&\Theta_{12}&0&0\\ *&\Theta_{22}&\Theta_{23}&3h_m^{-1}Q_2\\ *&*&\Theta_{33}&\Theta_{34}\\ *&*&*&12h_m^{-2}T_2\end{bmatrix}>0,$ (3.6)
    $R_{10}=\begin{bmatrix}\Pi_{11}&\Pi_{12}\\ *&\Pi_{22}\end{bmatrix}<0,$ (3.7)
    where
    $\Theta_{11}=e_1^T[h_mP_{11}+h_m^2T_1]e_1$, $\Theta_{12}=e_1^T[h_mP_{12}-h_mT_1]e_7$, $\Theta_{22}=e_7^T[h_mP_{22}+4Q_1+4T_1]e_7$,
    $\Theta_{23}=e_7^T[2Q_2-6h_m^{-1}Q_1-6h_m^{-1}T_1]e_9$, $\Theta_{33}=e_9^T[12h_m^{-2}Q_1-3h_m^{-1}\mathrm{He}(Q_2)+12h_m^{-2}T_1+4T_2]e_9$, $\Theta_{34}=e_9^T[6h_m^{-2}Q_2-6h_m^{-1}T_2]e_{10}$,
    $\Pi_{11}=e_1^T[Q_1+h_m^2T_2+\mathrm{He}(P_{12}-AP_{11})+h_m^2AT_1A-K_1R_1-S_1+F_1+(X-1)T_1]e_1-e_2^T[K_1R_2-(X-1)T_1+Q_1]e_2-e_3^T[T_2-X\pi^2h_m^{-2}T_1]e_3+e_1^T[\mathrm{He}(\tfrac{1}{2}S_1-G-P_{12}^T)]e_2+e_1^T[\mathrm{He}(P_{22}-AP_{12}+\tfrac{1}{2}Q_2^T)]e_3-e_2^T[\mathrm{He}(P_{22}+\tfrac{1}{2}Q_2)]e_3$,
    $\Pi_{12}=e_1^T[P_{11}^TW_0^T-h_m^2AT_1W_0+K_2R_1]e_4+e_1^T[P_{11}^TW_1^T-h_m^2AT_1W_1]e_5+e_1^TS_1e_6+e_1^T[G-T_1]e_7-e_2^TK_2R_2e_5-\tfrac{1}{2}e_2^TS_1e_6+e_2^T[G^T-T_1]e_7+e_3^TP_{12}^TW_0^Te_4+e_3^TP_{12}^TW_1^Te_5$,
    $\Pi_{22}=e_4^T[F_2-R_1+h_m^2W_0T_1W_0]e_4+e_4^T[h_m^2\mathrm{He}(W_0T_1W_1)]e_5+e_5^T[h_m^2W_1T_1W_1-R_2-(1-\dot{h}_t)F_2]e_5-e_6^TS_1e_6+e_7^T[2T_1-\mathrm{He}(G)-(1-\dot{h}_t)F_1]e_7.$
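
    Conditions (3.4)–(3.7) are LMIs in the decision matrices and can be checked with a semidefinite-programming solver. The sketch below is only a schematic illustration of how the first two conditions can be posed in CVXPY (an assumed toolchain, not used in the paper); it does not assemble the full matrices $R_9$ and $R_{10}$, and it relaxes strict positive definiteness to semidefiniteness.

```python
# Schematic CVXPY sketch for conditions of the form (3.4)-(3.5); not the full Theorem 3.1.
import cvxpy as cp

n = 2
T1 = cp.Variable((n, n), PSD=True)       # T1 > 0 in the theorem; relaxed to T1 >= 0 here
T2 = cp.Variable((n, n), PSD=True)
Q1 = cp.Variable((n, n), symmetric=True)
Q2 = cp.Variable((n, n))                 # Q2 is not required to be symmetric
G  = cp.Variable((n, n))                 # free matrix appearing in condition (3.4)

# Assemble R7 = [[T1, G], [G^T, T1]] and R8 = [[Q1 + T1, Q2/2], [Q2^T/2, T2]] as
# PSD-constrained variables whose upper blocks are pinned by equality constraints
# (the lower-left blocks follow by symmetry of a PSD variable).
R7 = cp.Variable((2 * n, 2 * n), PSD=True)
R8 = cp.Variable((2 * n, 2 * n), PSD=True)
constraints = [
    R7[:n, :n] == T1, R7[:n, n:] == G, R7[n:, n:] == T1,
    R8[:n, :n] == Q1 + T1, R8[:n, n:] == 0.5 * Q2, R8[n:, n:] == T2,
]

# On its own this feasibility problem is trivially satisfied (all zeros); in the full
# criterion, conditions (3.6) and (3.7) couple these variables to the system data
# (A, W0, W1, hm), and that coupling is what makes the feasibility test informative.
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)
```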

    Proof. Consider the following LKF candidate:

    $V_t=\displaystyle\sum_{i=1}^{4}V_{ti},$ (3.8)

    where

    $V_{t1}=\rho^T(t)P\rho(t)$, $\rho(t)=\Big[x^T(t)\ \ \displaystyle\int_{t-h_m}^{t}x^T(u)\,du\Big]^T$,
    $V_{t2}=\displaystyle\int_{t-h_m}^{t}x^T(s)\,Q\,\Big[x^T(s)\ \ \int_{s}^{t}x^T(u)\,du\Big]^T ds$, $\ Q=[Q_1\ \ Q_2]$,
    $V_{t3}=\displaystyle\int_{t-h_t}^{t}x^T(s)F_1x(s)\,ds+h_m\int_{t-h_m}^{t}\!\int_{s}^{t}\dot{x}^T(u)T_1\dot{x}(u)\,du\,ds+h_m\int_{t-h_m}^{t}\!\int_{s}^{t}x^T(u)T_2x(u)\,du\,ds$,
    $V_{t4}=\displaystyle\int_{t-h_t}^{t}f_w^T(s)F_2f_w(s)\,ds.$

    By Lemma 2.1, since T1>0 and T2>0, we obtain

    $h_m\displaystyle\int_{s}^{t}\dot{x}^T(u)T_1\dot{x}(u)\,du \ \ge\ (x(t)-x(s))^T T_1 (x(t)-x(s)),$ (3.9)
    $h_m\displaystyle\int_{s}^{t}x^T(u)T_2x(u)\,du \ \ge\ \Big(\int_{s}^{t}x^T(u)\,du\Big)T_2\Big(\int_{s}^{t}x(u)\,du\Big).$ (3.10)

    Thus, we can infer

    $V_{t2}+V_{t3}=\displaystyle\int_{t-h_t}^{t}x^T(s)F_1x(s)\,ds+\int_{t-h_m}^{t}\Big[x^T(s)\,Q\,\Big[x^T(s)\ \ \int_{s}^{t}x^T(u)\,du\Big]^T+h_m\int_{s}^{t}\big(\dot{x}^T(u)T_1\dot{x}(u)+x^T(u)T_2x(u)\big)\,du\Big]ds \ \ge\ \int_{t-h_t}^{t}x^T(s)F_1x(s)\,ds+\int_{t-h_m}^{t}\begin{bmatrix}x(t)\\ x(s)\\ \int_{s}^{t}x(u)\,du\end{bmatrix}^T R_6\begin{bmatrix}x(t)\\ x(s)\\ \int_{s}^{t}x(u)\,du\end{bmatrix}ds,$ (3.11)

    where

    $R_6=\begin{bmatrix}T_1&-T_1&0\\ *&Q_1+T_1&\tfrac{1}{2}Q_2\\ *&*&T_2\end{bmatrix}.$

    Based on the above conditions, we can get

    $\displaystyle\sum_{i=1}^{4}V_{ti}\ \ge\ V_{t1}+\int_{t-h_m}^{t}\begin{bmatrix}x(t)\\ x(s)\\ \int_{s}^{t}x(u)\,du\end{bmatrix}^T R_6\begin{bmatrix}x(t)\\ x(s)\\ \int_{s}^{t}x(u)\,du\end{bmatrix}ds+\int_{t-h_t}^{t}x^T(s)F_1x(s)\,ds+\int_{t-h_t}^{t}f^T(x(s))F_2f(x(s))\,ds \ \ge\ \frac{1}{h_m}\zeta^T(t)R_9\zeta(t)+\int_{t-h_t}^{t}x^T(s)F_1x(s)\,ds+\int_{t-h_t}^{t}f^T(x(s))F_2f(x(s))\,ds>0.$ (3.12)

    The time derivatives of $V_{ti}$, $i\in\{1,\ldots,4\}$, along the trajectory of system (3.1) are given by:

    $\dot{V}_{t1}=2\dot{\rho}^T(t)P\rho(t),$
    $\dot{V}_{t2}=x^T(t)Q_1x(t)+\displaystyle\int_{t-h_m}^{t}x^T(u)\,du\,Q_2\,x(t)-x^T(t-h_m)\,Q\,\Big[x^T(t-h_m)\ \ \int_{t-h_m}^{t}x^T(u)\,du\Big]^T,$
    $\dot{V}_{t3}=h_m^2\dot{x}^T(t)T_1\dot{x}(t)-h_m\displaystyle\int_{t-h_m}^{t}\dot{x}^T(u)T_1\dot{x}(u)\,du+h_m^2x^T(t)T_2x(t)-h_m\int_{t-h_m}^{t}x^T(u)T_2x(u)\,du+x^T(t)F_1x(t)-(1-\dot{h}_t)x^T(t-h_t)F_1x(t-h_t),$
    $\dot{V}_{t4}=f^T(x(t))F_2f(x(t))-(1-\dot{h}_t)f^T(x(t-h_t))F_2f(x(t-h_t)).$ (3.13)

    For any symmetric matrix $S_1$, the following equality holds:

    $\Big[x(t)-x(t-h_m)+\displaystyle\int_{t-h_m}^{t}\dot{x}(s)\,ds\Big]^T S_1\Big[x(t)-x(t-h_m)-\int_{t-h_m}^{t}\dot{x}(s)\,ds\Big]=[e_1-e_2+e_6]^T S_1\,[e_1-e_2-e_6].$ (3.14)

    According to (3.3), for any appropriate diagonal matrices

    $N_i=\mathrm{diag}\{n_{i1},n_{i2},\ldots,n_{im}\}>0,\ \ (i=1,2),$

    we have

    $0\le \displaystyle\sum_{i=1}^{n}\big[f_i(x_i(t))-k_{1i}x_i(t)\big]\,n_{i1}\,\big[k_{2i}x_i(t)-f_i(x_i(t))\big]+\sum_{i=1}^{n}\big[f_i(x_i(t-h(t)))-k_{1i}x_i(t-h(t))\big]\,n_{i2}\,\big[k_{2i}x_i(t-h(t))-f_i(x_i(t-h(t)))\big]=[f(x(t))-K_1x(t)]^T N_1\,[K_2x(t)-f(x(t))]+[f(x(t-h(t)))-K_1x(t-h(t))]^T N_2\,[K_2x(t-h(t))-f(x(t-h(t)))].$ (3.15)

    Then

    $0\le \begin{bmatrix}x(t)\\ f(x(t))\end{bmatrix}^T\begin{bmatrix}-M_1N_1&M_2N_1\\ *&-N_1\end{bmatrix}\begin{bmatrix}x(t)\\ f(x(t))\end{bmatrix}+\begin{bmatrix}x(t-h(t))\\ f(x(t-h(t)))\end{bmatrix}^T\begin{bmatrix}-M_1N_2&M_2N_2\\ *&-N_2\end{bmatrix}\begin{bmatrix}x(t-h(t))\\ f(x(t-h(t)))\end{bmatrix}=\begin{bmatrix}e_1\\ e_4\end{bmatrix}^T\begin{bmatrix}-M_1N_1&M_2N_1\\ *&-N_1\end{bmatrix}\begin{bmatrix}e_1\\ e_4\end{bmatrix}+\begin{bmatrix}e_3\\ e_5\end{bmatrix}^T\begin{bmatrix}-M_1N_2&M_2N_2\\ *&-N_2\end{bmatrix}\begin{bmatrix}e_3\\ e_5\end{bmatrix},$ (3.16)

    where

    $M_1=\mathrm{diag}\{k_{11}k_{21},\,k_{12}k_{22},\ldots,k_{1m}k_{2m}\},\qquad M_2=\mathrm{diag}\Big\{\tfrac{k_{11}+k_{21}}{2},\,\tfrac{k_{12}+k_{22}}{2},\ldots,\tfrac{k_{1m}+k_{2m}}{2}\Big\}.$

    Based on (3.4) and (3.5), we can obtain

    $-h_m\displaystyle\int_{t-h_m}^{t}\dot{x}^T(u)T_1\dot{x}(u)\,du=-\alpha h_m\int_{t-h_m}^{t}\dot{x}^T(u)T_1\dot{x}(u)\,du-(1-\alpha)h_m\int_{t-h_m}^{t}\dot{x}^T(u)T_1\dot{x}(u)\,du \ \le\ -\alpha\begin{bmatrix}e_1\\ e_2\\ e_3\end{bmatrix}^T\begin{bmatrix}T_1&-M&-T_1+M\\ *&T_1&-T_1+M^T\\ *&*&2T_1-M-M^T\end{bmatrix}\begin{bmatrix}e_1\\ e_2\\ e_3\end{bmatrix}-(1-\alpha)\frac{\pi^2}{h_m}\,e_7^T T_1 e_7.$ (3.17)

    According to Lemma 2.1, we can obtain

    $-h_m\displaystyle\int_{t-h_m}^{t}x^T(u)T_2x(u)\,du \ \le\ -\Big(\int_{t-h_m}^{t}x^T(u)\,du\Big)T_2\Big(\int_{t-h_m}^{t}x(u)\,du\Big).$ (3.18)

    By combining (3.16)–(3.18), one can derive that

    $\dot{V}_t\le \zeta^T(t)R_{10}\zeta(t).$ (3.19)

    Since inequality (3.19) holds and, by (3.7), $R_{10}<0$, we have $\zeta^T(t)R_{10}\zeta(t)<0$, which proves that $\dot{V}_t<0$. Therefore, if LMIs (3.4)–(3.7) hold, the system (3.1) is asymptotically stable.

    The proof is completed.

    Remark 3.1. The underlying expression is rescaled by combining the Wirtinger-based integral inequality and the B-L inequality, with coefficients $\alpha$ and $1-\alpha$. This provides a flexible combination over the range $[0,1]$, which facilitates the search for the optimal amalgamation. In the scenario where $\alpha=1$ (so $1-\alpha=0$), the outcome relies solely on scaling through the B-L inequality; conversely, if $\alpha=0$ (so $1-\alpha=1$), scaling exclusively uses the Wirtinger-based integral inequality.

    Remark 3.2. Numerous researchers have introduced various LKFs for the analysis of DNNs, traditionally requiring the involved matrices to be symmetric. This study relaxes that traditional constraint by devising asymmetric forms of LKFs. With such asymmetric constructs, it is not mandatory for each matrix to be symmetric or positive definite when setting the conditions. Consequently, the conditions become more relaxed, leading to a less conservative theorem and broadening the horizons of analysis in this domain.

    In this section, a numerical example is given to illustrate the effectiveness of the suggested stability criterion. The primary goal is to determine the acceptable maximum upper bound (AMUB) of the time-varying delay that still guarantees the global asymptotic stability of the neural networks under consideration. As the AMUB increases, the stability criterion becomes less conservative.
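
    The paper reports the AMUB values directly; a common way to compute such bounds in practice (an assumed procedure, not described in the paper) is to bisect on $h_m$ around an LMI feasibility test, under the standard working assumption that feasibility holds for every delay bound below the AMUB. Here `is_feasible` is a hypothetical callback that assembles and solves the LMIs of Theorem 3.1 for a given $(h_m,\mu)$.

```python
# Bisection for the acceptable maximum upper bound (AMUB) of the delay.
# `is_feasible(hm, mu)` is a hypothetical routine returning True when the LMIs
# (3.4)-(3.7) are feasible for the given delay bound hm and rate bound mu.
def max_allowable_delay(is_feasible, mu, lo=0.0, hi=20.0, tol=1e-3):
    """Largest hm in [lo, hi] for which is_feasible(hm, mu) holds (bisection)."""
    if not is_feasible(lo, mu):
        return None                   # infeasible even for an arbitrarily small bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid, mu):
            lo = mid                  # feasible: the AMUB is at least mid
        else:
            hi = mid                  # infeasible: the AMUB is below mid
    return lo
```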

    Example 1. We consider DNNs (3.1) with the following parameters [17,18,21,28,29]:

    $W_0=\begin{bmatrix}0.0503&0.0454\\ 0.0987&0.2075\end{bmatrix},\qquad W_1=\begin{bmatrix}0.2381&0.9320\\ 0.0388&0.5062\end{bmatrix}$

    and

    $M_1=\begin{bmatrix}0&0\\ 0&0\end{bmatrix},\qquad M_2=\begin{bmatrix}0.3&0\\ 0&0.8\end{bmatrix},\qquad A=\begin{bmatrix}1.5&0\\ 0&0.7\end{bmatrix}.$
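
    The state trajectories discussed below (Figures 1 and 2) can be reproduced approximately with a forward-Euler simulation of (3.1). The paper does not state the activation function used for the figures; the sketch assumes $f_i(x_i)=k_{2i}\tanh(x_i)$ with $k_2=(0.3,0.8)$, the sector bounds commonly associated with this benchmark, and a sinusoidal delay respecting the bounds in (3.2).

```python
# Rough forward-Euler simulation of Example 1 (assumed activation and delay profile).
import numpy as np

A  = np.diag([1.5, 0.7])
W0 = np.array([[0.0503, 0.0454], [0.0987, 0.2075]])
W1 = np.array([[0.2381, 0.9320], [0.0388, 0.5062]])
k2 = np.array([0.3, 0.8])
f  = lambda x: k2 * np.tanh(x)                    # satisfies (3.3) with K1 = 0, K2 = diag(k2)

hm, mu, dt, T = 13.75, 0.4, 0.01, 400.0           # hm close to the AMUB in Table 1 for mu = 0.4
h = lambda t: 0.5 * hm * (1.0 + np.sin(2.0 * mu * t / hm))   # 0 <= h <= hm, |h'| <= mu

steps = int(T / dt)
x = np.zeros((steps + 1, 2))
x[0] = [0.5, 0.8]                                 # one of the initial states mentioned below

for k in range(steps):
    d = int(round(h(k * dt) / dt))
    x_delay = x[k - d] if k - d >= 0 else x[0]    # constant pre-history on [-hm, 0]
    dx = -A @ x[k] + W0 @ f(x[k]) + W1 @ f(x_delay)
    x[k + 1] = x[k] + dt * dx                     # forward Euler step
# x now holds a trajectory that should decay toward zero, as in Figure 1.
```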

    Table 1 presents the maximum allowable time delay obtained by Theorem 1 and by related results from the literature, for various values of μ.

    Table 1.  The maximum allowable delays of ht with various μ for Example 1.
    Methods μ=0.4 μ=0.45 μ=0.5 μ=0.55
    [17]Th.1 7.6697 6.7287 6.4126 6.2569
    [21]Th.1 8.3186 7.2119 6.8373 6.6485
    [28]Th.1 8.3958 7.3107 7.0641 6.7829
    [29]Th.1 10.1095 8.6732 8.1733 7.8993
    [18]Th.3 13.8671 11.1174 10.0050 9.4157
    Theorem 1 13.7544 12.4328 11.4954 10.7915


    It is evident that the AMUB derived from Theorem 1 surpasses those obtained by Theorem 1 of [17,21,28,29]. Compared with Theorem 3 in [18], a larger AMUB is obtained when μ ranges from 0.45 to 0.55, while for μ = 0.4 the result is slightly smaller. Importantly, the number of decision variables (DVs) involved in Theorem 1, as shown in Table 2, is 16n²+12n, which is less than the 79.5n²+13.5n reported in [18]; this reduction in the number of DVs lowers the computational complexity. For μ = 0.4 and the initial states 0.5, 0.8, -0.5, and -0.8, the corresponding state trajectories are illustrated in Figure 1; they tend toward zero at around t ≈ 350. With μ = 0.55 and the same initial states, the state trajectories are depicted in Figure 2, where they tend toward zero at approximately t ≈ 300.

    Table 2.  Computational complexity.
    Approaches Number of DVs Number of LMIs
    Th.1[17] 15n2+16n 24
    Th.1[21] 39.5n2+29.5n 40
    Th.1[28] 15n2+11n 17
    Th.1[29] 87n2+41n 46
    Th.3[18] 79.5n2+13.5n 17
    Theorem 1 16n2+12n 10

    Figure 1.  The state trajectories with μ=0.4.
    Figure 2.  The state trajectories with μ=0.55.

    A suitable asymmetric LKF has been constructed, whose positive definiteness can be demonstrated without requiring all matrix variables to be symmetric or positive definite. Additionally, a combinatorial optimization approach has been employed, using a linear combination of multiple inequalities to identify their optimal arrangement. The conditions presented in this study are therefore shown to be less conservative, and the proposed technique exhibits great potential, both theoretically and empirically, to produce larger maximum allowable delays than selected recent works in the literature. Furthermore, both theoretical and quantitative analyses confirm that our method notably reduces conservatism. Finally, the proposed method can be effectively combined with the delay-segmentation technique, partitioning the delay interval into N segments for finer processing, which will be investigated in future work.

    Throughout the preparation of this work, we utilized the AI-based proofreading tool "Grammarly" to identify and correct grammatical errors. Subsequently, we carefully reviewed the content and made any necessary additional edits. We take complete responsibility for the content of this publication.

    This work was supported by the National Natural Science Foundation of China under Grant (No. 12061088), the Key R & D Projects of Sichuan Provincial Department of Science and Technology (2023YFG0287) and Sichuan Natural Science Youth Fund Project (Nos. 24NSFSC7038 and 2024NSFSC1404).

    There are no conflicts of interest regarding this work.



    [1] T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: A next-generation hyperparameter optimization framework, In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, 2623–2631. http://dx.doi.org/10.1145/3292500.3330701
    [2] B. Baker, O. Gupta, R. Raskar, N. Naik, Accelerating neural architecture search using performance prediction, 2017, arXiv: 1705.10823.
    [3] B. Baker, O. Gupta, N. Naik, R. Raskar, Designing neural network architectures using reinforcement learning, 2017, arXiv: 1611.02167.
    [4] J. F. Barrett, N. Keat, Artifacts in CT: recognition and avoidance, RadioGraphics, 24 (2004), 1679–1691. http://dx.doi.org/10.1148/rg.246045065 doi: 10.1148/rg.246045065
    [5] J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyper-parameter optimization, In: Advances in Neural Information Processing Systems, 2011, 2546–2554.
    [6] J. Bergstra, Y. Bengio, Random search for hyper-parameter optimization, J. Mach. Learn. Res., 13 (2012), 281–305.
    [7] L. Bottou, F. E. Curtis, J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev., 60 (2018), 223–311. http://dx.doi.org/10.1137/16M1080173 doi: 10.1137/16M1080173
    [8] L. Breiman, Random forests, Machine Learning, 45 (2001), 5–32. http://dx.doi.org/10.1023/A:1010933404324 doi: 10.1023/A:1010933404324
    [9] C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2 (1998), 121–167. http://dx.doi.org/10.1023/A:1009715923555 doi: 10.1023/A:1009715923555
    [10] H. Cai, T. Chen, W. Zhang, Y. Yu, J. Wang, Efficient architecture search by network transformation, 2017, arXiv: 1707.04873.
    [11] T. Domhan, J. T. Springenberg, F. Hutter, Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves, In: IJCAI International Joint Conference on Artificial Intelligence, 2015, 3460–3468.
    [12] T. Elsken, J. H. Metzen, F. Hutter, Neural architecture search: a survey, J. Mach. Learn. Res., 20 (2019), 1997–2017.
    [13] T. Elsken, J.-H. Metzen, F. Hutter, Simple and efficient architecture search for convolutional neural networks, 2017, arXiv: 1711.04528.
    [14] G. Franchini, M. Galinier, M. Verucchi, Mise en abyme with artificial intelligence: how to predict the accuracy of NN, applied to hyper-parameter tuning, In: INNSBDDL 2019: Recent advances in big data and deep learning, Cham: Springer, 2020,286–295. http://dx.doi.org/10.1007/978-3-030-16841-4_30
    [15] D. E. Goldberg, Genetic algorithms in search, optimization, and machine learning, Addison Wesley Publishing Co. Inc., 1989.
    [16] T. Hospedales, A. Antoniou, P. Micaelli, A. Storkey, Meta-learning in neural networks: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, in press. http://dx.doi.org/10.1109/TPAMI.2021.3079209
    [17] F. Hutter, L. Kotthoff, J. Vanschoren, Automatic machine learning: methods, systems, challenges, Cham: Springer, 2019. http://dx.doi.org/10.1007/978-3-030-05318-5
    [18] F. Hutter, H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, In: LION 2011: Learning and Intelligent Optimization, Berlin, Heidelberg: Springer, 2011,507–523. http://dx.doi.org/10.1007/978-3-642-25566-3_40
    [19] D. P. Kingma, J. Ba, Adam: a method for stochastic optimization, 2017, arXiv: 1412.6980.
    [20] N. Loizou, P. Richtarik, Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods, Comput. Optim. Appl., 77 (2020), 653–710. http://dx.doi.org/10.1007/s10589-020-00220-z doi: 10.1007/s10589-020-00220-z
    [21] J. Mockus, V. Tiesis, A. Zilinskas, The application of Bayesian methods for seeking the extremum, In: Towards global optimisation, North-Holland, 2012, 117–129.
    [22] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, N. De Freitas, Taking the human out of the loop: A review of bayesian optimization, Proc. IEEE, 104 (2016), 148–175. http://dx.doi.org/10.1109/JPROC.2015.2494218 doi: 10.1109/JPROC.2015.2494218
    [23] S. Thrun, L. Pratt, Learning to learn: introduction and overview, In: Learning to learn, Boston, MA: Springer, 1998, 3–17. http://dx.doi.org/10.1007/978-1-4615-5529-2_1
    [24] C. Ying, A. Klein, E. Real, E. Christiansen, K. Murphy, F. Hutter, NAS-Bench-101: Towards reproducible neural architecture search, In: Proceedings of the 36–th International Conference on Machine Learning, 2019, 7105–7114.
    [25] Z. Zhong, J. Yan, W. Wei, J. Shao, C.-L. Liu, Practical block-wise neural network architecture generation, In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 2423–2432. http://dx.doi.org/10.1109/CVPR.2018.00257
    [26] B. Zoph, Q. V. Le, Neural architecture search with reinforcement learning, 2017, arXiv: 1611.01578.
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)