Citation: Giorgia Franchini, Valeria Ruggiero, Federica Porta, Luca Zanni. Neural architecture search via standard machine learning methodologies. Mathematics in Engineering, 2023, 5(1): 1-21. doi: 10.3934/mine.2023012
Abstract
In the context of deep learning, the most expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation with the execution of a significant number of experimental trials. The aim of this paper is to investigate how to choose the hyperparameters related both to the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and to the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the potential adoption of variance reduction techniques. The main contribution of the paper consists in introducing an automatic Machine Learning technique to set these hyperparameters in such a way that a measure of the CNN performance can be optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN, based on its behaviour after only a few steps of the training process. To achieve this goal, we generate a dataset whose input samples are provided by a limited number of hyperparameter configurations, together with the corresponding CNN measures of performance obtained after only a few steps of the CNN training process, while the label of each input sample is the performance corresponding to a complete training of the CNN. This dataset is then used as the training set for Support Vector Machines for Regression and/or Random Forest techniques, which predict the performance of the considered learning methodology given its performance at the initial iterations of its learning process. Furthermore, by a probabilistic exploration of the hyperparameter space, we are able to find, at quite low cost, the setting of the CNN hyperparameters which provides the optimal performance. The results of an extensive numerical experimentation, carried out on CNNs, together with the use of our performance predictor on NAS-Bench-101, highlight how the proposed methodology for the hyperparameter setting appears very promising.
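A rough sketch of the prediction step described above (not the authors' implementation; the feature layout and all numerical values below are purely hypothetical) could be built with scikit-learn as follows:

import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

# Each row encodes one hyperparameter configuration together with the CNN
# accuracy observed after only a few training steps (hypothetical layout):
# [number of filters, kernel size, steplength, mini-batch size, early accuracy]
X = np.array([
    [32, 3, 0.01, 64, 0.41],
    [64, 5, 0.10, 128, 0.55],
    [16, 3, 0.05, 32, 0.38],
])
# Labels: accuracy reached after a complete training of the same CNNs.
y = np.array([0.88, 0.93, 0.84])

svr = SVR(kernel="rbf", C=10.0).fit(X, y)                # SVR predictor
rf = RandomForestRegressor(n_estimators=200).fit(X, y)   # Random Forest predictor

# Predict the final accuracy of a new configuration from its early behaviour.
candidate = np.array([[48, 3, 0.02, 64, 0.47]])
print(svr.predict(candidate), rf.predict(candidate))

During the probabilistic exploration of the hyperparameter space, such predictors can be queried in place of a full training run, so that only the most promising configurations need to be trained to completion.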
1. Introduction
Since neural networks (NNs) are effective in modeling and describing nonlinear dynamics, there has been a remarkable surge in their utilization, and NNs have contributed substantially to signal processing, image processing, combinatorial optimization, pattern recognition, and other scientific and engineering domains [1,2]. Theoretical advancements have laid a solid foundation for this progress, facilitating the establishment of NNs as a powerful tool for a diverse range of applications. Consequently, the stability analysis of delayed neural networks (DNNs) has attracted the attention of many scholars [3,4].
It should be noted that time delays are invariably encountered in NNs as a consequence of intrinsic factors, including but not limited to the finite speed of information processing [5], and they can lead to instability and degraded performance in numerous real-world applications of NNs. If such delays are ignored, the predictive capacity and resilience of the network may suffer severe impairment, undermining its efficacy and dependability. Hence, the stability of these systems must be assessed carefully, with criteria rigorous enough to provide reliable guarantees for the system's operation. Thus, in recent years, researchers have been dedicated to analyzing the stability of DNNs and have invested significant effort in reducing the conservatism of stability criteria [6,7,8]. With regard to a stability criterion, accommodating a wider range of delay tolerance corresponds to a less conservative estimate; consequently, the upper limit of the delay range is of critical significance in assessing and quantifying the degree of conservativeness. Since time delays in actual NNs are usually time-varying, the stability of NNs with variable delays has been widely studied using the Lyapunov-Krasovskii functionals (LKFs) technique.
In order to analyze and solve the stability problems of DNNs, common approaches such as the Lyapunov stability method and linear matrix inequalities (LMIs) are often adopted. Among these, the Lyapunov stability method is widely applied, while LMIs provide a convenient framework in which the resulting conditions are expressed. This approach aims to construct suitable LKFs that yield less conservative stability conditions, ensuring the stability of the studied DNN even when delays vary within the largest possible closed interval. Various types of augmented LKFs were introduced in [9,10,11,12,13,14,15,16,17,18,19,20,21,22] to investigate delay-dependent asymptotic or exponential stability of DNNs with time-varying delays. Utilizing the augmented LKF approach, numerous improved stability criteria have also been established. Moreover, commonly used techniques such as integral inequalities, the free-weighting matrix method, and reciprocally convex combination have been employed to obtain the stability criteria. To reduce the conservatism of stability criteria, recent works [10,13,14,15] have employed delayed state derivative terms as augmented vector elements to estimate the time derivative of the LKF. Although the existing literature indicates that the resulting stability criteria are less conservative, it is worth noting that the dimensions of the corresponding LMI formulations grow substantially. Thus, enhanced LKFs generally increase the difficulty and complexity of the analysis, with a corresponding increase in computational burden and solution time.
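For orientation, a typical (non-augmented) LKF for a system with a constant delay bound $h$ has the form below; the augmented functionals cited above enrich this template with additional integral and cross terms:

$$ V(t)=x^{T}(t)Px(t)+\int_{t-h}^{t}x^{T}(s)Qx(s)\,ds+h\int_{-h}^{0}\!\int_{t+\theta}^{t}\dot{x}^{T}(s)R\dot{x}(s)\,ds\,d\theta,\qquad P,Q,R>0. $$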
The works cited above still require symmetry in the construction of the LKF. In [23], two novel delay-dependent stability criteria for time-delay systems are presented in terms of LMIs. Both are established via asymmetric augmented LKFs, whose positivity is ensured without requiring all matrices involved in the LKFs to be symmetric and positive definite. In [24], the same method was used to study the stability of Takagi-Sugeno fuzzy systems.
In this paper, the primary contributions can be outlined as follows:
(1) An improved asymmetric LKF is proposed, which can be positive definite without requiring that all matrix variables be symmetric or positive definite.
(2) A new stability criterion is formulated in terms of linear matrix inequalities, incorporating integral inequality and reciprocally convex combination techniques; a minimal illustration of how such LMI conditions are checked numerically is sketched below.
(3) Compared with traditional methods, the new approach has less conservatism and lower complexity, which enables it to characterize neural network stability problems more accurately.
Ultimately, the efficacy of this novel approach is demonstrated through a commonly employed numerical example, corroborating its robustness and its advantages over existing methodologies and providing a feasible basis for practical engineering applications.
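Criteria expressed as LMIs are checked numerically by a semidefinite-programming solver. The following minimal sketch (a hypothetical illustration, not the LMIs derived in this paper) verifies the classical delay-free Lyapunov condition $A^{T}P+PA<0$ with $P>0$ using Python and CVXPY:

import numpy as np
import cvxpy as cp

# Hypothetical stable system matrix, used only for illustration.
A = np.array([[-2.0, 0.5],
              [0.3, -1.5]])
n = A.shape[0]

P = cp.Variable((n, n), symmetric=True)   # Lyapunov matrix variable
eps = 1e-6
constraints = [
    P >> eps * np.eye(n),                    # P positive definite
    A.T @ P + P @ A << -eps * np.eye(n),     # Lyapunov LMI
]

# Pure feasibility problem: any feasible P certifies asymptotic stability.
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)   # "optimal" indicates the LMI is feasible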
Notations: $\mathbb{R}^{n}$ denotes the $n$-dimensional Euclidean vector space. $\mathbb{R}^{n\times m}$ is the set of all $n\times m$ real matrices. $\mathbb{D}_{n}^{+}$ represents the set of positive-definite diagonal matrices in $\mathbb{R}^{n\times n}$. $\mathrm{diag}\{\cdots\}$ denotes a block-diagonal matrix. $\mathrm{He}(M)=M+M^{T}$. $\mathbb{N}$ stands for the set of nonnegative integers.
2. Preliminaries
Lemma 2.1. (Jensen's inequality [25]) Given a matrix $Q>0$, for any continuous function $\omega:[a,b]\rightarrow\mathbb{R}^{n}$, the following inequality holds:
$$ (b-a)\int_{a}^{b}\omega^{T}(s)Q\,\omega(s)\,ds\ \ge\ \Big(\int_{a}^{b}\omega(s)\,ds\Big)^{T}Q\,\Big(\int_{a}^{b}\omega(s)\,ds\Big). $$
proves that $\dot{V}(t)<0$. Therefore, if LMIs (3.4)–(3.7) hold, the system (3.1) is asymptotically stable.
The proof is completed. □
Remark 3.1. The underlying expression is rescaled by combining the Wirtinger-based integral inequality and the B-L inequality, with coefficients 1−α and α, respectively. This provides a flexible blend over the range [0,1], which facilitates the search for the optimal combination. When α=1 (so 1−α=0), the bound relies solely on the B-L inequality; conversely, when α=0 (so 1−α=1), only the Wirtinger-based integral inequality is used.
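Since each of the two inequalities yields a valid lower bound for the same integral term, any convex combination of the two bounds is valid as well. Schematically, writing $\mathcal{W}$ and $\mathcal{B}$ for the lower bounds produced by the Wirtinger-based and B-L inequalities (a sketch of the idea, not the exact expression used in the proof):

$$ \int_{a}^{b}\dot{x}^{T}(s)R\,\dot{x}(s)\,ds\;\ge\;(1-\alpha)\,\mathcal{W}+\alpha\,\mathcal{B},\qquad \alpha\in[0,1]. $$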
Remark 3.2. Numerous researchers have introduced various LKFs in the analysis of DNNs, traditionally requiring the involved matrices to be symmetric. This study relaxes that traditional constraint by devising asymmetric forms of LKFs. With such asymmetric constructs, it is not mandatory for each matrix to be symmetric or positive definite when establishing the conditions. Consequently, the conditions become more relaxed, leading to a less conservative theorem and broadening the horizons of analysis in this domain.
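One elementary way to see why symmetry need not be imposed on every matrix appearing in the functional is that a quadratic form depends only on the symmetric part of its matrix (an illustrative observation, not the full positivity argument of the paper):

$$ x^{T}Px=\tfrac{1}{2}\,x^{T}\bigl(P+P^{T}\bigr)x=\tfrac{1}{2}\,x^{T}\mathrm{He}(P)\,x,\qquad\text{so }\ \mathrm{He}(P)>0\ \Rightarrow\ x^{T}Px>0\ \text{for all }x\neq 0. $$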
4. Numerical example
In this section, a numerical example is given to illustrate the effectiveness of the suggested stability criterion. The primary goal is to obtain the acceptable maximum upper bound (AMUB) of the time-varying delays that still guarantees the global asymptotic stability of the neural networks under consideration; the larger the AMUB, the less conservative the stability criterion.
Example 1. We consider DNNs (3.1) with the following parameters [17,18,21,28,29]:
It is evident that the AMUB derived from Theorem 1 surpasses those obtained by the criteria presented in [17,18,21,28,29]. Compared with Theorem 3 in [26], a larger AMUB is obtained when μ ranges from 0.45 to 0.55, while for μ = 0.4 the obtained value is smaller than that of [26]. Importantly, the number of decision variables (NVs) involved in Theorem 1, as shown in Table 2, is $16n^{2}+12n$, which is less than the $79.5n^{2}+13.5n$ reported in [26]; this reduction lowers the computational complexity. When μ = 0.4 and the initial states are 0.5, 0.8, −0.5, and −0.8, the corresponding state trajectories are illustrated in Figure 1; the trajectories tend toward zero at an abscissa of around 350. With μ = 0.55 and the same initial states, the state trajectories are depicted in Figure 2, where they tend toward zero at an abscissa of approximately 300.
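As a quick check of the complexity comparison, substituting a two-neuron network into the formulas quoted above gives

$$ n=2:\qquad 16n^{2}+12n=88,\qquad 79.5n^{2}+13.5n=345, $$

so the criterion of Theorem 1 involves far fewer decision variables even for small networks.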
5. Conclusions
A suitable asymmetric LKF has been constructed, enabling us to establish its positive definiteness without requiring all matrix variables to be symmetric or positive definite. Additionally, a combinatorial optimization approach has been exploited, using a linear combination of multiple integral inequalities to identify the optimal arrangement when bounding the relevant terms. The conditions presented in this study are therefore less conservative, and both theoretical and quantitative analyses confirm that the proposed technique can produce larger maximum allowable delays than several recent works in the literature. Finally, the proposed method can be effectively combined with the delay segmentation technique, dividing the delay interval into N segments for finer processing, which will be further investigated in future work.
Use of AI tools declaration
Throughout the preparation of this work, we utilized the AI-based proofreading tool "Grammarly" to identify and correct grammatical errors. Subsequently, we carefully reviewed the content and made any necessary additional edits. We take complete responsibility for the content of this publication.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant (No. 12061088), the Key R & D Projects of Sichuan Provincial Department of Science and Technology (2023YFG0287) and Sichuan Natural Science Youth Fund Project (Nos. 24NSFSC7038 and 2024NSFSC1404).
Conflict of interest
There are no conflicts of interest regarding this work.
References
[1]
T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: A next-generation hyperparameter optimization framework, In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, 2623–2631. http://dx.doi.org/10.1145/3292500.3330701
[2]
B. Baker, O. Gupta, R. Raskar, N. Naik, Accelerating neural architecture search using performance prediction, 2017, arXiv: 1705.10823.
[3]
B. Baker, O. Gupta, N. Naik, R. Raskar, Designing neural network architectures using reinforcement learning, 2017, arXiv: 1611.02167.
[4]
J. F. Barrett, N. Keat, Artifacts in CT: recognition and avoidance, RadioGraphics, 24 (2004), 1679–1691. http://dx.doi.org/10.1148/rg.246045065 doi: 10.1148/rg.246045065
[5]
J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyper-parameter optimization, In: Advances in Neural Information Processing Systems, 2011, 2546–2554.
[6]
J. Bergstra, Y. Bengio, Random search for hyper-parameter optimization, J. Mach. Learn. Res., 13 (2012), 281–305.
[7]
L. Bottou, F. E. Curtis, J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev., 60 (2018), 223–311. http://dx.doi.org/10.1137/16M1080173 doi: 10.1137/16M1080173
[8]
L. Breiman, Random forests, Machine Learning, 45 (2001), 5–32. http://dx.doi.org/10.1023/A:1010933404324 doi: 10.1023/A:1010933404324
[9]
C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2 (1998), 121–167. http://dx.doi.org/10.1023/A:1009715923555 doi: 10.1023/A:1009715923555
[10]
H. Cai, T. Chen, W. Zhang, Y. Yu, J. Wang, Efficient architecture search by network transformation, 2017, arXiv: 1707.04873.
[11]
T. Domhan, J. T. Springenberg, F. Hutter, Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves, In: IJCAI International Joint Conference on Artificial Intelligence, 2015, 3460–3468.
[12]
T. Elsken, J. H. Metzen, F. Hutter, Neural architecture search: a survey, J. Mach. Learn. Res., 20 (2019), 1997–2017.
[13]
T. Elsken, J.-H. Metzen, F. Hutter, Simple and efficient architecture search for convolutional neural networks, 2017, arXiv: 1711.04528.
[14]
G. Franchini, M. Galinier, M. Verucchi, Mise en abyme with artificial intelligence: how to predict the accuracy of NN, applied to hyper-parameter tuning, In: INNSBDDL 2019: Recent advances in big data and deep learning, Cham: Springer, 2020, 286–295. http://dx.doi.org/10.1007/978-3-030-16841-4_30
[15]
D. E. Goldberg, Genetic algorithms in search, optimization, and machine learning, Addison Wesley Publishing Co. Inc., 1989.
[16]
T. Hospedales, A. Antoniou, P. Micaelli, A. Storkey, Meta-learning in neural networks: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, in press. http://dx.doi.org/10.1109/TPAMI.2021.3079209
F. Hutter, H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, In: LION 2011: Learning and Intelligent Optimization, Berlin, Heidelberg: Springer, 2011, 507–523. http://dx.doi.org/10.1007/978-3-642-25566-3_40
[19]
D. P. Kingma, J. Ba, Adam: a method for stochastic optimization, 2017, arXiv: 1412.6980.
[20]
N. Loizou, P. Richtarik, Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods, Comput. Optim. Appl., 77 (2020), 653–710. http://dx.doi.org/10.1007/s10589-020-00220-z doi: 10.1007/s10589-020-00220-z
[21]
J. Mockus, V. Tiesis, A. Zilinskas, The application of Bayesian methods for seeking the extremum, In: Towards global optimisation, North-Holland, 2012, 117–129.
[22]
B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, N. De Freitas, Taking the human out of the loop: A review of Bayesian optimization, Proc. IEEE, 104 (2016), 148–175. http://dx.doi.org/10.1109/JPROC.2015.2494218 doi: 10.1109/JPROC.2015.2494218
C. Ying, A. Klein, E. Real, E. Christiansen, K. Murphy, F. Hutter, NAS-Bench-101: Towards reproducible neural architecture search, In: Proceedings of the 36th International Conference on Machine Learning, 2019, 7105–7114.
[25]
Z. Zhong, J. Yan, W. Wei, J. Shao, C.-L. Liu, Practical block-wise neural network architecture generation, In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 2423–2432. http://dx.doi.org/10.1109/CVPR.2018.00257
[26]
B. Zoph, Q. V. Le, Neural architecture search with reinforcement learning, 2017, arXiv: 1611.01578.
Figure 2. Some samples from the MNIST database, without and with stripes
Figure 3. A possible CNN scheme
Figure 4. The very bad case
Figure 5. The intermediate case
Figure 6. The good case
Figure 7. One of the reconstructed images obtained from the CNN related to the best hyperparameter configuration
Figure 8. Predicted and real accuracy values obtained by the SVR, RF and HYBRID methodologies for the 23 CNNs used as unseen data
Figure 9. Random search: the red line denotes the accuracy behavior obtained from the final performance of the current CNN as in [24], while the blue, green and black lines show the accuracy behavior with respect to the training time when the SVR, RF or HYBRID methodology is used as predictor of the final performance of the CNN
Figure 10. Evolution search (regularized): the red line denotes the accuracy behavior obtained from the final performance of the current CNN as in [24], while the blue, green and black lines show the accuracy behavior with respect to the training time when the SVR, RF or HYBRID methodology is used as predictor of the final performance of the CNN