    The rapid advancement of deep neural networks has immensely facilitated the daily life of the general public, yet the burden on neural architecture designers is perennial. Typically, designers must craft a precise neural architecture that is tailored for a specific task, involving professionals in an arduous manual trial-and-error process that is both tedious and time-consuming [1]. In order to design efficient neural architectures automatically, neural architecture search (NAS), as a pivotal branch of automated machine learning, has gradually emerged as a focal point of research [2].

NAS proceeds by selecting neural architectures from an artificially defined search space and evaluating the selected architectures according to a performance evaluation strategy [3]. Current NAS methods usually rely on reinforcement learning, evolutionary algorithms, or gradient-based methods [4]. Reinforcement learning-based NAS approaches use a controller to select an architecture within a search space [5]. NAS methods based on an evolutionary algorithm generate and retain neural architectures by mimicking the evolution process of a population [6]. Gradient-based NAS methods perform architecture search by jointly optimizing the network parameters and the network structure weights [7].

Although some experiments have shown that the neural architectures found by NAS methods based on an evolutionary algorithm outperform those found by NAS methods based on reinforcement learning, evolutionary NAS methods usually consume excessively large amounts of computing resources due to the characteristics of the evolutionary algorithm itself [8]. This difficulty reduces the possibility of practical application of NAS methods based on evolutionary algorithms.

NAS methods based on evolutionary algorithms are well known for their powerful ability to explore a search space [9]. Neuroevolution, proposed earlier by Floreano et al., uses evolutionary algorithms to search for neural architectures and their corresponding weights simultaneously [10]. Evolutionary neural architecture search (ENAS), on the other hand, has only been proposed in recent years and focuses solely on searching neural architectures with evolutionary algorithms; the corresponding weights are computed with gradient-based optimization algorithms after the architecture is determined. ENAS algorithms encode the neural structures in the search space into vectors, and then rely on evolutionary operations such as crossover, mutation, evaluation, and selection to gradually explore the search space for neural architectures with excellent performance. The cost in computing resources and time required to evaluate one neural architecture in a search space is often acceptable, but evaluating thousands of neural architectures, as the ENAS approach requires, is often prohibitively expensive [11].

To retain the powerful search ability of ENAS algorithms while reducing the cost of computing resources, two kinds of methods have been used to accelerate ENAS. One is proxy metrics-based ENAS, which uses methods such as low-fidelity estimates, learning curve extrapolation, or one-shot models to evaluate an architecture; this still requires gradient-based updating of the parameters of the searched neural architectures, which adds computational overhead [12]. The other is surrogate-based ENAS, which uses surrogate models (or performance predictors) that evaluate the performance of the searched neural architectures based only on information about the architecture itself, without considering its parameters [13]. The overhead in computing resources is reduced by building surrogate models that directly predict the performance of candidate architectures, rather than actually evaluating the architectures on concrete datasets.

At present, most ENAS methods based on surrogate models use a regression model as the surrogate. The surrogate model is trained on existing historical (architecture, performance) pairs so that it learns the mapping from architecture features to performance and can directly predict the performance of candidate architectures [13]. For example, graph neural networks (GNNs) [14] have been used as surrogate models to directly predict the performance of architectures [15]. Wei et al. proposed NPENAS, a regression performance predictor-guided evolutionary algorithm method, to enhance the exploration ability of evolutionary algorithms for NAS [16]. In addition to directly predicting performance, Wang et al. chose a support vector machine (SVM) as a surrogate model to predict which convolutional neural network (CNN) would achieve better performance [17]. In recent years, some researchers have also used ranking models as surrogate models to rank candidate neural architectures and thereby select those with high levels of performance [18]. However, these methods are often affected by the poor architectures in the historical data and by the data's own distribution, which reduces the reliability of the surrogate model and degrades its functionality when exploring the architectures in a search space.

    In this paper, to solve the ENAS problems mentioned above, we propose a novel approach, i.e., similarity surrogate-assisted ENAS with dual encoding strategy (SSENAS), to realize a computationally efficient ENAS by simultaneously accelerating the performance evaluation process and improving the search strategy efficiency. The contributions of this paper are summarized as follows:

1) To prevent the surrogate model from being affected by inferior architecture information, we propose a basic online learning framework for SSENAS, in which a similarity-based surrogate model reliably selects excellent neural architectures from a large number of candidate architectures. Neural architectures with excellent performance can thus be found step by step in the search space.

    2) In order to represent the information of one architecture more comprehensively, we propose a novel encoding strategy based on inter-node information transmission to encode an architecture. It ensures that the scale of variables within each dimension of architecture encoding is consistent, and it achieves a more efficient representation of architecture information.

    3) To generate neural architectures with excellent performance during evolution, we have designed a novel encoding strategy for the generation of candidate architectures. By decoupling the connection between the architecture nodes and the operation types of the architecture nodes, it explores architectures that are similar to the current excellent architecture in the search space through evolutionary operations, so as to improve the efficiency of the algorithm and reduce the cost of computing resources.

    In this section, we briefly introduce the related work, including ENAS and surrogate model methods used to accelerate the NAS process. Based on the above, the limitations of the traditional surrogates, which are based on regression or ranking models, are explained in Section 2.2.

    NAS can be mathematically described as an optimization problem as follows:

\begin{equation} \min\limits_{\alpha} Loss(\alpha, D_{train}, D_{val}) \quad s.t.\; \alpha \in \Omega \end{equation} (2.1)

    where Ω represents the search space containing all candidate architectures, and α represents one architecture in the search space. High-performance neural architectures are obtained by minimizing the loss (Loss) of architectures on the validation set Dval after being trained on the training set Dtrain. NAS usually consists of three components: a search space, search strategy, and performance estimation strategy [12]. The process of NAS is as follows. First, an architecture α is selected by the search strategy from a predefined search space Ω. Then, the architecture is passed to a performance estimation strategy, which returns the estimated performance of α to the search strategy. NAS methods that use evolutionary algorithms instead of ordinary search strategies are often called ENAS methods. ENAS searches for excellent neural architectures in the search space by implementing evolutionary algorithms with balanced global and local exploration capabilities.

The process of ENAS is as follows. First, some neural architectures are initialized from the defined search space, and these architectures (individuals) are evaluated to obtain their fitness values. Then, the parent individuals are selected via environmental selection, and the offspring individuals are obtained via genetic operations applied to the parent individuals. The offspring individuals are then evaluated and their fitness values are obtained. Finally, parent and offspring individuals are merged, and new parent individuals are selected to enter the next generation through environmental selection. Relying on the powerful exploration capability of evolutionary algorithms, ENAS is able to search for excellent neural architectures in a search space. For example, Xie and Yuille represented the architectures in a search space as fixed-length binary strings and applied genetic algorithms to automate the process of learning the architecture of deep CNNs [19]. Sun et al. proposed EvoCNN, which uses a variable-length encoding strategy to encode architectures, and they realized the automatic evolution of the architectures and the weights of the CNN for image classification problems [20]. Huang et al. used the particle swarm optimization (PSO) algorithm to evolve compact CNN architectures in a modified parameter-efficient MBConv blocks search space [6]. Xue et al. proposed SaMuNet, which relies on an adaptive mutation strategy and a semi-complete binary competition strategy to automatically construct CNN architectures in a search space built from ResNet and DenseNet blocks [21]. By encoding architectures through encoding strategies and relying on evolutionary algorithms to explore a predefined search space step by step, excellent neural architectures can be generated and selected from the search space.
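To make this loop concrete, the following is a minimal, self-contained Python sketch of a generic ENAS iteration. The fixed-length binary encoding and the toy fitness function are illustrative stand-ins for a real architecture encoding and its costly training-based evaluation; they are not the procedure of any cited method.

import random

ENC_LEN = 21  # toy fixed-length binary encoding of an architecture

def evaluate(arch):
    # Stand-in for the costly step: decode, train, and validate the architecture.
    return sum(arch) / len(arch)

def variation(parents):
    # Genetic operations: one-point crossover followed by bit-flip mutation.
    offspring = []
    for _ in range(len(parents)):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, ENC_LEN)
        child = a[:cut] + b[cut:]
        child[random.randrange(ENC_LEN)] ^= 1
        offspring.append(child)
    return offspring

def enas(pop_size=20, generations=50):
    parents = [[random.randint(0, 1) for _ in range(ENC_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = variation(parents)
        merged = parents + offspring                # merge parents and offspring
        merged.sort(key=evaluate, reverse=True)     # environmental selection
        parents = merged[:pop_size]
    return parents[0]                               # best individual found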

Full evaluation of a neural architecture in a search space can be time-consuming, taking anywhere from a few minutes to a few hours depending on many factors, such as the number of neural architecture parameters, the size of the dataset, and the available hardware. In order to accelerate the process of performance estimation for NAS, surrogate models have been proposed to assist or replace the process of evaluating neural architectures. Surrogate models are also known as performance predictors, and earlier surrogate models focused on designing regression models to directly predict the performance of candidate neural architectures. For example, Deng et al. proposed Peephole, which uses an end-to-end approach to jointly optimize long short-term memory (LSTM) and multilayer perceptron (MLP) networks to make direct predictions about neural architecture performance [22]. Tang et al. proposed a semi-supervised assessor to evaluate neural architectures by predicting their performance directly [23]. Wen et al. used a graph convolutional network (GCN) regression model as a surrogate model to directly predict the performance of neural architectures [15]. However, the reliability of surrogate models based on regression models is often reduced by inconsistent data distribution. Especially in ENAS, the performance distribution of the first sampled architectures is inconsistent with that of the later generated architectures, which prevents a regression-based surrogate model from accurately predicting the performance of the neural architecture.

    However, rather than understanding the exact performance of a set of neural architectures, we are actually more concerned about which neural architecture will lead to better performance. In recent years, more and more ranking-based surrogate models have been proposed to predict the ranking of neural architectures. For example, Xu et al. proposed ReNAS to rank multiple neural architectures before training [18]. Chen et al. proposed CTNAS, which uses the results of comparisons between architectures as a reward to search for promising architectures [24]. Huang et al. proposed a task-transferable NAS method by training a pairwise relation predictor to solve graph-ordering problems [25]. However, the performance of these ranking-based surrogate models is still affected by poor architectures, and the performance of the surrogate models is also affected by the imbalanced distribution of the samples used to train the surrogate models. To solve these problems, in this paper, we propose a novel surrogate model based on similarity measurement, which is different from the surrogate models based on regression or ranking. The surrogate model based on similarity pays more attention to the architecture with good performance, and the performance of the model is not affected by the existing poor architectures.

    In this section, we go into the details of the proposed algorithm. First, the overall framework of the proposed algorithm is introduced through the pseudo-code and the figure in Section 3.1. Then, a similarity-based surrogate model is described in detail in Section 3.2. Finally, an encoding strategy for surrogate evaluation is described in Section 3.3, and an encoding strategy for generating candidate neural architectures is described in Section 3.4.

    The overall framework of the proposed surrogate-assisted ENAS is shown in Figure 1. In this framework, since there are additional data that can be actively collected during the evolutionary process to update the surrogate and guide the search, we refer to this type of framework as an online learning framework. In addition, Algorithm 1 gives the pseudo-code of the whole process to explain the proposed algorithm in more detail. Given a search space Ω, this algorithm can obtain an excellent neural architecture in the search space.

    Figure 1.  The overall framework of the proposed method.

Algorithm 1 The framework of the proposed method
Input: Ω: Search space; N: Population size; M: Number of newly generated individuals; maximum number of iterations.
Output: α: A high-performance architecture from Ω.
1: t ← 0: Initialize an iteration counter.
2: Arcs_0 ← {α_0^1, α_0^2, …, α_0^N}: Randomly initialize a set of architectures from Ω as the initial population.
3: Y_0 ← {y_0^1, y_0^2, …, y_0^N}: Evaluate Arcs_0 to get their ground-truth performance on a specific task.
4: S_0^w: A surrogate model designed for coarse-grained screening of architectures in Ω, trained on (Arcs_0, Y_0) pairs.
5: while t < maximum number of iterations do
6:   Inds_{t+1} ← {α_{t+1}^1, α_{t+1}^2, …, α_{t+1}^M}: Generate massive individuals (new architectures) through crossover and mutation operations based on (Arcs_t, Y_t) pairs.
7:   Offs_{t+1} ← {α_{t+1}^{s_1}, α_{t+1}^{s_2}, …, α_{t+1}^{s_N}}: Select N individuals from Inds_{t+1} as the offspring population according to S_t^w, where N < M.
8:   Y_{t+1}^{offs} ← {y_{t+1}^1, y_{t+1}^2, …, y_{t+1}^N}: Evaluate Offs_{t+1} to get their ground-truth performance.
9:   (Arcs_{t+1}, Y_{t+1}) ← (Arcs_t, Y_t) ∪ (Offs_{t+1}, Y_{t+1}^{offs}): Update the current information for all authentically evaluated architectures.
10:   S_{t+1}^w: Update the weights of S_t^w with (Arcs_{t+1}, Y_{t+1}) pairs.
11:   t ← t + 1.
12: end while
13: Select the best architecture α from all authentically evaluated architectures.
14: Return: α

First, at the initialization stage, an iteration counter is initialized to count the current number of generations of the evolutionary algorithm. Then, N architectures are randomly initialized from the search space to act as the initial population, and the ground-truth performance of these architectures is measured on the specific task, which is the time-consuming step in ENAS. To speed up architecture evaluation, a surrogate model is designed to perform coarse-grained screening of the architectures in the search space; its parameters are updated according to the (architecture, performance) pairs of the initial population, which endows the surrogate model with the ability to distinguish between excellent and poor architectures in the search space. Second, the main loop proceeds as follows: 1) evolutionary operations are used to generate a large number, M, of new neural architectures in the search space, relying on the information of the existing evaluated architectures; 2) the surrogate model is used to roughly evaluate these network architectures, and the N architectures with the best performance according to the surrogate's evaluation metric are selected as the offspring population; 3) these network architectures are then evaluated to obtain their ground-truth performance; 4) the information of all authentically evaluated architectures is updated, followed by the parameters of the surrogate model based on this information. Based on the local and global search ability of the evolutionary algorithm, the method can achieve a balance between exploitation and exploration of the search space. As the number of generations increases, the reliability of the surrogate model gradually increases, which ensures that the surrogate model can select excellent neural architectures from a large number of candidates. To efficiently identify outstanding architectures in the search space while minimizing computational resource usage, we have developed an innovative approach: it introduces a surrogate model that leverages similarity for coarse-grained screening of architectures, as well as a novel dual encoding strategy for surrogate evaluation and architecture generation.
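As a sketch of step 2) above: assuming the trained surrogate provides an encoder that maps an architecture encoding to a latent vector, the coarse-grained screening can be as simple as ranking candidates by latent-space distance to the best architecture found so far and keeping the closest N. The function names here are illustrative, not the authors' implementation.

import numpy as np

def select_offspring(candidates, best_arch, embed, n_keep=20):
    # embed: trained surrogate encoder, architecture encoding -> latent vector
    anchor = embed(best_arch)
    dists = [np.linalg.norm(embed(c) - anchor) for c in candidates]
    order = np.argsort(dists)              # most similar (closest) first
    return [candidates[i] for i in order[:n_keep]]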

    Unlike the traditional surrogate models based on regression models, a surrogate model based on similarity measurement has been developed to replace the performance evaluation process for neural architectures, and this serves to screen out the excellent neural architectures from the massive neural architectures in a short period of time. We have constructed a triplet neural network as the surrogate model, TripletNet, and use a triplet loss function, Loss, to train the neural network. The structure of the surrogate model is shown in Figure 2.

    Figure 2.  The structure of the surrogate model.

The surrogate model takes three encoding vectors, which are introduced in Section 3.3, and maps them to three latent vectors in a latent space through a weight-sharing model. Then, Loss is calculated from the latent vectors, and the model parameters are optimized via the backpropagation algorithm. Finally, architectures with similar performance are mapped to nearby locations in the latent space: the closer two latent vectors are, the more similar the corresponding architectures, and vice versa.
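A minimal PyTorch sketch of such a weight-sharing triplet network is given below. The MLP backbone sizes and the latent dimension follow the NAS-Bench-101 settings reported in Table 1, while the margin value is an assumption, since the paper does not report it.

import torch
import torch.nn as nn

class TripletNet(nn.Module):
    # One shared encoder maps anchor, positive, and negative encodings
    # into the same latent space (Figure 2).
    def __init__(self, enc_len=35, latent_dim=16):   # NAS-Bench-101 settings (Table 1)
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(enc_len, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, x_a, x_p, x_q):
        return self.backbone(x_a), self.backbone(x_p), self.backbone(x_q)

def triplet_loss(z_a, z_p, z_q, m=1.0):              # margin m is assumed, not reported
    dist_s = (z_a - z_p).norm(dim=1)                 # Euclidean distance, Eq (3.2)
    dist_d = (z_a - z_q).norm(dim=1)                 # Euclidean distance, Eq (3.3)
    return torch.clamp(dist_s - dist_d + m, min=0).mean()   # hinge loss, Eq (3.1)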

Suppose that the surrogate model based on similarity measurement is represented as S_w. The trainable parameters of S_w are trained on the set [X, y], where X = {X_1, X_2, …, X_N} represents the encodings of the neural architectures and y = {y_1, y_2, …, y_N} contains the relevant performance metrics of the corresponding neural architectures. In order to train S_w, the first step is to build a triplet (X_a, X_p, X_q) based on [X, y], where X_a and X_p are of the same class and X_a and X_q are of different classes; each element of the triplet is determined depending on (y_a, y_p, y_q). How to define the similarities between architectures and divide them into different categories is crucial to the training of S_w. The algorithm implemented to generate the triplets that serve as the similarity dataset for training S_w is shown in Algorithm 2. First, the anchor (X_a, y_a) is initialized by random sampling, and the similarity threshold H is set. Then, (X_t, y_t) is obtained via random sampling, and (X_t, y_t) is assigned as (X_p, y_p) or (X_q, y_q) according to the calculated similarity between y_t and y_a. The loop ends once both a positive and a negative example have been found, and the generated triplet is returned.

Algorithm 2 Triplet generation
Input: [X, y]: (architecture, performance) pairs; H: similarity threshold.
Output: (X_a, X_p, X_q): A triplet.
1: (X_a, y_a): randomly sample (X, y) ∈ [X, y] as an anchor.
2: (X_p, y_p) ← None, (X_q, y_q) ← None: Initialization.
3: while (X_p, y_p) is None or (X_q, y_q) is None do
4:   (X_t, y_t): randomly sample (X, y) ∈ [X, y].
5:   Δy ← y_t − y_a: Calculate the error of y_t and y_a.
6:   if Δy ≥ 0 then
7:     similarity ← 2/(1 + e^{Δy}).
8:   else
9:     similarity ← 2/(1 + e^{−Δy}).
10:   end if
11:   if similarity > H and (X_p, y_p) is None then
12:     (X_p, y_p) ← (X_t, y_t).
13:   end if
14:   if similarity ≤ H and (X_q, y_q) is None then
15:     (X_q, y_q) ← (X_t, y_t).
16:   end if
17: end while
18: Return: (X_a, X_p, X_q)
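A Python sketch of Algorithm 2 follows. Note that the similarity formula in lines 6-10 is only partially legible in the source; the symmetric form 2/(1 + e^{|Δy|}), which equals 1 when y_t = y_a and decays toward 0 as the performance gap grows, is a reconstruction consistent with the threshold H = 0.5 listed in Table 1, and should be treated as an assumption.

import math
import random

def generate_triplet(X, y, H=0.5):
    a = random.randrange(len(X))                     # anchor (X_a, y_a)
    pos = neg = None
    while pos is None or neg is None:                # loop mirrors Algorithm 2
        t = random.randrange(len(X))                 # candidate (X_t, y_t)
        similarity = 2.0 / (1.0 + math.exp(abs(y[t] - y[a])))  # reconstructed formula
        if similarity > H and pos is None:
            pos = t                                  # similar performance -> positive
        if similarity <= H and neg is None:
            neg = t                                  # dissimilar performance -> negative
    return X[a], X[pos], X[neg]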

    The input to the surrogate model is a triplet, i.e., (Xa,Xp,Xq), consisting of an anchor example, a positive example, and a negative example. By optimizing the trainable parameters of the surrogate model via a backpropagation algorithm, the similarity measurement between samples is realized. The ultimate optimization goal is to narrow the distance between Xa and Xp, and to stretch the distance between Xa and Xq. The triplet loss function is defined as

\begin{equation} Loss = Max(Distance_{S} - Distance_{D} + m, 0) \end{equation} (3.1)

    The positive number m in Eq (3.1) is included to ensure the existence of a margin between distances of negative pairs and distances of positive pairs. DistanceS and DistanceD in the above equation represent the Euclidean distance of similar pairs (Xa,Xp) in latent space and the Euclidean distance of dissimilar pairs (Xa,Xq) in latent space, respectively. Their exact equations are as follows:

\begin{equation} Distance_{S} = \Vert Model(X_{a})-Model(X_{p}) \Vert_2 \end{equation} (3.2)
\begin{equation} Distance_{D} = \Vert Model(X_{a})-Model(X_{q}) \Vert_2 \end{equation} (3.3)

The loss function Loss covers three cases, as shown in Figure 3:

    Figure 3.  There are three possible scenarios for loss function optimization.

    1) Case 1: When Distance_{S}+m < Distance_{D} , it means that the distance between X_{a} and X_{p} plus the margin is less than the distance between X_{a} and X_{q} , that is, the distance between the anchor and the positive sample is closer, while the distance between the anchor and the negative sample is farther. In this case, no optimization is required, and the positive and negative examples are naturally separated.

2) Case 2-1: When Distance_{S}+m > Distance_{D} and Distance_{S} < Distance_{D} , it indicates that the distance between X_{a} and X_{p} plus the margin is greater than the distance between X_{a} and X_{q} , whereas the distance between X_{a} and X_{p} is less than the distance between X_{a} and X_{q} . In this case, there is a relatively small loss ( 0 < Loss < m ), and the surrogate model's trainable parameters are adjusted slightly.

3) Case 2-2: When Distance_{S}+m > Distance_{D} and Distance_{S} > Distance_{D} , it indicates that the distance between X_{a} and X_{p} plus the margin is greater than the distance between X_{a} and X_{q} , and the distance between X_{a} and X_{p} is itself greater than the distance between X_{a} and X_{q} . In this case, there is a relatively large loss ( Loss > m ), and the surrogate model's trainable parameters are adjusted sharply.
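A quick numeric illustration of the three cases, assuming a margin of m = 1 (the distance values are arbitrary):

m = 1.0
cases = {
    "Case 1":   (0.5, 2.0),   # Distance_S + m < Distance_D  ->  Loss = 0
    "Case 2-1": (0.8, 1.2),   # Distance_S < Distance_D < Distance_S + m  ->  0 < Loss < m
    "Case 2-2": (1.5, 0.6),   # Distance_S > Distance_D  ->  Loss > m
}
for name, (d_s, d_d) in cases.items():
    print(name, max(d_s - d_d + m, 0.0))   # approximately 0.0, 0.6, and 1.9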

With this loss, we can appropriately adjust the trainable parameters of the surrogate model to map similar neural architectures to locations close to each other in latent space. The surrogate model is used together with the evolutionary algorithm: the evolutionary algorithm affords the possibility of jumping out of local minima, while the surrogate model implements the actual process of jumping out of them. As for detecting a fall into local minima, one simply observes whether any authentically evaluated architecture in the current generation outperforms the best architecture of the previous generation. If some architecture in the current generation outperforms all of the architectures in the previous generation, then the algorithm is not trapped in local minima. Conversely, if no architecture in a generation performs better than the optimal architecture of the previous generation, then the algorithm is trapped in local minima.
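The stagnation test just described amounts to a one-line check per generation; a minimal sketch:

def trapped_in_local_minimum(prev_best, generation_fitness):
    # Trapped if no authentically evaluated architecture in this generation
    # beats the best architecture of the previous generation.
    return max(generation_fitness) <= prev_best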

Proper design of the encoding strategy for neural architecture representation in a search space is crucial for surrogate models. Generally, a neural architecture G can be represented by a directed acyclic graph (DAG), G = (V, E) , which uses either nodes V to represent operations and edges E to represent information flows, such as NAS-Bench-101 [26], or nodes V to represent information and edges E to represent operations, such as NAS-Bench-201 [27]. We propose to encode neural architectures that can be represented by DAGs via an approach based on information transfer between nodes, which relies on the transmission of information between nodes to obtain more comprehensive information about the neural architecture represented by the DAG. First, the topological ordering of the nodes is obtained from the DAG, and each node is encoded with a one-hot strategy to get its representation vector h . Then, information is transmitted between nodes according to the connections between them. Specifically, the formula for the (k+1)th information transmission of node v is as follows:

    \begin{equation} h_{v}^{k+1} = h_{v}^{k} + \sum\limits_{u\in N(v)} \frac{h_{u}^{k}}{d_{u}} \end{equation} (3.4)

where h_{v}^{k} represents the representation vector of node v, v\in V , after the k th round of information transmission, and N(v) represents all neighbors of node v . d_u represents the degree (out and in) of node u , and h_u^k represents the representation vector of node u after the k th round of information transmission. When updating its representation vector, a node considers not only the representation vectors of its neighbors, but also its own. Through multiple rounds of information transmission between nodes, a richer representation vector of each node can be obtained, instead of a simple one-hot vector. Since the representation vector corresponding to each node is one-hot encoded at the beginning, after transmission and aggregation the variables in each dimension still share the same scale, which means that the final architecture encoding vector does not need to be re-scaled. Ultimately, the proposed encoding strategy for surrogate evaluation can be represented by the following equation:

    \begin{equation} X_i = flattened(T_{sort}(I_{transmit}(one\text{-}hot(V), E))) \end{equation} (3.5)

    where one\text{-}hot(V) indicates that the nodes are one-hot encoded, I_{transmit}(\cdot) denotes information transmission of the nodes according to Eq (3.4), and T_{sort}(\cdot) denotes the topological sorting of nodes. Finally, the vector is flattened into a one-dimensional vector with flattened(\cdot) .
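The whole encoding pipeline of Eq (3.5) can be sketched in a few lines of Python with networkx. The toy DAG and operation indices at the end are illustrative; the 5 operation types and 7 transmission rounds follow the NAS-Bench-101 settings in Table 1.

import networkx as nx
import numpy as np

def encode_architecture(G, op_of, num_ops=5, rounds=7):
    # one-hot(V): each node starts as a one-hot vector of its operation type
    h = {v: np.eye(num_ops)[op_of[v]] for v in G}
    deg = dict(G.degree())                        # out-degree + in-degree, d_u in Eq (3.4)
    for _ in range(rounds):                       # I_transmit: apply Eq (3.4) repeatedly
        h = {v: h[v] + sum(h[u] / deg[u]
                           for u in set(G.predecessors(v)) | set(G.successors(v)))
             for v in G}
    order = list(nx.topological_sort(G))          # T_sort: one topological order
    return np.concatenate([h[v] for v in order])  # flattened(.): final 1-D encoding

# Toy usage: a 3-node chain whose nodes carry operation indices 0, 2, and 4.
G = nx.DiGraph([(0, 1), (1, 2)])
x = encode_architecture(G, op_of={0: 0, 1: 2, 2: 4})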

    Since the topological ordering of a DAG is often not unique, this can result in multiple different encodings representing the same DAG. This is clearly not what we want. Inspired by an existing paper [28], we apply multiple different encodings of a DAG as a data enhancement technique in the hope of achieving adequate training of the surrogate model.
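Under the same assumptions as the sketch above, this augmentation simply enumerates several topological orders of one DAG and flattens the (already transmitted) node vectors in each order, yielding multiple equivalent training encodings:

import itertools
import networkx as nx
import numpy as np

def augmented_encodings(G, h, max_variants=8):
    # h: final per-node representation vectors after the rounds of Eq (3.4)
    return [np.concatenate([h[v] for v in order])
            for order in itertools.islice(nx.all_topological_sorts(G), max_variants)]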

    We propose a novel encoding strategy specifically for new architecture generation. The types of operations represented by DAG nodes and the connections between them are decoupled. The new architecture is generated in two steps. The first step only considers the connection mode between nodes, without considering the operation type of each node, and the second step only considers the operation type of each node, without considering the connection mode between nodes. Figure 4 shows an example of new architecture generation in the search space of NAS-Bench-101. The first step of the proposed encoding strategy is shown in Figure 4(a). Given two parent individuals randomly selected from the parent population and represented by respective DAGs, the two DAGs are first decomposed to obtain all of the simple paths constituting each of the two DAGs. Then, using a random selection method, half of the paths from all of the simple paths generated by the two parents are chosen to form a set of paths. Nodes with the same number are merged into one node (shown in green), while nodes with different numbers remain separate (shown in orange and blue). Finally, a new DAG is constructed from this set of paths.

    Figure 4.  The encoding strategy for architecture generation. Architectures in the NAS-Bench-101 search space are used here as an example: (a) shows the process of constructing a new DAG, and (b) shows the way to generate new offspring operations.

    This approach is novel, as it is based on an encoding strategy that reconstructs how the internal nodes of an architecture are connected based on simple paths. After the first step is complete, a new DAG is obtained without considering node types. The second step of the proposed encoding strategy is illustrated in Figure 4(b). Given the operation type of the corresponding node of the parent individual, the input node and output node of the offspring individual are fixed, while the intermediate nodes are randomly inherited from the two parents with an inheritance probability of 50% each. For example, taking the operation type of child node 2 as an example, it has a 50% probability of being 3\times3Conv , as inherited from parent 1, and a 50% probability of being 1\times1Conv , as inherited from parent 2. In summary, the first step determines how inner nodes of the architecture are connected, and the second step determines the type of architecture node. By generating a new architecture in this way, a completely new architecture is obtained that is similar to, but different from, the two parent architectures. Finally, it is judged whether the generated architecture is within the defined search space. If it is, the current generated architecture is retained; if not, the current architecture is discarded.
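The two-step generation can be sketched as follows. The parents are assumed to be (DAG, operation-dict) pairs with fixed 'in'/'out' endpoints, and the final search-space validity check mentioned above (e.g., NAS-Bench-101's limits on node and edge counts) is left as a comment; the helper names are illustrative.

import random
import networkx as nx

def generate_child(parent1, parent2):
    (g1, ops1), (g2, ops2) = parent1, parent2
    # Step 1: decompose both parents into simple input->output paths,
    # randomly keep half of them, and rebuild a DAG (same-numbered nodes merge).
    paths = (list(nx.all_simple_paths(g1, "in", "out"))
             + list(nx.all_simple_paths(g2, "in", "out")))
    random.shuffle(paths)
    child = nx.DiGraph()
    for path in paths[: max(1, len(paths) // 2)]:
        nx.add_path(child, path)
    # Step 2: every intermediate node inherits its operation type from a
    # parent that contains it, with a 50% chance for nodes present in both.
    child_ops = {v: random.choice([ops[v] for ops in (ops1, ops2) if v in ops])
                 for v in child.nodes if v not in ("in", "out")}
    # Finally, discard (child, child_ops) if it falls outside the search space.
    return child, child_ops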

    In this section, the experimental details are presented and the results are analyzed. First, Section 4.1 introduces the NAS benchmarks and evaluation metrics. Then, in Section 4.2, the experimental details on training the similarity-based surrogate model are presented. Next, in Sections 4.3 and 4.4, the experimental results of the proposed method are introduced and compared with those of the current methods. Finally, the ablation studies of the proposed method are presented in Section 4.5.

    NAS benchmarks. Experiments in this paper were conducted using NAS benchmarks, which have been specifically designed for the purpose of evaluating and comparing the performance of NAS algorithms, including NAS-Bench-101 [26] and NAS-Bench-201 [27]. We introduce these NAS benchmarks briefly as follows:

● NAS-Bench-101 is the first NAS benchmark. It provides tabular benchmarks, with a total of 423,624 neural architectures covering the entire proposed search space. Each architecture, represented by a DAG and usually called a cell, is trained and evaluated three times on CIFAR-10. A complete neural architecture in the search space consists of a cell stacked multiple times. The nodes of a cell indicate candidate operations, and the edges indicate data flow.

● NAS-Bench-201 has 15,625 neural architectures in a predefined search space. Similar to NAS-Bench-101, a complete neural architecture in NAS-Bench-201 consists of a single cell stacked multiple times. The difference is that, in the cells of this search space, nodes represent data flow and edges represent candidate operations. Each architecture in the search space has the same topology, but the operations connecting the nodes differ. It provides tabular benchmarks with more diagnostic information on multiple datasets, including CIFAR-10, CIFAR-100, and ImageNet-16-120.

Evaluation metrics. Since the ultimate goal of NAS is to find neural architectures that perform well in a search space, we chose the classification accuracy (mean \pm std) of the searched architectures on the validation and test sets as the evaluation metric. In addition, the effectiveness of the proposed method is measured by the number of queries: the lower the number of queries, the more efficient the method and the less time it consumes. Therefore, the number of queries is also used as an evaluation metric.

The detailed hyperparameter settings in the experiments are listed in Table 1. The number of generations of the evolutionary algorithm was set to 50, and the population size was set to 20. In the initialization phase, 20 architectures were randomly sampled from a search space, and their architecture information was obtained by querying the corresponding tabular benchmark. These architectures were encoded as one-dimensional vectors by the encoding strategy proposed in Section 3.3. The encoding length of an architecture in NAS-Bench-101 was set to 35 (five operation types, seven nodes representing operations), and that in NAS-Bench-201 was set to 30 (five operation types, six edges representing operations). The backbone network of TripletNet , which is used as the surrogate model, is an MLP with only two hidden layers, whose numbers of neurons were set to 256 and 128, respectively, for both NAS-Bench-101 and NAS-Bench-201. We used the Adam optimizer with a learning rate of 0.001 to train the surrogate model. As the number of sampled architectures increased, we gradually increased the batch size and the number of epochs, revising them for each additional set of 250 samples. All experiments were run on a single NVIDIA GeForce RTX 2080 Ti GPU.

    Table 1.  Hyperparameter settings.
    Hyperparameter Value
    Number of generations 50
    Population size 20
    Number of generated individuals 1000 for NAS-Bench-101 and 5000 for NAS-Bench-201
    Encoding length 35 for NAS-Bench-101 and 30 for NAS-Bench-201
    Latent vector length 16 for NAS-Bench-101 and 6 for NAS-Bench-201
    Number of information transmissions 7 for NAS-Bench-101 and 6 for NAS-Bench-201
    Number of hidden-layer neurons [256,128]
    Optimizer Adam
    Learning rate 0.001
    Similarity threshold 0.5
    Batch size [32,64,128,256]
    Epoch [100,150,200,250]


    The experimental results for NAS-Bench-101 are shown in Table 2. Each experiment was run 10 times. We show the experimental results under specific generations during the evolution. The bolded values in the table are the optimal values for the column. The architecture in NAS-Bench-101 has three metrics with different random initializations. We report the accuracy of the architecture searched on the validation set of the first group of metrics and the accuracy of this architecture on the test set of the first group of metrics, as shown in the columns "Validation (trial 0)" and "Test (trial 0)". In addition, we list the average accuracy of the searched architecture on the test set for three metrics, as shown in the last column of Table 2. As can be seen from the table, with the increase of the number of generations, the classification accuracy of the validation set for the searched optimal architecture steadily increases, but its corresponding test set classification accuracy has some fluctuations. This is due to the rank-correlation gap between the validation set and the test set for the tabular benchmark. In order to better show the performance of the algorithm, we also present the boxplots to illustrate the performance of the searched architectures as the number of evolution generations increases, as shown in Figure 5.

    Table 2.  Experimental results on NAS-Bench-101.
    Validation (trial 0) Test (trial 0) Test (average)
    Oracle 95.15 94.66 94.32
    Generation Mean \pm std Best Mean \pm std Best Mean \pm std Best
    1 93.71 \pm 0.42 94.39 92.99 \pm 0.47 93.72 92.91 \pm 0.33 93.59
    3 94.28 \pm 0.26 94.73 93.62 \pm 0.32 94.21 93.33 \pm 0.43 94.19
    5 94.40 \pm 0.24 94.73 93.71 \pm 0.36 94.38 93.51 \pm 0.45 94.19
    7 94.55 \pm 0.16 94.73 93.95 \pm 0.26 94.38 93.67 \pm 0.29 94.18
    10 94.66 \pm 0.12 94.91 93.93 \pm 0.22 94.23 93.57 \pm 0.25 93.89
    15 94.75 \pm 0.12 94.93 94.00 \pm 0.18 94.23 93.70 \pm 0.14 93.89
    20 94.80 \pm 0.10 94.93 93.87 \pm 0.21 94.13 93.69 \pm 0.11 93.90
    30 94.87 \pm 0.11 95.02 94.05 \pm 0.27 94.42 93.73 \pm 0.19 94.06
    40 94.92 \pm 0.11 95.11 94.17 \pm 0.16 94.34 93.89 \pm 0.13 94.09
    50 94.96 \pm 0.09 95.11 94.10 \pm 0.20 94.34 93.83 \pm 0.14 94.09

    Figure 5.  Boxplot of 10 experiments on NAS-Bench-101. The abscissa represents the evolution generations, and the ordinate represents the accuracy of the searched optimal architecture on the validation set.

    Table 3 shows the results of the experiments on NAS-Bench-201. Each experiment was run 10 times. We show the experimental results under specific generations. The bolded values in the table are the optimal values for the column. Each experiment was designed to rely only on the architecture-related information on the validation set for the search. We report the performance of the optimal architectures searched on three datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet-16-120. As can be seen in the table, the proposed algorithm can always search out the architecture with the best performance in the search space within 20 generations for CIFAR-10 and CIFAR-100. This means that, with a population size of 20 per generation, we only need 400 queries to search for architectures that perform well in the search space. With the increase of the number of generations, the accuracy of the searched architectures gradually increases and the standard deviation gradually decreases, which indicates that the proposed algorithm is not only effective, it is also relatively stable. We also show the boxplot (CIFAR-10) for the results of 10 experiments, as shown in Figure 6.

    Table 3.  Experimental results on NAS-Bench-201.
    CIFAR-10 CIFAR-100 ImageNet-16-120
    Validation accuracy Test accuracy Validation accuracy Test accuracy Validation accuracy Test accuracy
    Oracle 91.61 94.37 73.49 73.51 46.73 47.31
    Generation Mean \pm std Best Mean \pm std Best Mean \pm std Best Mean \pm std Best Mean \pm std Best Mean \pm std Best
    1 89.84 \pm 0.44 90.90 93.06 \pm 0.50 93.84 69.75 \pm 0.68 70.78 69.85 \pm 0.69 70.79 43.27 \pm 1.14 44.82 43.32 \pm 1.19 44.76
    3 91.20 \pm 0.36 91.61 93.94 \pm 0.33 94.37 71.61 \pm 0.57 72.51 71.60 \pm 0.90 73.14 46.20 \pm 0.37 46.56 45.95 \pm 0.59 46.67
    5 91.39 \pm 0.21 91.61 94.15 \pm 0.24 94.37 72.59 \pm 0.66 73.49 72.44 \pm 0.93 73.51 46.41 \pm 0.22 46.73 43.38 \pm 0.26 46.59
    7 91.54 \pm 0.09 91.61 94.18 \pm 0.22 94.37 73.03 \pm 0.53 73.49 72.90 \pm 0.63 73.51 46.52 \pm 0.12 46.73 46.47 \pm 0.14 46.59
    10 91.57 \pm 0.04 91.61 94.32 \pm 0.17 94.37 73.28 \pm 0.45 73.49 73.20 \pm 0.55 73.51 46.58 \pm 0.10 46.73 46.43 \pm 0.16 46.59
    15 91.60 \pm 0.02 91.61 94.32 \pm 0.17 94.37 73.44 \pm 0.12 73.49 73.44 \pm 0.15 73.51 46.68 \pm 0.08 46.73 46.32 \pm 0.18 46.59
    20 91.60 \pm 0.00 91.61 94.37 \pm 0.00 94.37 73.49 \pm 0.00 73.49 73.51 \pm 0.00 73.51 46.70 \pm 0.07 46.73 46.24 \pm 0.12 46.59
    30 91.60 \pm 0.00 91.61 94.37 \pm 0.00 94.37 73.49 \pm 0.00 73.49 73.51 \pm 0.00 73.51 46.73 \pm 0.00 46.73 46.20 \pm 0.00 46.20
    40 91.60 \pm 0.00 91.61 94.37 \pm 0.00 94.37 73.49 \pm 0.00 73.49 73.51 \pm 0.00 73.51 46.73 \pm 0.00 46.73 46.20 \pm 0.00 46.20
    50 91.60 \pm 0.00 91.61 94.37 \pm 0.00 94.37 73.49 \pm 0.00 73.49 73.51 \pm 0.00 73.51 46.73 \pm 0.00 46.73 46.20 \pm 0.00 46.20

    Figure 6.  Boxplot of 10 experiments on NAS-Bench-201 CIFAR-10. The abscissa represents the evolution generations, and the ordinate represents the accuracy of the searched optimal architecture on the validation set.

We compare the proposed algorithm with several state-of-the-art algorithms. The results for NAS-Bench-101 are shown in Table 4. The first column is the name of the algorithm, and the second column, "#Queries", represents the number of neural architectures queried from NAS-Bench-101. The third column represents the average accuracy (mean \pm std) over the three trials for the neural architecture with the highest validation accuracy obtained by the algorithm under a specific number of queries. The fourth column shows how the mean accuracy in the third column ranks over the entire tabular benchmark. It can be seen from the table that the proposed algorithm outperforms most algorithms for the same number of queries. Even after only one generation of the evolutionary algorithm (#Queries = 40), the searched architectures still rank relatively high in the overall search space.

Table 4.  Comparisons with state-of-the-art algorithms in the NAS-Bench-101 search space. "-" means unavailable results.
    Algorithms #Queries Accuracy (average) Rank (%)
    NAR (statistics) [29] 4236 94.07 \pm 0.09 0.0057
    NAR (random) [29] 4236 94.06 \pm 0.04 0.0064
    Random [26,30] 1000 93.54* 0.8179
    RL [26,31] 1000 93.58* 0.6310
    BO [26,31] 1000 93.72* 0.2226
    RE [26,31] 1000 93.72* 0.2226
    Peephole [22,28] 1000 93.41 \pm 0.34 1.6390
    E2EPP [28,32] 1000 93.77 \pm 0.13 0.1447
    SSANA [23,28] 1000 94.01 \pm 0.12 0.0113
    HAAP [28] 1000 94.09 \pm 0.11 0.0040
    NAO [31,33] 1000 93.74* 0.1903
    RFGIAug [34] 1000 94.20** 0.004
    SSENAS (ours) 1000 93.83 \pm 0.14 0.0857
    MbML-NAS(GB) [35] 860 93.26* 3.0468
    GenNAS-N [35,36] 500 93.92* 0.0323
    BANANAS [30,37] 500 94.08* 0.0050
    ReNAS [18,24] 423 93.90 \pm 0.21 0.0399
    CTNAS [24] 423 93.92 \pm 0.18 0.0323
    Random [24] 423 89.31 \pm 3.92 71.276
    SSENAS (ours) 400 93.69 \pm 0.11 0.2811
    FBNet [24,38] - 92.29 \pm 1.25 18.204
    SPOS [24,39] - 89.85 \pm 3.80 63.674
    FairNAS [24,40] - 91.10 \pm 1.84 41.115
    SSENAS (ours) 40 93.18 \pm 0.26 4.0038
    * represents the mean result of multiple experiments.
    ** represents the best result of multiple experiments.


    The results in Table 5 for NAS-Bench-201 show that the proposed method is more competitive than other algorithms. In the case of only 400 queries of the tabular benchmark, the proposed algorithm is able to find the best neural architecture in the search space when applied to the CIFAR-10 and CIFAR-100 datasets. Similarly, on ImageNet-16-120, the proposed method is still superior to other algorithms.

Table 5.  Comparisons with state-of-the-art algorithms in the NAS-Bench-201 search space. "-" means unavailable results.
    Algorithm #Queries CIFAR-10 CIFAR-100 ImageNet-16-120
    Validation Test Validation Test Validation Test
    Oracle N/A 91.61 94.37 73.49 73.51 46.73 47.31
    NAR [29] 1000 91.44 \pm 0.10 94.33 \pm 0.05 72.54 \pm 0.44 72.89 \pm 0.37 46.16 \pm 0.37 46.66 \pm 0.23
    GenNAS-N [35,36] 1000 - 94.18 \pm 0.10 - 72.56 \pm 0.74 - 45.59 \pm 0.54
    NASWOT [41] 1000 89.69 \pm 0.73 92.96 \pm 0.81 69.86 \pm 1.21 69.98 \pm 1.22 43.95 \pm 2.05 44.44 \pm 2.10
    MbML-NAS (RF) [35] 860 - 93.36 \pm 0.20 - 70.33 \pm 0.85 - 42.99 \pm 4.21
    MbML-NAS (GB) [35] 860 - 93.03 \pm 0.52 - 70.02 \pm 1.17 - 44.28 \pm 1.42
    RFGIAug [34] 424 91.43 94.25* - - - -
    SSENAS (ours) 400 91.61 \pm 0.00 94.37 \pm 0.00 73.49 \pm 0.00 73.51 \pm 0.00 46.72 \pm 0.05 46.24 \pm 0.12
    NASWOT [41] 100 89.55 \pm 0.89 92.81 \pm 0.99 69.35 \pm 1.70 69.48 \pm 1.70 42.81 \pm 3.05 43.10 \pm 3.16
Random [42] 100 91.07 \pm 0.27 93.82 \pm 0.24 71.44 \pm 0.83 71.40 \pm 0.83 45.37 \pm 0.63 45.36 \pm 0.67
    REA [8,42] 100 91.37 \pm 0.25 94.06 \pm 0.29 72.79 \pm 0.69 72.72 \pm 0.72 46.15 \pm 0.45 45.99 \pm 0.51
    BANANAS [37,42] 100 91.50 \pm 0.15 94.23 \pm 0.30 73.27 \pm 0.57 73.25 \pm 0.63 46.45 \pm 0.25 46.31 \pm 0.31
    GP_bayesopt [37,42] 100 91.45 \pm 0.23 94.16 \pm 0.31 73.11 \pm 0.65 73.05 \pm 0.75 46.51 \pm 0.25 46.25 \pm 0.34
    DNGO [42,43] 100 91.41 \pm 0.17 94.08 \pm 0.26 72.71 \pm 0.66 72.66 \pm 0.67 46.11 \pm 0.44 46.00 \pm 0.48
    Bohamiann [37,42] 100 91.41 \pm 0.18 94.09 \pm 0.26 72.73 \pm 0.64 72.65 \pm 0.66 46.15 \pm 0.45 46.03 \pm 0.48
    GCN_Predictor [37,42] 100 91.02 \pm 0.32 93.74 \pm 0.33 71.43 \pm 0.72 71.43 \pm 0.77 45.47 \pm 0.72 45.36 \pm 0.78
    BONAS [42] 100 91.56 \pm 0.10 94.32 \pm 0.15 73.32 \pm 0.40 73.30 \pm 0.46 46.56 \pm 0.19 46.31 \pm 0.30
    Arch2vec_RL [31,42] 100 91.43 \pm 0.28 94.23 \pm 0.26 73.12 \pm 0.66 73.06 \pm 0.87 46.32 \pm 0.20 46.35 \pm 0.35
    Arch2vec_BO [31,42] 100 91.51 \pm 0.13 94.31 \pm 0.14 73.38 \pm 0.34 73.44 \pm 0.22 46.33 \pm 0.21 46.35 \pm 0.27
NPENAS-SSRL [42,44] 100 91.56 \pm 0.14 94.32 \pm 0.19 73.47 \pm 0.22 73.47 \pm 0.30 46.53 \pm 0.33 45.83 \pm 0.60
    NPENAS-CCL [42,44] 100 91.57 \pm 0.13 94.32 \pm 0.19 73.48 \pm 0.15 73.49 \pm 0.23 46.62 \pm 0.34 45.61 \pm 0.41
    SAENAS-NE [42] 100 91.58 \pm 0.09 94.34 \pm 0.12 73.46 \pm 0.18 73.46 \pm 0.20 46.59 \pm 0.14 46.36 \pm 0.26
    ReNAS [18] 90 90.90 \pm 0.31 93.99 \pm 0.25 71.96 \pm 0.99 72.12 \pm 0.79 45.85 \pm 0.47 45.97 \pm 0.49
    * represents the best result of multiple experiments.


Exploring similarities between architectures with similar performance: Taking the NAS-Bench-101 search space as an example, to better understand the correlation between good architectures in the search space and the architectures found, we drew chord diagrams, as shown in Figure 7. It contains a total of four chord diagrams. The operation types of the nodes are arranged radially around a circle, and the connections between the operations are drawn as arcs. With the exception of Figure 7(a), the three remaining plots present labels on the outside of the circle at scale intervals of 400 to indicate the number of operations. Figure 7(a) shows the chord diagram for all architectures in the entire search space. As can be seen in the figure, the distribution of the operations that make up the architectures in NAS-Bench-101 is quite uniform: not only do the input and output operations appear an equal number of times, but also the three operations of 1\times1\; convolution , 3\times3\; convolution , and 3\times3\; maxpool appear an equal number of times. The uniformity of the distribution is also reflected in the number of connections between the operations; in the case of input , for example, the operation has an equal number of connections to 1\times1\; convolution , 3\times3\; convolution , and 3\times3\; maxpool . Figure 7(b) represents the inter-relationships among the operations of the best 1000 architectures on the validation set, and Figure 7(c) represents those of the best 1000 architectures on the test set. Figure 7(d) shows the mean of the inter-relationships among the operations of 5000 architectures obtained in five experiments. It is clear that the architectures found by our algorithm are very similar to the top 1000 architectures on both the validation and test sets, but far from the chord diagram of the entire search space shown in Figure 7(a). Comparing Figure 7(d) with Figure 7(b),(c), we can see that the numbers of occurrences of the operations are roughly equal, as are the numbers of connections between operations. In the case of 1\times1\; convolution , the number of times it appears in the three chord diagrams is very close, at a little over 2800. From the above analysis, it can be stated that there is a certain similarity between good neural architectures in the search space.

    Figure 7.  Chord diagram showing the inter-relationships among the node types of the architectures. The thicker the chord, the greater the number of operations from the current type of operation to another type of operation.

Efficiency of the proposed method: In order to show the performance of the proposed algorithm more intuitively, we plot the validation and test accuracy of all models. As shown in Figure 8, the right panel is a zoomed version of the left panel, covering the accuracy range from 93.0% to 95.5%. The blue dots represent all of the neural architectures (423,624) in the NAS-Bench-101 search space, and the orange dots represent the 1000 neural architectures searched by the proposed algorithm. The horizontal coordinate is the validation accuracy of trial 0, and the vertical coordinate is the average test accuracy over three trials. It can be intuitively understood from the figure that the proposed algorithm can find neural architectures exhibiting excellent performance in the search space. However, due to the inconsistency between the validation accuracy and test accuracy of the neural architectures, the algorithm cannot exactly locate the neural architecture with the highest test accuracy.

    Figure 8.  Validation and test accuracy in NAS-Bench-101. The blue dots represent all neural architectures in the search space, and the orange dots represent the 1000 neural architectures searched by SSENAS.

    Efficiency of the surrogate model: To demonstrate the effectiveness of the similarity-based surrogate model, we have plotted the architectures selected from a large number of architectures during evolution, as shown in Figure 9. The red dots in the figure are the best architectures that have been evaluated so far, the blue dots are the neural architectures generated by the proposed evolutionary algorithm, and the green dots represent the architectures selected as offspring by calculating the similarity between the architectures generated by the proposed method and the current optimal architectures. As can be ascertained from the figure, the proposed surrogate model can efficiently select the architectures with excellent performance from the massive architectures. For the NAS-Bench-101 experiment, we employed an evolutionary algorithm to produce 1000 architectures per generation. Leveraging the proposed similarity surrogate model, we strategically selected 20 candidate architectures as offspring individuals. This approach obviates the necessity to query the performance of all 1000 architectures, as only the top 20 are considered. Consequently, this results in a significant computational reduction, equivalent to a 98% decrease in the amount of computation required for the thorough evaluation of network architectures.

    Figure 9.  Example of accuracy on NAS-Bench-101. In the evolution process, according to the currently known optimal architecture (red star), the proposed surrogate model selects the offspring architectures (green dot) from the massive generated architectures (blue dot).

Efficiency of the architecture generation: The validity of the proposed method is verified by comparing the Gaussian kernel density estimate of the initial randomly sampled architectures with that of the new architectures generated via genetic operations. As shown in Figure 10, the blue part represents the validation-set accuracy of 1000 randomly sampled architectures and the corresponding Gaussian kernel density estimation curve, while the orange part represents the accuracy of 1000 architectures generated by the proposed architecture generation strategy and the corresponding curve. The green part represents the validation-set accuracy of the 20 architectures selected by the surrogate model. It is clear from the figure that this architecture generation strategy yields better architectures than random sampling, although some architectures inferior to the current ones are also generated. We use the surrogate model based on similarity measurement to filter out the good architectures and thus gradually search the search space for good neural architectures. The figure shows that the evolutionary algorithm can generate architectures with excellent performance, and that the proposed surrogate model can find them. It is worth noting that this figure presents the first generation of the evolutionary algorithm.

    Figure 10.  Gaussian kernel density estimation curves illustrating the architectures' performance (NAS-Bench-101).
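    The density comparison can be reproduced with standard tools. Below is a minimal sketch using scipy.stats.gaussian_kde, with synthetic accuracy samples standing in for the real validation accuracies queried from NAS-Bench-101.

```python
# Minimal sketch of a Figure-10-style density comparison. The three sample
# sets are synthetic placeholders for random sampling, genetic generation,
# and surrogate selection, respectively.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
random_acc = rng.normal(0.90, 0.020, 1000)     # randomly sampled architectures
generated_acc = rng.normal(0.92, 0.015, 1000)  # architectures from genetic operations
selected_acc = rng.normal(0.94, 0.005, 20)     # top 20 chosen by the surrogate

xs = np.linspace(0.85, 0.97, 300)
for acc, color, label in [
    (random_acc, "tab:blue", "random sampling"),
    (generated_acc, "tab:orange", "generated by evolution"),
    (selected_acc, "tab:green", "selected by surrogate"),
]:
    plt.hist(acc, bins=40, density=True, alpha=0.3, color=color)
    plt.plot(xs, gaussian_kde(acc)(xs), color=color, label=label)
plt.xlabel("validation accuracy")
plt.ylabel("estimated density")
plt.legend()
plt.show()
```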

    Sensitivity analysis for evolutionary parameters: A sensitivity analysis of the evolutionary algorithm's parameters helps us to better understand the performance of the algorithm and possible directions for improvement. We conducted parameter sensitivity experiments on the population size and the number of generations. Following community convention [17], these two parameters were set to the widely used combinations 20*50, 25*40, 40*25, and 50*20, where, for example, 20*50 means a population size of 20 evolved for 50 generations. This setup ensures that the total number of queried architectures in each experiment is 1000. We conducted the experiments on NAS-Bench-201, running each parameter setting at least seven times. In Figure 11(a), (b), we respectively show the mean and variance of the performance of the searched architectures on the CIFAR-10 validation and test sets for the different parameter combinations during the search. These two figures indicate that the performance of the proposed algorithm gradually degrades as the population size grows beyond the number of generations. The performance trends also show that our method quickly finds the best-performing architectures for NAS-Bench-201 (CIFAR-10) on the validation set. It is worth noting that, except for the 50*20 combination, every parameter combination finds the architecture with the best validation performance within fewer than 400 queries. This demonstrates that the proposed algorithm is not only effective but also relatively stable.

    Figure 11.  Sensitivity analysis for evolutionary parameters on NAS-Bench-201, showing the performance curves for architectures searched by the algorithm during evolution under different parameter combinations.
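    The budgeted design of this experiment can be expressed compactly, as in the following minimal sketch; run_ssenas is a hypothetical stand-in for one full search run on NAS-Bench-201 and returns a synthetic number so that the snippet runs on its own.

```python
# Minimal sketch of the budgeted parameter grid: every (population size,
# generations) pair queries the same total of 1000 architectures, so the
# four settings are directly comparable.
import numpy as np

def run_ssenas(pop_size, generations, seed):
    """Hypothetical stand-in for one search run; returns best validation accuracy."""
    rng = np.random.default_rng(seed)
    return float(rng.normal(0.915, 0.002))  # synthetic result

BUDGET = 1000
for pop_size, generations in [(20, 50), (25, 40), (40, 25), (50, 20)]:
    assert pop_size * generations == BUDGET  # equal query budget per setting
    results = [run_ssenas(pop_size, generations, s) for s in range(7)]  # >= 7 runs
    print(f"{pop_size}*{generations}: mean={np.mean(results):.4f}, "
          f"var={np.var(results):.6f}")
```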

    In this paper, we have proposed a novel approach, called SSENAS, for efficiently searching for excellent neural architectures with an evolutionary algorithm. A similarity surrogate model has been designed to prevent the surrogate from being misled by poor-quality architecture information; it is used to select excellent neural architectures as offspring from the massive pool of candidates generated by the evolutionary algorithm. In addition, we have incorporated a dual encoding strategy, with one encoding for surrogate evaluation and one for the generation of excellent neural architectures. The encoding strategy for surrogate evaluation focuses on a comprehensive representation of architecture information, achieved via information transmission and the aggregation of architecture nodes, while the encoding strategy for architecture generation decouples the connections between architecture nodes from the operation types of those nodes and focuses on exploring architectures that are similar to the current optimum in the search space (a sketch of this decoupled encoding is given below).

    Experimental results on NAS benchmarks show that the proposed algorithm can progressively find well-performing neural architectures in the search space. We also show that neural architectures with similar performance share some structural similarity. In NAS-Bench-101, because the search space is large and the performance correlation between the validation and test sets is low, the method still yields promising architectures even though it cannot find the optimal one. In the NAS-Bench-201 search space, which is small and has a high performance correlation between the validation and test sets, the method needs only about 400 queries to find the best-performing architecture. In future work, we will further study similarity-based surrogate models for ENAS by incorporating weight inheritance, so as to search for high-performing neural architectures at low resource cost. How to generate neural architectures with excellent performance by using evolutionary algorithms also remains a focus of our attention.
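    As a concrete illustration of the decoupled generation encoding summarized above, the following is a minimal sketch. It assumes a cell is stored as an upper-triangular adjacency matrix (node connections) together with a separate list of per-node operation types, so that the two parts can be mutated independently; the paper's exact encoding and genetic operators may differ in detail.

```python
# Minimal sketch of a decoupled cell encoding: connections (adjacency matrix)
# and operation types (per-node list) are stored and mutated separately.
import numpy as np

OPS = ["conv1x1", "conv3x3", "maxpool3x3"]  # illustrative operation set

def random_cell(n_nodes=7, rng=None):
    rng = rng or np.random.default_rng(0)
    adj = np.triu(rng.integers(0, 2, (n_nodes, n_nodes)), k=1)  # connections
    ops = rng.choice(OPS, size=n_nodes - 2)  # ops for the intermediate nodes
    return adj, ops

def mutate(adj, ops, rng):
    adj, ops = adj.copy(), ops.copy()
    if rng.random() < 0.5:  # mutate a connection bit, leaving operations intact
        i, j = sorted(rng.choice(len(adj), size=2, replace=False))
        adj[i, j] ^= 1
    else:                   # mutate an operation type, leaving connections intact
        ops[rng.integers(len(ops))] = rng.choice(OPS)
    return adj, ops

rng = np.random.default_rng(1)
adj, ops = random_cell(rng=rng)
child_adj, child_ops = mutate(adj, ops, rng)
print(child_ops)
```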

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported by the National Natural Science Foundation of China (62376127, 61876089, 61876185, and 61902281), the Natural Science Foundation of Jiangsu Province (BK20141005), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX23_1373).

    Yu Xue and Ferrante Neri are guest editors for Electronic Research Archive and were not involved in the editorial review or the decision to publish this article. All authors declare that there are no competing interests.


