Feature selection (FS) is a classic and challenging optimization task in the field of machine learning and data mining. Gradient-based optimizer (GBO) is a recently developed metaheuristic with population-based characteristics inspired by gradient-based Newton's method that uses two main operators: the gradient search rule (GSR), the local escape operator (LEO) and a set of vectors to explore the search space for solving continuous problems. This article presents a binary GBO (BGBO) algorithm and for feature selecting problems. The eight independent GBO variants are proposed, and eight transfer functions divided into two families of S-shaped and V-shaped are evaluated to map the search space to a discrete space of research. To verify the performance of the proposed binary GBO algorithm, 18 well-known UCI datasets and 10 high-dimensional datasets are tested and compared with other advanced FS methods. The experimental results show that among the proposed binary GBO algorithms has the best comprehensive performance and has better performance than other well known metaheuristic algorithms in terms of the performance measures.
Citation: Yugui Jiang, Qifang Luo, Yuanfei Wei, Laith Abualigah, Yongquan Zhou. An efficient binary Gradient-based optimizer for feature selection[J]. Mathematical Biosciences and Engineering, 2021, 18(4): 3813-3854. doi: 10.3934/mbe.2021192
Related Papers:
[1]
Amir Yasseen Mahdi, Siti Sophiayati Yuhaniz .
Optimal feature selection using novel flamingo search algorithm for classification of COVID-19 patients from clinical text. Mathematical Biosciences and Engineering, 2023, 20(3): 5268-5297.
doi: 10.3934/mbe.2023244
[2]
Ziyu Jin, Ning Li .
Diagnosis of each main coronary artery stenosis based on whale optimization algorithm and stacking model. Mathematical Biosciences and Engineering, 2022, 19(5): 4568-4591.
doi: 10.3934/mbe.2022211
[3]
Lu Yu, Yuliang Lu, Yi Shen, Jun Zhao, Jiazhen Zhao .
PBDiff: Neural network based program-wide diffing method for binaries. Mathematical Biosciences and Engineering, 2022, 19(3): 2774-2799.
doi: 10.3934/mbe.2022127
Rami AL-HAJJ, Mohamad M. Fouad, Mustafa Zeki .
Evolutionary optimization framework to train multilayer perceptrons for engineering applications. Mathematical Biosciences and Engineering, 2024, 21(2): 2970-2990.
doi: 10.3934/mbe.2024132
[6]
Xiaodan Zhang, Shuyi Wang, Kemeng Xu, Rui Zhao, Yichong She .
Cross-subject EEG-based emotion recognition through dynamic optimization of random forest with sparrow search algorithm. Mathematical Biosciences and Engineering, 2024, 21(3): 4779-4800.
doi: 10.3934/mbe.2024210
[7]
Omar A. M. Salem, Feng Liu, Ahmed Sobhy Sherif, Wen Zhang, Xi Chen .
Feature selection based on fuzzy joint mutual information maximization. Mathematical Biosciences and Engineering, 2021, 18(1): 305-327.
doi: 10.3934/mbe.2021016
Jiali Tang, Yan Wang, Chenrong Huang, Huangxiaolie Liu, Najla Al-Nabhan .
Image edge detection based on singular value feature vector and gradient operator. Mathematical Biosciences and Engineering, 2020, 17(4): 3721-3735.
doi: 10.3934/mbe.2020209
[10]
C Willson Joseph, G. Jaspher Willsie Kathrine, Shanmuganathan Vimal, S Sumathi., Danilo Pelusi, Xiomara Patricia Blanco Valencia, Elena Verdú .
Improved optimizer with deep learning model for emotion detection and classification. Mathematical Biosciences and Engineering, 2024, 21(7): 6631-6657.
doi: 10.3934/mbe.2024290
Abstract
Feature selection (FS) is a classic and challenging optimization task in the field of machine learning and data mining. Gradient-based optimizer (GBO) is a recently developed metaheuristic with population-based characteristics inspired by gradient-based Newton's method that uses two main operators: the gradient search rule (GSR), the local escape operator (LEO) and a set of vectors to explore the search space for solving continuous problems. This article presents a binary GBO (BGBO) algorithm and for feature selecting problems. The eight independent GBO variants are proposed, and eight transfer functions divided into two families of S-shaped and V-shaped are evaluated to map the search space to a discrete space of research. To verify the performance of the proposed binary GBO algorithm, 18 well-known UCI datasets and 10 high-dimensional datasets are tested and compared with other advanced FS methods. The experimental results show that among the proposed binary GBO algorithms has the best comprehensive performance and has better performance than other well known metaheuristic algorithms in terms of the performance measures.
1.
Introduction
With the rapid development of information technology, big data has been a persistently hot topic and the hotly anticipated artificial intelligence relies on the development of big data [1]. One of the reasons why big data has become a buzzing topic is that due to the increasing computing power of computers, the datasets to be processed have become larger and larger, where the datasets contain more and more attributes, making the task of machine learning in data mining complicated. If a dataset contains n features, then 2n solutions need to be generated and evaluated [2,3]. If n is small, then the total number of feature subsets is small and the optimal feature subset can usually be obtained by exhaustive search. However, when n becomes large enough to reach a defined value, it is no longer possible to enumerate all the feature subsets, which is computationally expensive. How to handle these large datasets becomes paramount. Because these datasets often contain unimportant, redundant, and noisy features that reduce the efficiency of the classifier, choosing the right features is the key to solving this problem [4,5]. Pre-processing functions in the data mining process include Feature selection (FS), which aims to reduce the dimensionality of the data by eliminating irrelevant, redundant, or noisy features, thereby improving the efficiency of machine learning algorithms, such as classification accuracy [6]. FS approach, an important technique in machine learning and data mining, has been extensively researched over the past 20 years. Feature selection has been widely applied in areas including e.g., text classification [7,8], face recognition [9,10], cancer classification [11], genetic classification [12,13], financial [14], recommendation systems [15], customer relationship management [16], cancer diagnosis [17], image classification [18], medical technology [19], etc.
As usually, FS methods can be classified as Filter [20,21] or Wrapper [22,23], depending on whether they are independent of the subsequent learning algorithm. Filter is independent of the subsequent learning algorithm and generally uses the statistical performance of all training data to evaluate features directly, which has the advantage of being fast, but the evaluation deviates significantly from the performance of the subsequent learning algorithm. Wrapper uses the training accuracy of subsequent learning algorithms to evaluate a subset of features, which has the advantage of being less biased but is computationally intensive and relatively difficult for large data sets. It is based on the fact that the selected subset is ultimately used to construct the classification model so that if the features that achieve high classification performance are used directly in the construction of the classification model, a classification model with high classification performance is obtained. This method is slower than the Filter method, but the size of the optimized feature subset is much smaller, which is good for identifying key features; it is also more accurate, but less generalized and has higher time complexity.
There is another FS method which is the embedded FS method [24,25]. In the filtered and wrapped FS methods, the FS process is clearly separated from the learner training process. In contrast, embedded FS automatically performs FS during the learner training process, which is an integration of the FS process and the learner training process, and both are done in the same optimization process, i.e., FS is performed automatically during the learner training process. Embedded FS is most commonly used for L1 regularization and L2 regularization. Generally, the larger the regularization term, the simpler the model and the smaller the coefficients, when the regularization term increases to a certain level, all the feature coefficients will tend to 0. In this process, some of the feature coefficients will become 0 first, and the feature selection process is realized. Logistic regression, linear regression, and decision tree can be used as base learners for regularized feature selection, and only algorithms that can get feature coefficients or can get feature importance can be used as base learners for embedded selection.
The FS process can be divided into supervised feature selection [26] and unsupervised feature selection [27,28], depending on whether the original data sample contains information about the pattern category or not. Supervised feature selection is the process of selecting a feature set using the relationships between features and between features and categories, given a pattern category. Unsupervised feature selection refers to the selection of features in the original dataset by the relationship between the features themselves in the dataset.
How to adopt the right method to solve the FS problem is crucial. Through the study of many researchers, the FS problems can be solved by using a variety of search methods, such as exhaustive search, greedy algorithms, and random search. However, most of the existing FS methods tend to fall into local optima or are computationally expensive. FS problem is an NP-hard problem where the optimal solution can only be guaranteed by exhaustive search. The use of metaheuristics makes it possible to obtain suboptimal solutions without examining the entire space of solutions. The superiority of metaheuristics stems from the ability to find an acceptable solution in a reasonable amount of time. [29] Recently, metaheuristic algorithms have shown superior performance in FS problems to find the best features. [30,31] Genetic Algorithms (GA) [32], Particle Swarm Optimization (PSO) [33], Ant Colony Optimization (ACO) [34], Whale Optimization Algorithm (WOA) [35], Grey Wolf Optimization (GWO) [36], Grasshopper Optimization Algorithm (GOA) [37], Gravitational Search Algorithms (GSA) [38], Slime Mould Algorithm (SMA) [39], Hunger Games Search (HGS) [40], Gradient-based optimizer (GBO) [41], and Dragonfly Algorithms (DA) [42] are some of the well-known metaheuristic algorithms. Compared to traditional algorithms, metaheuristics algorithms have better solutions.
The Gradient-based optimizer (GBO) is a novel gradient-based Newton's method [43] optimization algorithm proposed by Iman Ahmadianfar in 2020. The GBO is a new metaheuristic approach with population-based characteristics inspired by the gradient-based Newton's method, uses two main operators: gradient search rule (GSR) and local escaping operator (LEO), and a set of vectors to explore the search space. In general, the GBO has better optimization outcomes compared to other established metaheuristics. The GBO demonstrates its best performance among competitor algorithms in terms of exploration and exploitation. The GSR adopts the gradient-based approach, which enhances the exploration tendency and speeds up the convergence, hence obtaining a better position in the search space. The LEO enables the proposed GBO to get rid of local optima. The original version of GBO was designed to solve continuous optimization problems. However, many optimization problems (e.g., FS) have discrete decision variables and search spaces, and for discrete problems with discrete search spaces, the GBO does not provide the best solution. We are inspired to improve the binary version of the GBO by the superior performance of the GBO in continuous search space. In this paper, a binary version of GBO is proposed and designed to solve the FS problem. In this approach, two transfer functions [44] belonging to two families (i.e., S-shaped and V-shaped) are used to convert the continuous solution into the binary solution. This method is called BGBO_S and BGBO_V. The proposed method provides a new idea for solving the FS problem, which should not be limited to the improvement of the inherent algorithm, but can try to get good results as well when the newly proposed algorithm is applied to the FS problem. Meanwhile, many real-world problems are discrete, and GBO itself is a metaheuristic algorithm with a good mechanism. Since it has just been proposed, there is no binary version yet. The proposed method also provides a wider application of the GBO algorithm
The remaining paper is organized as follows. The related works on feature selection are described in Section 2. The original GBO is introduced in Section 3. The proposed binary GBO (BGBO) is proposed in Section 4. In Section 5, the BGBO are tested on 18 standard UCI datasets and 10 high-dimensional datasets, and the results are compared with the well known binary metaheuristic algorithms. We conclude in Section 6.
2.
Related works
There has been quite a bit of work in the area of metaheuristics for the FS problem, and variables and FS problem have been the focus of research in many application areas [45], and FS plays an important role in classification [46]. A large number of metaheuristic algorithms for FS problem have been proposed in the literature. The search capability of binary metaheuristic algorithms is dominated by wrapper-based feature selection algorithms. Traditional optimization methods cannot solve complex optimization problems better and it is difficult to obtain satisfactory solutions. Thus, a more effective method, the metaheuristic algorithm, has been proposed and applied by an increasing number of scholars. Metaheuristic algorithms are the product of combining stochastic algorithms and local search algorithms. Metaheuristic algorithms can be divided into four categories: evolutionary algorithms, swarm intelligence, physics-based methods, and human-based methods.
The first category of metaheuristic algorithms is inspired by the Darwinian Theory of evolution, and evolutionary algorithms simulate the rules of evolution in nature. Genetic algorithms (GA) [32] are one of the representatives of evolutionary algorithms. The three steps of selection, crossover, and mutation are the main steps of genetic algorithms to update the population for optimization purposes. There are various improved versions of genetic algorithms as well as applications to feature selection problems. Siedlecki et al. [47] used genetic algorithms (GA) to select features and find near-optimal feature subsets from large feature sets. Leardi et al. [48] demonstrated that the subsets of variables selected by genetic algorithms are generally more efficient than those obtained by classical feature selection methods. Oh et al. [49] proposed a novel hybrid genetic algorithm for feature selection. The method embeds the local search operation into the hybrid GA to tune the search, which improves the algorithm performance and effectively controls the subset size. The convergence of the algorithm is enhanced. In addition, evolutionary algorithms include Differential Evolution (DE) algorithms [50], and Xue et al. [51] studied Differential Evolution (DE) for multi-objective feature selection in classification.
The second class of metaheuristic algorithms: swarm intelligence, inspired by the social behavior of animals in a herd, shares information about all individuals in the optimization process. Particle Swarm Optimization (PSO) [33], Gray Wolf Optimizer (GWO) [36], Ant Colony Optimization (ACO) [34], Artificial Bee Colony (ABC) [52], Whale Optimization Algorithm (WOA) [35], Harris hawks optimization (HHO) [53], and Marine Predators Algorithm (MPA) [54] are among the representative algorithms of this class of metaheuristics. In particular, QO Saber et al. [55] proposed a particle swarm optimization algorithm with a logistic regression model and proved that the proposed method has a competitive performance. K Chen et al. [56] proposed hybrid particle swarm optimization (HPSO-SSM) with a spiral mechanism, and the results showed that the HPSO-SSM method has high performance in solving classification problems. B Xue et al. [57] proposed a multi-objective particle swarm optimization (PSO) study for feature selection. Emery, E et al. [58] proposed a novel binary version of gray wolf optimization (GWO) and used it to select the best subset of features for classification purposes. P Hu et al. [59] analyzed the range of values of an under-binary condition and proposed a new parameter update equation to balance the ability of global search and local search. The proposed method introduces new transfer functions and proposes new parameter update equations. These methods show remarkable performance and have fast convergence, but they tend to fall into local optimal states. Q. Tu et al. [60] proposed a multi-strategy ensemble GWO (MEGWO). The proposed MEGWO contains three different search strategies to update the solution. Mafarja et al. [61] proposed two binary variants of the WOA algorithm to search for the optimal feature subset for classification purposes. MM Mafarja et al. [62] used two hybridization models to design different feature selection techniques based on the whale optimization algorithm (WOA). R. K Agrawal et al. [63] proposed the quantum whale optimization algorithm (QWOA) for feature selection, which is a combination of quantum concept and whale optimization algorithm (WOA). The approach enhances the versatility and convergence of the classical WOA for feature selection and extends the prospects of nature-inspired feature selection methods based on high-performance but low-complexity wrappers.
The third category of metaheuristic algorithms: physics-based methods, inspired by the physical laws of nature, simulate the physical laws in the optimization process to find the best. Common algorithms include Simulated Annealing (SA) [64], Gravitational Search Algorithm (GSA) [38], Lightning Search Algorithm (LSA) [65], Multi-verse Optimizer (MVO) [66], Electromagnetic Field Optimization (EFO) [67], Chemical Reaction Optimization (CRO) [68], and Henry Gas Solubility Optimization (HGSO) [69]. Meiri et al. [70] used simulated annealing (SA) method for specifying large-scale linear regression models. Lin et al. [71] proposed a simulated annealing (SA) method for parameter determination and feature selection of SVM, called SA-SVM. Rashedi et al. [72] introduced a binary version of the algorithm (GSA). Sushama et al. [73] used a wrapper-based approach for disease prediction analysis of medical data using GSA and k-NN. Rao et al. [74] proposed a feature selection technique based on a hybrid of binary chemical reaction optimization (BCRO) and binary chemical reaction optimization-binary particle swarm optimization (HBCRO-BPSO) in this paper to optimize the number of selected features and improve the classification accuracy. This method optimizes the number of features and improves the classification accuracy and computational efficiency of the ML algorithm. However, it still does not attempt to handle ultra-high-dimensional datasets with a large number of samples in FS. Neggaz et al. [75] proposed a novel dimensionality reduction method by using Henry Gas Solubility Optimization (HGSO) algorithm to select important features to improve classification accuracy. The method generates 100% accuracy for classification problems with more than 11,000 features and is valid on both low and high-dimensional datasets. However, HGSO also maintains certain limitations. Since HGSO maintains multiple control parameters, this may compromise its applicability compared to other well-known methods such as GWO [36] and HHO [53].
The last type of metaheuristic algorithms: human-based approaches, inspired by human interactions or human behavior in society. For examples: Teaching–learning-based optimization (TLBO) [76], Imperialist Competitive Algorithm (ICA) [77], Volleyball Premier League Algorithm (VPL) [78] and Cultural Evolution Algorithm (CEA) [79]. In which, Mohan Allam et al. [80] proposed a new wrapper-based feature selection method called the binary teaching learning based optimization (FS-BTLBO) algorithm. Mousavirad et al. [81] proposed an improved imperialist competition algorithm to apply this proposed algorithm to the feature selection process. Keramati et al. [82] proposed a new FS method based on cultural evolution. The proposed methods provide new solutions to the feature selection problem, but there are still problems of insufficient accuracy and excessive number of features.
From the above work related to metaheuristics, we can know that a great number of metaheuristics are successful applied to the FS problem. The above work related to different types of metaheuristic algorithms on feature selection problem proves that metaheuristic algorithms have promising performance in solving FS problems, especially in terms of accuracy. In addition, they can produce better results by using a smaller number of features. However, there are still some problems, such as the number of tested datasets is small and not comprehensive enough to include low-dimensional, high-dimensional or different types of datasets, the problem of slow convergence that exists in most metaheuristics, and the lack of significant effect on the number of features selected for high-dimensional datasets. Based on the No Free Lunch (NFL) rule [83], no single metaheuristic algorithm can solve all problems, which indicates that a particular algorithm may provide very promising results for a set of problems, but the same method may be inefficient for a different set of problems. Gradient-based optimizer is a novel gradient-based metaheuristic algorithm proposed by Iman Ahmadianfar [41] in 2020. The GBO algorithm is a new metaheuristic algorithm that has been recently proposed and has not been systematically applied to feature selection problems. The gradient-based optimization-seeking mechanism (GSR and LEO) of the GBO algorithm makes it feasible to make an appropriate trade-off between exploration and exploitation. Therefore, in this paper, we propose binary GBO for solving FS problems, mapping continuous GBO into discrete forms using S-shaped and V-shaped transfer functions, and try to apply it in solving high-dimensional dataset problems. The following is a brief description of the GBO algorithm.
3.
Gradient-based optimizer (GBO)
The metaheuristic algorithm was first proposed by Iman Ahmadianfar et al. in 2020 to solve optimization problems related to engineering applications. Exploration and exploitation are the two main phases in metaheuristic algorithms, which aim to improve the speed of convergence and/or local optimum avoidance of the algorithm when searching for a target/position. The GBO is managed to create a proper trade-off between exploration and exploitation to uses two main operators: gradient search rule (GSR) and local escaping operator (LEO). A simple introduction of this algorithm is described below:
3.1. Gradient search rule (GSR)
First, GBO proposes the first operator GSR, which helps the GBO to consider stochastic behavior in the optimization process to facilitate the exploration and avoidance of local optima. And the direction movement (DM) is added to GSR, which is used to perform a suitable local search trend to facilitate the convergence speed of the GBO algorithm. Based on the GSR and DM, the following equation is used to update the position of current vector (xmn).
where βmin and βmax are 0.2 and 1.2, respectively, m is the number of iterations, and M is the total number of iterations. randn is a normally distributed random number, andεis a small number within the range of [0, 0.1]. ρ2 can be given by:
ρ2=2×rand×α−α
(5)
Δx=rand(1:N)×|step|
(6)
step=(xbest−xmr1)+δ2
(7)
δ=2×rand×(|xmr1+xmr2+xmr3+xmr44−xmn|)
(8)
where rand(1:N) is a random number with N dimensions, r1,r2,r3andr4r1≠r2≠r3≠r4≠n are different integers randomly chosen from [1, N], step is a step size, which is determined by xbest and xmr1. By replacing the position of the best vector (xbest) with the current vector (xmn) in Eq (1), the new vector (X2mn) can be generated as follows:
Based on the positions X1mn, X2mn, and the current position (Xmn), the new solution at the next iteration (xm+1n) can be defined as:
xm+1n=ra×(rb×X1mn+(1−ra)×X2mn)+(1−ra)×X3mn
(12)
X3mn=Xmn−ρ1×(X2mn−X1mn)
(13)
3.2. Local escaping operator (LEO)
LEO is the second operator introduced by GBO. LEO is introduced to make GBO still effective in the face of complex high-dimensional problems. The LEO generates a solution with a superior performance (xmLEO) by using several solutions, which include the best position (xbest), the solutions X1mn and X2mn, two random solutions xmr1 and xmr2, and a new randomly generated solution (xmk). The solution XmLEO is generated by the following scheme:
where f1 is a uniform random number in the range of [-1, 1], f2 is a random number from a normal distribution with mean of 0 and standard deviation of 1, pr is the probability, and u1,u2,andu3 are three random numbers, which are defined as:
u1={2×randifμ1<0.51otherwise
(16)
u2={randifμ1<0.51otherwise
(17)
u3={randifμ1<0.51otherwise
(18)
where rand is a random number in the range of [0, 1], and μ1 is a number in the range of [0, 1]. The above equations can be simplified:
u1=L1×2×rand+(1−L1)
(19)
u2=L1×rand+(1−L1)
(20)
u3=L1×rand+(1−L1)
(21)
where L1 is a binary parameter with a value of 0 or 1. If parameter μ1 is less than 0.5, the value of L1 is 1, otherwise, it is 0. To determine the solution xmk in Eq (12), the following scheme is suggested.
xmk={xrandifμ2<0.5xmpotherwise
(22)
xrand=Xmin+rand(0,1)×(Xmax−Xmin)
(23)
where xrand is a new solution, xmp is a randomly selected solution of the population (p∈[1,2,⋯,N]), and μ2 is a random number in the range of [0, 1]. Eq (22) can be simplified as:
xmk=L2×xmp+(1−L2)×xrand
(24)
where L2is a binary parameter with a value of 0 or 1. If μ2is less than 0.5, the value of L2is 1, otherwise, it is 0. The pseudo code of the GBO algorithm is shown in Algorithm 1.
Algorithm 1. Pseudo code of the GBO algorithm.
1.Initialization 2. Assign values for parameters pr,εandM 3. Generate an initial population X0=[x0,1,x0,2,⋯,x0,D] 4. Evaluate the objective function value f(X0),n=1,2,..,N. 5. Specify the best and worst solutions Xmbest and Xmworst 6.While (m < M) 7.forn = 1 : N 8.fori = 1 : D 9. Select randomly r1≠r2≠r3≠r4≠n in the range of [1, N] 10. Calculate the position xm+1n,i using Eq (12) 11.end for 12.Local escaping operator 13.ifrand < pr 14. Calculate the position XmLEO using Eq (14) or Eq (15) 15.Xm+1n=xmLEO 16.end 17. Update the positionsXmbest and Xmworst 18.end for 19.m = m + 1 20.end 21. Return Xmbest
The GBO algorithm is a novel population-based metaheuristic search method to solving the continuous problem [84]. The GBO algorithm is derived from a gradient-based search method that uses Newton's method to explore the better regions in the search space. Two operators (i.e., gradient-based rule (GSR) and local escape operator (LEO)) were introduced in GBO and mathematically computed to facilitate the exploration and exploitation of the search. Newton's method is used as a search engine in GSR to enhance the exploration and exploitation process, while LEO is used to deal with complex problems in GBO. GBO shows superior performance in solving optimization problems compared to other optimization methods in the literature. Due to the above advantages of GBO and since it has not been used to solve FS problems, searching for the best subset of features in FS is a challenging problem, especially in wrapper-based methods. This is because the selected subset needs to be evaluated by a learning algorithm (e.g., classifier) in each optimization step. Therefore, a suitable optimization method is needed to reduce the number of evaluations in this paper; we propose a method for solving the FS problem. On this basis, we present the motivation for using this algorithm as a search method in a wrapper-based FS process. According to the nature of FS probability, the search space can be represented by binary values [0, 1], and binary arithmetic is much simpler than continuous arithmetic, so we propose a binary version of GBO to solve the FS problem.
4.2. Our proposed binary GBO (BGBO)
The proposed binary GBO has two key parts, the first one is how to map from continuous values to [0, 1] and the second one is how to update the position of the population. Transfer function [44] is the simplest method in binary GBO, which maps continuous values to [0, 1] and then decomposes them into 0 and 1 based on probability. It preserves the structure of GBO and other operations that move the position of the population in the binary space. The transfer functions are divided into two main categories according to their shapes: S-shaped and V-shaped. Figure 1 shows these two families of transfer functions.
In the S-shaped transfer function, the vectors values are converted into probability values within the [0, 1] range. As shown in Eq (25) [85].
T(Xdi)=11+e−Xdi
(25)
where Xdi represents the position in the d-th dimension of the i-th individual in the GBO algorithm. The location of each vector, based on the probability values obtained from T(Xdi), will be updated.
Xdt+1={1ifrand<T(Xdt)0ifrand⩾T(Xdt)
(26)
where Xdt+1 represents the t-th vectors' position at next-iteration in the d-th dimension.
For next, the Hyperbolic tan (V-shaped) function [86] is another transfer function and mathematical formulation is given below:
T(Xdi)=|tanh(Xdi)|
(27)
where Xdi represents the position in the d-th dimension of the i-th individual in the GBO algorithm.
Based on the probability values obtained from Eq (27), the current position of each vector is updated in the next iteration. Using Eq (28), the search space is transformed into a binary search space.
Xdt+1={¬Xdtifrand<T(Xdt)Xdtifrand⩾T(Xdt)
(28)
Table 1 shows the mathematical expressions of all transfer functions that S-shaped and V-shaped transfer functions are included. There are four S-shaped transfer functions: S1, S2, S3, and S4. There are also four V-shaped transfer functions, namely: V1, V2, V3, and V4.
S-shaped functions and V-shaped functions, with a suitable probability, the exploration and exploitation ability of individuals in the population is the best. And at the beginning of the iteration, the exploration ratio of these functions is high relative to the exploitation ratio. As the value is higher, the probability of individuals in the population changing their current position is higher. These transfer functions are designed to facilitate the exploration and utilization of the binary search space by the individuals of the population. Therefore, both families of transfer functions (S-shaped and V-shaped) are used to transfer continuous solutions of GBO into the binary search space "0" and "1", and the methods are called BGBO_S and BGBO_V. Algorithm 2 is the pseudo-code for this method, as shown in the following. Meanwhile the flow chart of BGBO algorithm is shown in Figure 1.
Algorithm 2 Pseudo code of the BGBO algorithm
1. Initialization 2. Assign values for parameters pr,εandM 3. Generate an initial population X0=[x0,1,x0,2,⋯,x0,D] 4. Evaluate the objective function value f(X0),n=1,2,..,N 5. Specify the best and worst solutions Xmbest and Xmworst 6. while (m < M)
7. forn = 1 : N 8. fori = 1 : D 9. Select randomly r1≠r2≠r3≠r4≠n in the range of [1, N]
10. Calculate the position xm+1n,iusing Eq (12)
11. end for 12. Local escaping operator 13. ifr and < pr 14. Calculate the position XmLEO using Eq (14) or Eq (15)
15. Xm+1n=xmLEO 16. end 17. The probability of converting a population to a binary space 0 or 1. Using Eq (26) or Eq (28)
18. Compute the fitness of all populations 19. Update the positions Xmbest and Xmworst 20. end for 21. m = m + 1 22. end 23. Return Xmbest
There are usually a large number of noisy and irrelevant features in data mining, and for these irrelevant features, partial processing is usually needed, otherwise, it will cause the waste of data processing resources and make the error probability of processing results more difficult, thus increasing the difficulty of the learning task. For example, in the classification task, if there are a large number of irrelevant features, the learning time of the classifier may be longer and the classification accuracy may be lower. As the dimensionality of the data increases and the dataset contains more and more messages, it is very difficult to handle large data, and the cost of computation time increases. Therefore, it needs to reduce the features of the dataset effectively and keep the key features. Datasets are usually represented by a matrix whose rows represent instances (or samples) and columns represent attributes (or features). Feature selection is a common technique used in data mining and machine learning and is an effective and well-proven method. Most researchers focus on high precision and low feature methods, and feature selection is one such method. In this section, the wrapper method is used to implement feature selection. The binary version of the algorithm corresponding to the above eight transfer functions (BGBO_S1, BGBO_S2, BGBO_S3, BGBO_S4, BGBO_V1, BGBO_V2, BGBO_V3, and BGBO_V4) is applied to the feature selection problem.
4.3.1. Fitness function
As already known in the previous subsections, it is significant to choose an effective search strategy for FS methods. Since the proposed method is a wrapper-based approach, then a learning algorithm (e.g., a classifier) should be involved in the evaluation process. In this work, a well-known classifier, namely, the k-NN classifier [87] is used as an evaluator, and the k-NN classifier classifies unlabeled instances by measuring the distance between a given unlabeled instance and its k nearest instances [88]. It is based on the principle that if a majority of the k most similar samples of a sample in a given feature space belongs to a certain class, then that sample also belongs to that class. The classification accuracy of the selected features is incorporated into the proposed fitness function. The classification accuracy obtained is better when the features in the subset are correlated. The classification accuracy obtained is better when the number of features in the subset is smaller. Having a higher classification accuracy is one of the goals of the FS method, and another important goal is to recalculate the number of selected features; the fewer the number of features in the solution, the better the solution will be. The mathematical formulation of the fitness function is shown below:
fitness=αγR(D)+β|M||N|
(29)
where γR(D) represents the classification error rate corresponding to the currently selected feature subset of the classifier, |M| represents the number of features currently selected, |N| is the total number of features, α and β are two weight coefficients to reflect the classification rate and length of the subset, α is the random number between [0, 1], andβ=1−α.
4.3.2. Computational complexity
To have a better understanding of the implementation process of the BGBO feature selection algorithm proposed in this paper, the following subsections analyze the computational complexity of BGBO. The time complexity is first analyzed as follows.
1) Population initialization O (n*d), where n is the population size and d is the dimension size (i.e., the number of features size).
2) The k-NN classifier training takes O (n*m), where n is still the population size and m is the instance size.
3) The time required to evaluate the fitness value of each population individual is O (n).
4) The time required for BGBO execution (i.e., the process of updating the position of population individuals) is O (n*d).
5) The time required to map the population of individuals to the binary space O (n*d).
6) Repeat steps 2-5 above until the maximum number of iterations T is satisfied.
The time complexity required to perform steps 2-6 above is O (n*d*m*T), and the final time complexity is O (n*d*m*T*K) when run independently K times.
The space complexity is then analyzed as follows.
1) The space required for population initialization is O (n*d).
2) The space required for the k-NN classifier is O (m*s), where s is the number of features that have been selected.
From the above analysis, n and s can be ignored, so the space complexity is O (m*d).
The above is the analysis of the time complexity and space complexity of the BGBO algorithm proposed in this paper when applied to feature selection, for which the time complexity and space complexity are mutually affected. When pursuing a better time complexity, the performance of the space complexity may be worse, i.e., it may lead to occupying more storage space; on the contrary, when pursuing a better space complexity, the performance of the time complexity may be worse, i.e., it may lead to occupying a longer running time. Time complexity and space complexity are incompatible, and we need to strike a balance between them.
5.
Experiment results and discuss
The detailed descriptions of the 18 benchmark datasets are shown in Table 2. These datasets have distinct characteristics in nature. These datasets were extracted from the UCI repository [89]. These datasets were used to test the proposed method. All experiments were implemented in MatlabR2017a and executed on an Intel Core i3 machine with a CPU frequency of 3.70 GHz and 8 GB of RAM. The maximum number of iterations was 100.
As a preliminary study, we conducted experiments on the effect of different population sizes on the basic method (i.e., on the classification accuracy of BGBO), and thus evaluated BGBO at different population sizes (i.e., 10, 20, 30, and 50). The PenglungEW dataset was used for the experiments, and the average accuracy and time spent by the classifier were used as evaluation criteria. Because of the large dimensionality of this dataset, the sensitivity is high. The higher sensitivity allows the algorithm to respond significantly too small changes in parameters. Table 3 shows the experimental results for the PenglungEW dataset with different population sizes and the maximum number of iterations.
Table 3.
Average accuracy results and Time according to different combinations of population size (n) and the number of max iteration on the PenglungEW dataset.
The effect of varying the population size (10, 20, 30, and 50) and varying the number of iterations (100 and 150) on the classification accuracy of the dataset PenglungEW can be seen in Table 3, where it can be seen that the range of variation in classification accuracy is small and that increasing the population size does not always improve the results, and similarly, increasing the number of iterations has little effect on the results. Therefore, we set the population size to 10 and set the number of iterations to 100 in all the next experiments as a trade-off between classification accuracy and the overhead of the algorithm running time.
As mentioned in the previous subsection, there are two weighting coefficientsαandβin the fitness function. They reflect the importance of the classification error rate and the number of selected features on the performance of the algorithm, respectively. To further investigate the effect of different weight coefficientsαandβon the experimental results, different combinations are set up for experiments, and Table 4 shows the experimental results for different combinations.
Table 4.
Average accuracy results, Time, Average fitness and Average number of feature according to different combinations of αandβon the PenglungEW dataset.
The results are shown in Table 4. In general, decreasing the value ofβand increasing the value of α, the accuracy has improved, but when α is 0.7 and β is 0.3, the accuracy has reached 1. Therefore, increasing the reference standard, the fitness value, and the number of selected features, according to the results, when the value of α is determined to be 0.99 and the value of β is 0.01, the fitness value and the number of selected features are the best, so to be able to make a reasonable comparison with other methods. In this paper, α and β are set to 0.99 and 0.01 in the subsequent experiments, respectively. Additionally, k in the Euclidean distance formula used to evaluate the feature subset of the k-NN classifier was set to 5 [90].
Table 5.
Parameters setup for the variations of BGBO.
The parameters used in BGBO are shown in Table 5. Each variation of BGBO has tested 30 independent runs. The measurement criteria used in the comparison of BGBO are fitness values, classification accuracy and selection feature size.
1) Fitness values are obtained from fitness function by using the selected features on the benchmark datasets. (The mean, standard deviation of the fitness function is calculated).
2) Classification accuracy is obtained from the classifier by using the selected features on the benchmark datasets.
3) Selection feature size is the mean number of the selected features.
5.1. Comparison between the BGBO based approaches
The results of the average classification accuracy are shown in Table 7. Among them, BGBO_V3 achieved the best results, it managed to reach the highest precision in nine data sets, and in addition, it had the highest precision value among all methods in five data sets, so it was placed in the first place in the overall ranking. BGBO_V2 ranked second, it achieved the highest precision in six data sets, but it ranked better overall, so it ranked second. BGBO_V4 is in third place, and although it has six datasets that achieve the highest precision and additionally has the highest precision value of all methods in one dataset. It ranks relatively low in the overall ranking compared to BGBO_V2. Similarly, BGBO_S3 ranks fourth and BGBO_V1 ranks fifth. BGBO_S1, BGBO_S4, and BGBO_S2 rank sixth seventh, and eighth, respectively. In addition, their standard deviation, we can see that the standard deviation of BGBO_V3 in all nine data sets is 0, which is sufficient to prove that it is BGBO_V3 a robust method.
Table 6.
Comparison between different variations of BGBO on the mean of fitness values (as Mean) and the standard deviation of fitness values (as S.D).
Table 7.
Comparison between different variations of BGBO on the mean of accuracy values (M.Acc) and the standard deviation of accuracy values (as S.D).
In Table 8, the proposed BGBO method is compared in terms of the average number of selected features.BGBO_V3 has the smallest average number of features in 14 datasets, while BGBO_V1 and BGBO_V4 have the smallest values in 7 datasets. BGBO_V3 and BGBO-S3 had the smallest values in 6 data sets, while BGBO_S1, BGBO_S2, and BGBO-S4 had the smallest number of features in 5, 4, and 5 data sets, respectively. As for the standard deviation values, BGBO_V3 proved to be a robust method that obtained small deviation values in the data set of 7. The superiority of BGBO_V3 was also confirmed by the average adaptation results shown in Table 6.
Table 8.
Comparison between different variations of BGBO on the mean number of the selected features (M.NF) and the standard deviation of the mean number of the selected features (as S.D).
5.2. Comparison with other metaheuristic-based approaches
In this section, BGBO_V3, which has the best performance among the eight methods introduced above, is considered as the best method among the methods proposed in this work. The proposed BGBO_V3 is compared with other existing state-of-the-art binary metaheuristics such as BPSO, BWOA, BDA, BBA, BGOA, BGWO, and BHHO. Table 9 shows the used parameters of all algorithms in the next experiments. Again, the performance is compared and analyzed by the average fitness value, the average classification accuracy, and the average number of selected features.
Table 9.
Parameters of BGBO and comparison algorithms.
Table 10 shows the experimental results of the average fitness values of BGBO_V3 and other metaheuristic-based methods. From Table 10, it can be seen that BGBO_V3 has the best results in 89% of the datasets (16 out of 18), which indicates that BGBO_V3 has the best performance. BDA and BBA achieved first place in two datasets, KrvskpEW and SonarEW, respectively, and BPSO, BWOA, BGOA, BGWO, and BHHO are not ranked first in any of the 18 datasets. This indicates that the performance of the proposed method is significant. In addition, it can be seen from Table 10 that BGBO_V3 performs significantly better than other methods in some datasets, such as Lymphography and IonosphereEW. In addition, comparing the standard deviations of the various methods, BGBO_V3 is also more stable. In terms of the final average ranking, BGBO_V3 ranked first, followed by BHHO, BDA, BWOA, BGOA, BBA, BGWO, and BPSO.
Table 10.
Comparison between BGBO_V3 and state-of-art methods based on the mean of fitness values (Mean) and the standard deviation of fitness values (as S.D).
In Table 11, BGBO_V3 is compared with other methods based on the average classification accuracy. Among the 18 datasets, BGBO_V3 ranked first with 13 datasets compared to 1, 1, 4, 6, 2, 3, and 7 for BPSO, BWOA, BDA, BBA, BGOA, BGWO, and BHHO, respectively. The comparison with k-NN classifier also shows that BGBO_V3 has higher classification accuracy. In terms of the number of top-ranked methods, BGBO_V3 performed much better than the other methods. In addition, BGBO_V3 achieves an average classification accuracy of 100% on the Breast_cancer_wisconsin, Zoo, Parliment1984, Iris, Glass, Wine, Segmentation, and Vote datasets. Specifically, the classification accuracy of BGBO_V3 reaches 99% for both the high-dimensional datasets PenglungEW and Coil, indicating that the proposed method in this paper can better solve high-dimensional data. This indicates that the proposed method is important in practical applications. In addition, the standard deviation results show that BGBO_V3 is more stable than other methods on some data sets. The final ranking BGBO_V3 is also in the first place. It is followed by BHHO, BBA, BDA, BGOA, BGWO, BPSO, and BWOA.
Table 11.
Comparison between BGBO_V3 and state-of-art methods based on the mean of accuracy values (M.Acc) and the standard deviation of fitness values (as S.D).
Inspecting the results of the number of selected features from Table 12, we observe that BGBO_V3 outperforms the other algorithms, followed by BPSO, BBA, BDA, BWOA, BHHO, BGWO, and BGOA, with very competitive results. For most of the datasets, the differences between the average numbers of features selected between the algorithms are very small. However, it is worth noting that for the PenglungEW dataset, the highest dimensional dataset with 325 features, BGBO_V3 achieves the best classification accuracy of 99.78% with the smallest number of features (only 12.5667 features on average). As well as for Coil, the highest dimensional dataset with 1024 features, BGBO_V3 achieves the best classification accuracy of 99.46% with the smallest number of features (only 53.4667 features on average). This shows the advantage of this method when dealing with high-dimensional datasets.
Table 12.
Comparison between BGBO_V3 and state-of-art methods based on the mean number of the selected features (M.NF) and the standard deviation of the mean number of the selected features (asS.D).
In terms of the fitness function of feature selection, the fitness function is optimal only when the accuracy and the number of selected features are simultaneously good. From the analysis of the experimental results in Tables 10–12, the fitness value of the BGBO_V3 algorithm proposed in this paper is superior to other algorithms in most data sets. Besides, the BGBO_V3 algorithm obtains a better subset of features while maintaining high accuracy. Although the other algorithms have higher accuracy, the optimal feature subsets obtained by them are not satisfactory. Figures 2–10 shows the average convergence curves of various algorithms on different data sets. It is clear from the convergence plots that BGBO_V3 outperforms the other methods in terms of the average fitness value from the beginning of the iteration. It is worth mentioning that BGBO_V3 has the fastest convergence on most of the datasets, obtaining better results in balancing the exploration and exploitation capabilities.
Figure 2.
Convergence curves for the Breast_cancer_wisconsin and IonosphereEW dataset.
A nonparametric Wilcoxon rank-sum test was performed at the 5% significance level to verify whether there was a statistical difference between the results of the fitness values of BGBO_V3 and the respective comparison methods. Table 13 shows the p-values of BGBO_V3 concerning each comparison method. In terms of fitness values, we observed that BGBO_V3 exhibited statistically significant classification accuracy in all datasets compared to BPSO, BWOA, BDA, and BGWO. Significant differences (p-values less than 0.05) exist for all datasets. Comparing BGBO_V3 with BBA, BGOA, and BHHO, we can note that BGBO_V3 has p-values greater than 0.05 in the SonarEW, Semeion, and KrvskpEW datasets, respectively.
Table 13.
The p-values of Wilcoxon signed-rank test for the fitness value result of the BGBO_V3 vs. other algorithms (p⩾ 0.05 are underlined).
From the previous subsection, we know that BGBO_V3 achieves excellent results on high-dimensional datasets (PenglungEW and Coil). Therefore, to further test the performance of the algorithm proposed in this paper, this section conducts experiments with 10 high-dimensional datasets. These datasets were downloaded from the UCI database and the database of Arizona State University (ASU) [91]. The information (number of instances, features, and classes) of the high-dimensional datasets is shown in Table 14. Tables 15–17 show the experimental results of the average fitness value, average classification accuracy and, the average number of selected features obtained by various methods on the high-dimensional datasets.
Table 15.
Comparison between different variations of BGBO on the mean of fitness values (as Mean) and the standard deviation of fitness values (as S.D).
Table 16.
Comparison between BGBO_V3 and state-of-art methods based on the mean of accuracy values (M.Acc) and the standard deviation of fitness values (as S.D).
Table 17.
Comparison between BGBO_V3 and state-of-art methods based on the mean number of the selected features (M.NF) and the standard deviation of the mean number of the selected features (as S.D).
We first inspect the fitness function averages of all the compared algorithms on the high-dimensional datasets. The results are shown in Table 15. From Table 15, we can notice that BGBO_V3 proposed in this paper achieves the best results in 80% of the datasets (8 out of 10), among the other methods, only BWOA and BBA achieve the best results in ORL and Yale, respectively. In addition, looking at the standard deviation again, we can see that BGBO_V3 achieves 0 for all four datasets (i.e., arcene, PCMAC, RELATHE, and Leukemia), indicating that the method has excellent stability on high-dimensional datasets. In the final comparative ranking, BGBO_V3 is first, followed by BDA, BGWO, BBA, BPSO, BHHO, BWOA, and BGOA in order.
Table 16 shows the experimental results of the average classification accuracy of all methods on high-dimensional datasets. From the comparison of the average classification accuracy, BGBO_V3 ranks first in the average classification accuracy on 8 out of 10 high-dimensional datasets, accounting for 80% of all datasets. Other methods ranked first in the number of datasets far less than BGBO_V3. In particular, the average classification accuracy of BGBO_V3 reached 100% on the three datasets of arcene, RELATHE, and Leukemia. In addition, the standard deviation, it can be seen that BGBO_V3 achieves 0 on all four datasets (i.e., arcene, PCMAC, RELATHE, and Leukemia), which indicates from both the average fitness value and the average accuracy that BGBO_V3 has excellent performance as well as good stability on high-dimensional datasets. In the final ranking of the average classification accuracy comparison, BGBO_V3 is still the first, while the KNN machine learning method is ranked last among all methods. The remaining rankings are BGWO, BDA, BHHO, BBA, BPSO, BWOA, and BGOA in that order.
Finally, Table 17 shows the experimental results of all methods for the average number of selected features. As can be seen from the table, BGBO_V3 can obtain the smallest number of features on all data sets. Especially from the numerical point of view, the feature values obtained by BGBO_V3 are far smaller than those obtained by the other methods, even with a difference of hundreds of times. This indicates that BGBO_V3 can find a smaller subset of features compared to other methods. Similarly, the standard deviation of BGBO_V3 is within the acceptable range. In terms of the final ranking, BGBO_V3 ranks first, followed by BHHO, BDA, BBA, BPSO, BWOA, BGOA, and BGWO. Figures 11–15 show the average convergence curves of the various algorithms on the high-dimensional dataset. From the convergence plots, it is clear that BGBO_V3 achieves a good convergence rate from the beginning of the iteration to about the 50th generation, and the average fitness value is better than the other methods. The convergence curve of BGBO_V3 can be clearly distinguished from 50 to 100 generations, which shows the advantage of the method relative to other methods. This is due to the excellent mechanism of the GBO algorithm itself, which enables us to balance the process of exploration and exploitation in the algorithm.
Figure 11.
Convergence curves for the arcene and ORL dataset.
5.4. Comparisons with metaheuristics in the literature
In this section, the average classification accuracy of BGBO_V3 is compared with other methods in the literature. These experimental results are also obtained in the same experimental setting with a good reference effect. Since the selected datasets are only partially the same, 10 datasets are selected for comparison and analysis in this work, which contains some important low-dimensional, high-dimensional datasets, respectively. Table 18 shows the experimental data of the average classification accuracy of BGBO_V3 with various methods in the literature.
Table 18.
Classification accuracies of the BGBO_V3 versus other metaheuristics from the specialized literature.
From Table 18, we can see that the proposed method in this paper ranks first in seven datasets, including low-dimensional datasets and high-dimensional datasets, respectively, and we can see that BGBO_V3 can achieve excellent results not only in low-dimensional but also in high-dimensional with good performance. Secondly, RBDA has four datasets ranked first, and BSSA_S3_CP has one dataset ranked first. The final ranking of BGBO_V3 is first, and RBDA, BSSA_S3_CP, BGOA-M, WOA-CM, BGSA, GA, and bGWO1 are ranked in order. Thus, the BGBO_V3 proposed in this paper achieves better performance than other methods in solving the FS problem.
In this paper, eight variants of BGBO are proposed using transfer functions. This corresponds to s-shaped and v-shaped eight transfer functions, respectively, for mapping continuous search spatial to discrete search spatial, and in applying to the feature selection problem. The experimental results show that very promising results are achieved compared to the currently popular wrapper-based FS methods. The primary reason is that the algorithm is very reliable and efficient due to the excellent performance of GBO itself, making the algorithm a better choice for the wrapper-based FS method. Moreover, on challenging high-dimensional datasets, the results show clearly that the proposed BGBO has better results in solving high-dimensional problems compared to other methods. Thus, the algorithm has excellent performance and outstanding stability in dealing with challenging problems such as high-dimensional datasets, which shows the potential of the algorithm in solving high-dimensional problems. Among the eight variants proposed in this paper, BGBO_V3 has the best performance. The main reason for the analysis is that different transfer functions have different slopes of curves and different probabilities of changing the position of population individuals. A reasonable probability can balance the exploration and exploitation ability of population individuals, and it is not idealized that the larger the probability of changing the location of the population individuals is better. In addition, an excellent exploration mechanism of the algorithm itself is essential. In the BGBO algorithm based on Newton's method, the GBO algorithm has good two operators (GSR) and (LEO), both of which have their own merit-seeking mechanism, and its perfect mechanism and simple variable parameters balance the exploration and exploitation of the algorithm well, making the individuals better able to make the algorithm proceed in the optimal direction. The transfer function also deserves credit for its own simple and easy-to-understand mechanism. So the combination of the algorithm itself and the transfer function has a clear advantage overall.
6.
Conclusions and future works
In this paper, a binary version of the GBO algorithm is proposed to solve the FS problem, and eight variants of the binary gradient optimizer are proposed using transfer functions (i.e., s-shaped and v-shaped). The transfer functions are used to map the continuous search space to the discrete search space and are used in the proposed algorithm. To benchmark the proposed algorithm, 18 standard UCI benchmark datasets are used. The experimental results show that BGBO_V3 has the best performance among the proposed algorithms. The experimental results compare BGBO_V3 with the most popular and better-performing methods in the literature. By analyzing the experimental results, it is demonstrated that the BGBO_V3 algorithm has a high performance among the existing methods for solving FS problems, especially in high-dimensional data sets. Besides, the proposed algorithm has a high convergence speed, which enables the algorithm to find the exact minimum feature set faster. Combining the above experimental results, process discussion and conclusion analysis, it can be concluded that the proposed binary version of the GBO algorithm has advantages in solving FS, and it is worth considering when facing high-dimensional data sets.
In future work, more practical discrete applications combining the method, such as 0-1 backpack problem, TSP problem, scheduling tasks, etc.. The use of other classifiers as evaluators is also a very good research direction, such as extreme learning machines, neural networks and support vector machines, etc.. The performance of the algorithm can be further investigated by comparing different classifiers.
Acknowledgments
This work is supported by National Science Foundation of China under Grants No. 62066005, and Project of Guangxi Natural Science Foundation under Grant No. 2018GXNSFAA138146.
Conflict of interests
The authors declare no conflict of interest.
References
[1]
M. Chen, S. Mao, Y. Liu, Big data: A Survey, Mobile Netw. Appl., 19 (2014), 171-209. doi: 10.1007/s11036-013-0489-0
[2]
I. Guyon, A. Elisseeff, An introduction of variable and feature selection, J. Mach. Learn Res., 3 (2003), 1157-1182.
[3]
Y. Wan, M. Wang, Z. Ye, X. Lai, A feature selection method based on modified binary coded ant colony optimization algorithm, Appl. Soft Comput., 49 (2016), 248-258. doi: 10.1016/j.asoc.2016.08.011
[4]
H. Liu, H. Motoda, Feature selection for knowledge discovery and data mining, Kluwer Academic, 2012.
[5]
Z. Sun, G. Bebis, R. Miller, Object detection using feature subset selection, Pattern Recogn., 37 (2004), 2165-2176. doi: 10.1016/j.patcog.2004.03.013
[6]
H. Liu, H. Motoda, Feature Extraction, Construction and Selection: A Data Mining Perspective Springer Science & Business Media, Boston, MA, 1998.
[7]
Z. Zheng, X. Wu, R. K. Srihari, Feature selection for text categorization on imbalanced data, ACM Sigkdd Explor. Newsl., 6 (2004), 80-89. doi: 10.1145/1007730.1007741
[8]
H. Uguz, A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm, Knowl.-Based Syst., 24 (2011), 1024-1032. doi: 10.1016/j.knosys.2011.04.014
[9]
H. K. Ekenel, B. Sankur, Feature selection in the independent component subspace for face recognition, Pattern Recogn. Lett., 25 (2004), 1377-1388. doi: 10.1016/j.patrec.2004.05.013
[10]
H. R. Kanan, K. Faez, An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system, Appl. Math. Comput., 205 (2008), 716-725.
[11]
F. Model, P. Adorjan, A. Olek, C. Piepenbrock, Feature selection for DNA methylation based cancer classification, Bioinformatics, 17 (2001), S157-S164. doi: 10.1093/bioinformatics/17.suppl_1.S157
[12]
N. Chuzhanova, A. J. Jones, S. Margetts, Feature selection for genetic sequence classification, Bioinformatics, 14 (1998), 139-143. doi: 10.1093/bioinformatics/14.2.139
[13]
S. Tabakhi, A. Najafi, R. Ranjbar, P. Moradi, Gene selection for microarray data classification using a novel ant colony optimization, Neurocomputing, 168 (2015), 1024-1036. doi: 10.1016/j.neucom.2015.05.022
[14]
D. Liang, C. F. Tsai, H. T. Wu, The effect of feature selection on financial distress prediction, Knowl.-Based Syst., 73 (2015), 289-297. doi: 10.1016/j.knosys.2014.10.010
[15]
M. Ramezani, P. Moradi, F. A. Tab, Improve performance of collaborative filtering systems using backward feature selection, in The 5th Conference on Information and Knowledge Technology, (2013), 225-230.
[16]
B. Tseng, Tzu Liang Bill, C. C. Huang, Rough set-based approach to feature selection in customer relationship management, Omega, 35 (2007), 365-383. doi: 10.1016/j.omega.2005.07.006
[17]
R. Sawhney, P. Mathur, R. Shankar, A firefly algorithm based wrapper-penalty feature selection method for cancer diagnosis, in International Conference on Computational Science and Its Applications, Springer, Cham, (2018), 438-449.
[18]
B. Guo, R. I. Damper, S. R. Gunn, J. D. B. Nelson, A fast separability-based feature-selection method for high-dimensional remotely sensed image classification, Pattern Recogn., 41 (2008), 1653-1662. doi: 10.1016/j.patcog.2007.11.007
[19]
R. Abraham, J. B. Simha, S. S. Iyengar, Medical datamining with a new algorithm for feature selection and naive bayesian classifier, in 10th International Conference on Information Technology (ICIT 2007), IEEE, 2007, 44-49.
[20]
L. Yu, H. Liu, Feature selection for high-dimensional data: A fast correlation-based filter solution, in Proceedings of the 20th international conference on machine learning (ICML-03), (2003), 856-863.
[21]
L. Cosmin, J. Taminau, S. Meganck, D. Steenhoff, A survey on filter techniques for feature selection in gene expression microarray analysis, IEEE/ACM Trans. Comput. Biol. Bioinf., 9 (2012), 1106-1119. doi: 10.1109/TCBB.2012.33
[22]
S. Maldonado, R. Weber, A wrapper method for feature selection using support vector machines, Inform. Sci., 179 (2009), 2208-2217. doi: 10.1016/j.ins.2009.02.014
[23]
J. Huang, Y. Cai, X. Xu, A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern Recogn. Lett., 28 (2007), 1825-1844. doi: 10.1016/j.patrec.2007.05.011
[24]
C. Tang, X. Liu, X. Zhu, J. Xiong, M. Li, J. Xia, et al., Feature selective projection with low-rank embedding and dual laplacian regularization, IEEE Trans. Knowl. Data. Eng., 32 (2019), 1747-1760.
[25]
C. Tang, M. Bian, X. Liu, M. Li, H. Zhou, P. Wang, et al., Unsupervised feature selection via latent representation learning and manifold regularization, Neural Networks, 117 (2019), 163-178. doi: 10.1016/j.neunet.2019.04.015
[26]
S. Sharifzadeh, L. Clemmensen, C. Borggaard, S. Støier, B. K. Ersbøll, Supervised feature selection for linear and non-linear regression of L*a*b color from multispectral images of meat, Eng. Appl. Artif. Intel., 27 (2013), 211-227.
[27]
C. Tang, X. Zheng, X. Liu, L. Wang, Cross-view locality preserved diversity and consensus learning for multi-view unsupervised feature selection, IEEE Trans. Knowl. Data. Eng., 2021.
[28]
C. Tang, X. Zhu, X. Liu, L. Wang, Cross-View local structure preserved diversity and consensus learning for multi-view unsupervised feature selection, in Proceedings of the AAAI Conference on Artificial Intelligence, 33 (2019), 5101-5108.
[29]
M. Dorigo, Optimization, Learning and Natural Algorithms, Phd Thesis Politecnico Di Milano, 1992.
[30]
C. Lai, M. J. T. Reinders, L. Wessels, Random subspace method for multivariate feature selection, Pattern Recogn. Lett., 27 (2006), 1067-1076. doi: 10.1016/j.patrec.2005.12.018
[31]
B. Xue, M. Zhang, W. N. Browne, X. Yao, A survey on evolutionary computation approaches to feature selection, IEEETrans. Evol. Comput., 20 (2016), 606-626. doi: 10.1109/TEVC.2015.2504420
[32]
J. J. Grefenstette, Optimization of control parameters for genetic algorithms, IEEETrans.Syst. Man Cybern., 16 (1986), 122-128. doi: 10.1109/TSMC.1986.289288
[33]
B. G. Obaiahnahatti, J. Kennedy, A new optimizer using particle swarm theory, in MHS'95. Proceedings of the Sixth International Symposium on Micro Machine and Human Science, IEEE, (1995), 39-43.
[34]
K. Chen, F. Y. Zhou, X. F. Yuan, Hybrid particle swarm optimization with spiral-shaped mechanism for feature selection, Expert Syst. Appl., 128 (2019), 140-156. doi: 10.1016/j.eswa.2019.03.039
[35]
S. Mirjalili, A. Lewis, The whale optimization algorithm, Adv. Eng. Softw., 95 (2016), 51-67. doi: 10.1016/j.advengsoft.2016.01.008
[36]
S. Mirjalili, S.M. Mirjalili, A. Lewis, Grey wolf optimizer, Adv. Eng. Softw., 69 (2014), 46-61. doi: 10.1016/j.advengsoft.2013.12.007
[37]
S. Shahrzad, S. Mirjalili, A. Lewis, Grasshopper optimisation algorithm: Theory and application, Adv. Eng. Softw., 105 (2017), 30-47. doi: 10.1016/j.advengsoft.2017.01.004
[38]
E. Rashedi, H. Nezamabadi-Pour, S. Saryazdi, GSA: a gravitational search algorithm, Inform. Sci., 179 (2009), 2232-2248. doi: 10.1016/j.ins.2009.03.004
[39]
S. Li, H. Chen, M. Wang, A. A. Heidari, S. Mirjalili, Slime mould algorithm: A new method for stochastic optimization, Future Gener. Comput. Syst., 111 (2020), 300-323. doi: 10.1016/j.future.2020.03.055
[40]
Y. Yang, H. Chen, A. A. Heidari, A. H. Gandomi, Hunger games search: Visions, conception, implementation, deep analysis, perspectives, and towards performance shifts, Expert Syst. Appl., 171 (2021), 114864.
[41]
I. Ahmadianfar, O. Bozorg-Haddad, X. Chu, Gradient-based optimizer: A new metaheuristic optimization algorithm, Inform. Sci., 540 (2020), 131-159. doi: 10.1016/j.ins.2020.06.037
[42]
S. Mirjalili, Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems, Neural Comput. Appl., 27 (2016), 1053-1073. doi: 10.1007/s00521-015-1920-1
[43]
T. J. Ypma, Historical development of the Newton-Raphson method, SIAM Rev., 37 (1995), 531-551. doi: 10.1137/1037125
[44]
S. Mirjalili, A. Lewis, S-shaped versus V-shaped transfer functions for binary particle swarm optimization, Swarm Evol. Comput., 9 (2013), 1-14. doi: 10.1016/j.swevo.2012.09.002
[45]
H. Liu, J. Li, L. Wong, A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns, Genome Inform., 13 (2002), 51-60.
[46]
M. Dash, H. Liu, Feature selection for classification, Intell. Data Anal., 1 (1997), 131-156. doi: 10.3233/IDA-1997-1302
[47]
W. Siedlecki, J. Sklansky, A note on genetic algorithms for large-scale feature selection, Pattern Recogn. Lett., 10 (1989), 335-347. doi: 10.1016/0167-8655(89)90037-8
[48]
R. Leardi, R. Boggia, M. Terrile, Genetic algorithms as a strategy for feature selection, J. Chemom., 6 (1992), 267-281. doi: 10.1002/cem.1180060506
[49]
I. S. Oh, J. S. Lee, B. R. Moon, Hybrid genetic algorithms for feature selection, IEEE Trans. Pattern Anal., 26 (2004), 1424-1437. doi: 10.1109/TPAMI.2004.105
[50]
R. Storn, K. Price, Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim., 11 (1997), 341-359. doi: 10.1023/A:1008202821328
[51]
B. Xue, W. Fu, M. Zhang, Differential evolution (DE) for multi-objective feature selection in classification, in Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, (2014), 83-84.
[52]
D. Karaboga, B. Akay, A comparative study of artificial bee colony algorithm, Appl. Math. Comput., 214 (2009), 108-132.
[53]
A. A. Heidari, S. Mirjalili, H. Faris, I. Aljarah, M. Mafarja, H. Chen, Harris hawks optimization: Algorithm and applications, Future Gener. Comput. Syst., 97 (2019), 849-872. doi: 10.1016/j.future.2019.02.028
[54]
A. Faramarzi, M. Heidarinejad, S. Mirjalili, A. H. Gandomi, Marine predators algorithm: A nature-inspired metaheuristic, Expert Syst. Appl., 152 (2020), 113377. doi: 10.1016/j.eswa.2020.113377
[55]
O. S. Qasim, Z. Algamal, Feature selection using particle swarm optimization-based logistic regression model, Chemom. Intell. Lab. Syst., 182 (2018), 41-46. doi: 10.1016/j.chemolab.2018.08.016
[56]
K. Chen, F. Y. Zhou, X. F. Yuan, Hybrid particle swarm optimization with spiral-shaped mechanism for feature selection, Expert Syst. Appl., 128 (2019), 140-156. doi: 10.1016/j.eswa.2019.03.039
[57]
B. Xue, M. Zhang, W. N. Browne, Particle swarm optimization for feature selection in classification: a multi-objective approach, IEEE Trans. Cybern., 43 (2013), 1656-1671. doi: 10.1109/TSMCB.2012.2227469
[58]
E. Emary, H. M. Zawbaa, A. E. Hassanien, Binary grey wolf optimization approaches for feature selection, Neurocomputing, 172 (2016), 371-381. doi: 10.1016/j.neucom.2015.06.083
[59]
P. Hu, J. S. Pan, S. C. Chu, Improved binary grey wolf optimizer and its application for feature selection, Knowl.-Based Syst., 195 (2020), 105746. doi: 10.1016/j.knosys.2020.105746
[60]
T. Qiang, X. Chen, X. Liu, Multi-strategy ensemble grey wolf optimizer and its application to feature selection, Appl. Soft Comput., 76 (2019), 16-30. doi: 10.1016/j.asoc.2018.11.047
[61]
M. M. Mafarja, S. Mirjalili, Whale optimization approaches for wrapper feature selection, Appl. Soft Comput., 62 (2018), 441-453. doi: 10.1016/j.asoc.2017.11.006
[62]
M. M. Mafarja, S. Mirjalili, Hybrid whale optimization algorithm with simulated annealing for feature selection, Neurocomputing, 260 (2017), 302-312. doi: 10.1016/j.neucom.2017.04.053
[63]
R. K. Agrawal, B. Kaur, S. Sharma, Quantum based whale optimization algorithm for wrapper feature selection, Appl. Soft Comput., 89 (2020), 106092. doi: 10.1016/j.asoc.2020.106092
[64]
C. R. Hwang, Simulated annealing: Theory and applications, Acta. Appl. Math., 12 (1988), 108-111.
[65]
H. Shareef, A. A. Ibrahim, A. H. Mutlag, Lightning search algorithm, Appl. Soft Comput., 36 (2015), 315-333. doi: 10.1016/j.asoc.2015.07.028
[66]
S. Mirjalili, S. M. Mirjalili, A. Hatamlou, Multi-verse optimizer: a nature-inspired algorithm for global optimization, Neural Comput. Appl., 27 (2016), 495-513. doi: 10.1007/s00521-015-1870-7
[67]
H. Abedinpourshotorban, S. M. Shamsuddin, Z. Beheshti, D. N. A. Jawawib, Electromagnetic field optimization: A physics-inspired metaheuristic optimization algorithm, Swarm Evol. Comput., 26 (2016), 8-22. doi: 10.1016/j.swevo.2015.07.002
[68]
A. Y. S. Lam, V. O. K. Li, Chemical-reaction-inspired metaheuristic for optimization, IEEE Trans. Evol. Comput., 14 (2009), 381-399.
[69]
F. A. Hashim, E. H. Houssein, M. S. Mabrouk, W. Al-Atabany, S. Mirjalili, Henry gas solubility optimization: A novel physics-based algorithm, Future Gener. Comp. Syst., 101 (2019), 646-667. doi: 10.1016/j.future.2019.07.015
[70]
R. Meiri, J. Zahavi, Using simulated annealing to optimize the feature selection problem in marketing applications, Eur. J. Oper. Res., 171 (2006), 842-858. doi: 10.1016/j.ejor.2004.09.010
[71]
S. W. Lin, Z. J. Lee, S. C. Chen, T. Y. Tseng, Parameter determination of support vector machine and feature selection using simulated annealing approach, Appl. Soft Comput., 8 (2008), 1505-1512. doi: 10.1016/j.asoc.2007.10.012
[72]
E. Rashedi, H. Nezamabadi-Pour, S. Saryazdi, BGSA: Binary gravitational search algorithm, Nat. Comput., 9 (2010), 727-745. doi: 10.1007/s11047-009-9175-3
[73]
S. Nagpal, S. Arora, S. Dey, Feature selection using gravitational search algorithm for biomedical data, Procedia Comput. Sci., 115 (2017), 258-265. doi: 10.1016/j.procs.2017.09.133
[74]
P. C. S. Rao, A. J. S. Kumar, Q. Niyaz, P. Sidike, V. K. Devabhaktuni, Binary chemical reaction optimization based feature selection techniques for machine learning classification problems, Expert Syst. Appl., 167 (2021), 114169. doi: 10.1016/j.eswa.2020.114169
[75]
N. Neggaz, E. H. Houssein, K. Hussain, An efficient henry gas solubility optimization for feature selection, Expert Syst. Appl., 152 (2020), 113364. doi: 10.1016/j.eswa.2020.113364
[76]
R. V. Rao, V. J. Savsani, D. P. Vakharia, Teaching-learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput. Aided Design, 43 (2011), 303-315. doi: 10.1016/j.cad.2010.12.015
[77]
S. Hosseini, A. A. Khaled, A survey on the imperialist competitive algorithm metaheuristic: implementation in engineering domain and directions for future research, Appl. Soft Comput., 24 (2014), 1078-1094. doi: 10.1016/j.asoc.2014.08.024
[78]
R. Moghdani, K. Salimifard, Volleyball premier league algorithm, Appl. Soft Comput., 64 (2017), 161-185.
[79]
H. C. Kuo, C. H. Lin, Cultural evolution algorithm for global optimizations and its applications, J. Appl. Res. Technol., 11 (2013), 510-522. doi: 10.1016/S1665-6423(13)71558-X
[80]
M. Allam, M. Nandhini, Optimal feature selection using binary teaching learning based optimization algorithm, J. King Saud Univ.-Comput. Inform. Sci., 10 (2018).
[81]
S. J. Mousavirad, H. Ebrahimpour-Komleh, Feature selection using modified imperialist competitive algorithm, in ICCKE 2013, IEEE, (2013), 400-405.
[82]
A. Keramati, M. Hosseini, M. Darzi, A. A. Liaei, Cultural algorithm for feature selection, in The 3rd International Conference on Data Mining and Intelligent Information Technology Applications, IEEE, (2011), 71-76.
[83]
D. H. Wolpert, W. G. Macready, No free lunch theorems for optimization, IEEE Trans. Evol. Comput., 1 (1997), 67-82. doi: 10.1109/4235.585893
[84]
A. Fink, S. Vo, Solving the continuous flow-shop scheduling problem by metaheuristics, Eur. J. Oper. Res., 151 (2003), 400-414. doi: 10.1016/S0377-2217(02)00834-2
[85]
J. Kennedy, R. C. Eberhart, A discrete binary version of the particle swarm algorithm, in 1997 IEEE International conference on systems, man, and cybernetics. Computational cybernetics and simulation, IEEE, 5 (1997), 4104-4108.
[86]
E. Rashedi, H. Nezamabadi-Pour, S. Saryazdi, BGSA: binary gravitational search algorithm, Nat. Comput., 9 (2010), 727-745. doi: 10.1007/s11047-009-9175-3
[87]
N. S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, Am. Stat., 46 (1992), 175-185.
[88]
F. Pernkopf, Bayesian network classifiers versus selective k-NN classifier, Pattern Recogn., 38 (2005), 1-10. doi: 10.1016/j.patcog.2004.05.012
[89]
A. Asuncion, D. Newman, UCI Machine Learning Repository, University of California, 2007.
[90]
E. Emary, H. M. Zawbaa, A. E. Hassanien, Binary ant lion approaches for feature selection, Neurocomputing, 213 (2016), 54-65. doi: 10.1016/j.neucom.2016.03.101
A. I. Hammouri, M. Mafarja, M. A. Al-Betar, M. A. Awadallah, I. Abu-Doush, An improved Dragonfly Algorithm for feature selection, Knowl.-Based Syst., 203 (2020), 106131. doi: 10.1016/j.knosys.2020.106131
[93]
H. Faris, M. M. Mafarja, A. A. Heidari, I. Aljarah, A. M. Al-Zoubi, S. Mirjalili, et al., An Efficient Binary Salp Swarm Algorithm with Crossover Scheme for Feature Selection Problems, Knowl.-Based Syst., 154 (2018), 43-67. doi: 10.1016/j.knosys.2018.05.009
[94]
M. Mafarja, S.Mirjalili, Whale optimization approaches for wrapper feature selection, Appl. Soft Comput., 62 (2018), 441-453. doi: 10.1016/j.asoc.2017.11.006
[95]
M. Mafarja, I. Aljarah, H. Faris, A. I. Hammouri, A. M. Al-Zoubi, S. Mirjalili, Binary grasshopper optimisation algorithm approaches for feature selection problems, Expert Syst. Appl., 117 (2019), 267-286. doi: 10.1016/j.eswa.2018.09.015
[96]
E. Emary, H. M. Zawbaa, A. E. Hassanien, Binary grey wolf optimization approaches for feature selection, Neurocomputing, 172 (2016), 371-381. doi: 10.1016/j.neucom.2015.06.083
This article has been cited by:
1.
Mokhtar Said, Essam H. Houssein, Sanchari Deb, Amel A. Alhussan, Rania M. Ghoniem,
A Novel Gradient Based Optimizer for Solving Unit Commitment Problem,
2022,
10,
2169-3536,
18081,
10.1109/ACCESS.2022.3150857
2.
Laith Abualigah, Muhammad Alkhrabsheh,
Amended hybrid multi-verse optimizer with genetic algorithm for solving task scheduling problem in cloud computing,
2022,
78,
0920-8542,
740,
10.1007/s11227-021-03915-0
3.
Mohammad Sh. Daoud, Mohammad Shehab, Hani M. Al-Mimi, Laith Abualigah, Raed Abu Zitar, Mohd Khaled Yousef Shambour,
Gradient-Based Optimizer (GBO): A Review, Theory, Variants, and Applications,
2022,
1134-3060,
10.1007/s11831-022-09872-y
4.
Rebika Rai, Arunita Das, Krishna Gopal Dhal,
Nature-inspired optimization algorithms and their significance in multi-thresholding image segmentation: an inclusive review,
2022,
13,
1868-6478,
889,
10.1007/s12530-022-09425-5
5.
Laith Abualigah, Ali Diabat,
Chaotic binary reptile search algorithm and its feature selection applications,
2022,
1868-5137,
10.1007/s12652-022-04103-5
6.
Mohamed Abd Elaziz, Rolla Almodfer, Iman Ahmadianfar, Ibrahim Anwar Ibrahim, Mohammed Mudhsh, Laith Abualigah, Songfeng Lu, Ahmed A. Abd El-Latif, Dalia Yousri,
Static models for implementing photovoltaic panels characteristics under various environmental conditions using improved gradient-based optimizer,
2022,
52,
22131388,
102150,
10.1016/j.seta.2022.102150
7.
Bugra Alkan, Malarvizhi Kaniappan Chinnathai,
Performance Comparison of Recent Population-Based Metaheuristic Optimisation Algorithms in Mechanical Design Problems of Machinery Components,
2021,
9,
2075-1702,
341,
10.3390/machines9120341
Mohammad H. Nadimi-Shahraki, Shokooh Taghian, Seyedali Mirjalili, Laith Abualigah, Mohamed Abd Elaziz, Diego Oliva,
EWOA-OPF: Effective Whale Optimization Algorithm to Solve Optimal Power Flow Problem,
2021,
10,
2079-9292,
2975,
10.3390/electronics10232975
10.
Rana Muhammad Adnan, Reham R. Mostafa, Ahmed Elbeltagi, Zaher Mundher Yaseen, Shamsuddin Shahid, Ozgur Kisi,
Development of new machine learning model for streamflow prediction: case studies in Pakistan,
2022,
36,
1436-3240,
999,
10.1007/s00477-021-02111-z
Mohammad Khishe, Mokhtar Mohammadi, Ali Ramezani Varkani,
Underwater Backscatter Recognition Using Deep Fuzzy Extreme Convolutional Neural Network Optimized via Hunger Games Search,
2022,
1370-4621,
10.1007/s11063-022-11068-1
13.
Mona A. S. Ali, Fathimathul Rajeena P. P., Diaa Salama Abd Elminaam,
An Efficient Heap Based Optimizer Algorithm for Feature Selection,
2022,
10,
2227-7390,
2396,
10.3390/math10142396
14.
Ahmed A. Ewees, Zakariya Yahya Algamal, Laith Abualigah, Mohammed A. A. Al-qaness, Dalia Yousri, Rania M. Ghoniem, Mohamed Abd Elaziz,
A Cox Proportional-Hazards Model Based on an Improved Aquila Optimizer with Whale Optimization Algorithm Operators,
2022,
10,
2227-7390,
1273,
10.3390/math10081273
15.
Laith Abualigah, Ali Diabat, Davor Svetinovic, Mohamed Abd Elaziz,
Boosted Harris Hawks gravitational force algorithm for global optimization and industrial engineering problems,
2022,
0956-5515,
10.1007/s10845-022-01921-4
16.
Ahmed A. Ewees, Fatma H. Ismail, Ahmed T. Sahlol,
Gradient-based optimizer improved by Slime Mould Algorithm for global optimization and feature selection for diverse computation problems,
2023,
213,
09574174,
118872,
10.1016/j.eswa.2022.118872
17.
Saleh Masoud Abdallah Altbawi, Saifulnizam Bin Abdul Khalid, Ahmad Safawi Bin Mokhtar, Hussain Shareef, Nusrat Husain, Ashraf Yahya, Syed Aqeel Haider, Lubna Moin, Rayan Hamza Alsisi,
An Improved Gradient-Based Optimization Algorithm for Solving Complex Optimization Problems,
2023,
11,
2227-9717,
498,
10.3390/pr11020498
18.
Iman Ahmadianfar, Ali Asghar Heidari, Saeed Noshadian, Huiling Chen, Amir H Gandomi,
INFO: An efficient optimization algorithm based on weighted mean of vectors,
2022,
195,
09574174,
116516,
10.1016/j.eswa.2022.116516
19.
Hoda Zamani, Mohammad H. Nadimi-Shahraki, Amir H. Gandomi,
Starling murmuration optimizer: A novel bio-inspired algorithm for global and engineering optimization,
2022,
392,
00457825,
114616,
10.1016/j.cma.2022.114616
20.
Ghareeb Moustafa, Mostafa Elshahed, Ahmed R. Ginidi, Abdullah M. Shaheen, Hany S. E. Mansour,
A Gradient-Based Optimizer with a Crossover Operator for Distribution Static VAR Compensator (D-SVC) Sizing and Placement in Electrical Systems,
2023,
11,
2227-7390,
1077,
10.3390/math11051077
21.
Mohamed Abd Elaziz, Ahmed A. Ewees, Dalia Yousri, Laith Abualigah, Mohammed A. A. Al-qaness,
Modified marine predators algorithm for feature selection: case study metabolomics,
2022,
64,
0219-1377,
261,
10.1007/s10115-021-01641-w
22.
Laith Abualigah, Mohamed Abd Elaziz, Nima Khodadadi, Agostino Forestiero, Heming Jia, Amir H. Gandomi,
2022,
Chapter 19,
978-3-030-99078-7,
481,
10.1007/978-3-030-99079-4_19
23.
Lamiaa M. El Bakrawy, Nadjem Bailek, Laith Abualigah, Shabana Urooj, Abeer S. Desuky,
Feature Selection Based on Mud Ring Algorithm for Improving Survival Prediction of Children Undergoing Hematopoietic Stem-Cell Transplantation,
2022,
10,
2227-7390,
4197,
10.3390/math10224197
24.
Marcelo Becerra-Rozas, José Lemus-Romani, Felipe Cisternas-Caneo, Broderick Crawford, Ricardo Soto, Gino Astorga, Carlos Castro, José García,
Continuous Metaheuristics for Binary Optimization Problems: An Updated Systematic Literature Review,
2022,
11,
2227-7390,
129,
10.3390/math11010129
25.
Zahra Nassiri, Hesam Omranpour,
Learning the transfer function in binary metaheuristic algorithm for feature selection in classification problems,
2023,
35,
0941-0643,
1915,
10.1007/s00521-022-07869-z
26.
Laith Abualigah, Mohamed Abd Elaziz, Putra Sumari, Zong Woo Geem, Amir H. Gandomi,
Reptile Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer,
2022,
191,
09574174,
116158,
10.1016/j.eswa.2021.116158
Xiaobing Yu, Wen Zhang,
A teaching-learning-based optimization algorithm with reinforcement learning to address wind farm layout optimization problem,
2024,
151,
15684946,
111135,
10.1016/j.asoc.2023.111135
29.
Honghao Li, Liangzhen Jiang, Kaixiang Yang, Shulin Shang, Mingxin Li, Zhibin Lv,
iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features,
2024,
25,
1422-0067,
7049,
10.3390/ijms25137049
30.
Kangjian Sun, Ju Huo, Heming Jia, Qi Liu, Jiaming Yang, Chen Cai,
Nonlinear optimization of optical camera multiparameter via triple integrated Gradient-based optimizer algorithm,
2024,
179,
00303992,
111294,
10.1016/j.optlastec.2024.111294
31.
Liuyan Feng, Yongquan Zhou, Qifang Luo,
Binary Hybrid Artificial Hummingbird with Flower Pollination Algorithm for Feature Selection in Parkinson’s Disease Diagnosis,
2024,
21,
1672-6529,
1003,
10.1007/s42235-023-00478-z
32.
Edjola Naka,
A Competitive Parkinson-Based Binary Volleyball Premier League Metaheuristic Algorithm for Feature Selection,
2023,
23,
1314-4081,
91,
10.2478/cait-2023-0038
33.
Megha Varshney, Pravesh Kumar, Musrrat Ali, Yonis Gulzar,
Using the Grey Wolf Aquila Synergistic Algorithm for Design Problems in Structural Engineering,
2024,
9,
2313-7673,
54,
10.3390/biomimetics9010054
34.
Dawei Yun, Bing Zheng, Haiwei Wu, Fengrun Gu, Jiaoli Zhou,
Improving the short-term prediction of dissolved carbon monoxide using a combination of Light GBM and meta-heuristic algorithms,
2024,
12,
22133437,
114043,
10.1016/j.jece.2024.114043
35.
Srividhya Ramanathan, M. Anto Bennet,
Optimized precoding for massive MU-MIMO systems with KLDA dimension reduction and RNN-crossover GBO algorithm,
2024,
86,
1018-4864,
363,
10.1007/s11235-024-01135-4
36.
Mohamed Abdel-Basset, Reda Mohamed, Ibrahim Alrashdi, Karam M. Sallam, Ibrahim A. Hameed,
Evolution-based energy-efficient data collection system for UAV-supported IoT: Differential evolution with population size optimization mechanism,
2024,
245,
09574174,
123082,
10.1016/j.eswa.2023.123082
37.
Ahmet Cevahir Cinar,
A comprehensive comparison of accuracy-based fitness functions of metaheuristics for feature selection,
2023,
27,
1432-7643,
8931,
10.1007/s00500-023-08414-3
38.
Zhentao Tang, Kaiyu Wang, Sichen Tao, Yuki Todo, Rong-Long Wang, Shangce Gao,
Hierarchical Manta Ray Foraging Optimization with Weighted Fitness-Distance Balance Selection,
2023,
16,
1875-6883,
10.1007/s44196-023-00289-4
39.
Mengjun Li, Qifang Luo, Yongquan Zhou,
BGOA-TVG: Binary Grasshopper Optimization Algorithm with Time-Varying Gaussian Transfer Functions for Feature Selection,
2024,
9,
2313-7673,
187,
10.3390/biomimetics9030187
40.
Ali M. El-Rifaie, Abdullah M. Shaheen, Mohamed A. Tolba, Idris H. Smaili, Ghareeb Moustafa, Ahmed R. Ginidi, Mostafa A. Elshahed,
Modified Gradient-Based Algorithm for Distributed Generation and Capacitors Integration in Radial Distribution Networks,
2023,
11,
2169-3536,
120899,
10.1109/ACCESS.2023.3326758
41.
Yue Zheng, Cheng Xing, Jie-Sheng Wang, Hao-Ming Song, Yin-Yin Bao, Xing-Yue Zhang,
An improved reptile search algorithm based on mathematical optimization accelerator and elementary functions,
2023,
45,
10641246,
4179,
10.3233/JIFS-223210
42.
Zhongyuan Gu, Xin Xiong, Chengye Yang, Miaocong Cao, Chun Xu,
Research on prediction of PPV in open pit mine used on intelligent hybrid model of extreme gradient boosting,
2024,
371,
03014797,
123248,
10.1016/j.jenvman.2024.123248
43.
Nima Khodadadi, Ehsan Khodadadi, Qasem Al-Tashi, El-Sayed M. El-Kenawy, Laith Abualigah, Said Jadid Abdulkadir, Alawi Alqushaibi, Seyedali Mirjalili,
BAOA: Binary Arithmetic Optimization Algorithm With K-Nearest Neighbor Classifier for Feature Selection,
2023,
11,
2169-3536,
94094,
10.1109/ACCESS.2023.3310429
44.
Oguz Emrah Turgut, Mert Sinan Turgut,
Chaotic gradient based optimizer for solving multidimensional unconstrained and constrained optimization problems,
2024,
17,
1864-5909,
1967,
10.1007/s12065-023-00876-6
45.
S. Umarani, N. Alangudi Balaji, K. Balakrishnan, Nageswara Guptha,
Binary northern goshawk optimization for feature selection on micro array cancer datasets,
2024,
15,
1868-6478,
1551,
10.1007/s12530-024-09580-x
46.
Shaik Mohammed Shareef, M. Venu Gopala Rao,
Development of Sparse Time Frequency Distribution Reconstruction Using a Gradient Slime Mould Renyie Entropy Shrinkage Model,
2024,
0278-081X,
10.1007/s00034-024-02927-4
47.
Yaoxian Ji, Chenglang Lu, Lei Liu, Ali Asghar Heidari, Chengwen Wu, Huiling Chen,
Advancing bankruptcy prediction: a study on an improved rime optimization algorithm and its application in feature selection,
2025,
1868-8071,
10.1007/s13042-024-02462-3
48.
Seyed Vahid Razavi-Termeh, Abolghasem Sadeghi-Niaraki, Farman Ali, Soo-Mi Choi,
Improving flood-prone areas mapping using geospatial artificial intelligence (GeoAI): A non-parametric algorithm enhanced by math-based metaheuristic algorithms,
2025,
375,
03014797,
124238,
10.1016/j.jenvman.2025.124238
49.
Tianbao Liu, Yang Li, Xiwen Qin,
Enhanced Gradient‐Based Optimizer Algorithm With Multi‐Strategy for Feature Selection,
2025,
37,
1532-0626,
10.1002/cpe.70034
50.
Shivankur Thapliyal, Narender Kumar,
Enhanced snow ablation optimizer using advanced quadratic and newton interpolation with taylor neighbourhood search and second-order differential perturbation strategies for high-dimensional feature selection,
2025,
18,
1864-5909,
10.1007/s12065-025-01030-0
51.
Rohit Salgotra, Supreet Singh, Pooja Verma, Laith Abualigah, Amir H Gandomi,
Mutation adaptive cuckoo search hybridized naked mole rat algorithm for industrial engineering problems,
2025,
15,
2045-2322,
10.1038/s41598-025-01033-y
1.Initialization 2. Assign values for parameters pr,εandM 3. Generate an initial population X0=[x0,1,x0,2,⋯,x0,D] 4. Evaluate the objective function value f(X0),n=1,2,..,N. 5. Specify the best and worst solutions Xmbest and Xmworst 6.While (m < M) 7.forn = 1 : N 8.fori = 1 : D 9. Select randomly r1≠r2≠r3≠r4≠n in the range of [1, N] 10. Calculate the position xm+1n,i using Eq (12) 11.end for 12.Local escaping operator 13.ifrand < pr 14. Calculate the position XmLEO using Eq (14) or Eq (15) 15.Xm+1n=xmLEO 16.end 17. Update the positionsXmbest and Xmworst 18.end for 19.m = m + 1 20.end 21. Return Xmbest
1. Initialization 2. Assign values for parameters pr,εandM 3. Generate an initial population X0=[x0,1,x0,2,⋯,x0,D] 4. Evaluate the objective function value f(X0),n=1,2,..,N 5. Specify the best and worst solutions Xmbest and Xmworst 6. while (m < M)
7. forn = 1 : N 8. fori = 1 : D 9. Select randomly r1≠r2≠r3≠r4≠n in the range of [1, N]
10. Calculate the position xm+1n,iusing Eq (12)
11. end for 12. Local escaping operator 13. ifr and < pr 14. Calculate the position XmLEO using Eq (14) or Eq (15)
15. Xm+1n=xmLEO 16. end 17. The probability of converting a population to a binary space 0 or 1. Using Eq (26) or Eq (28)
18. Compute the fitness of all populations 19. Update the positions Xmbest and Xmworst 20. end for 21. m = m + 1 22. end 23. Return Xmbest
Table 3.
Average accuracy results and Time according to different combinations of population size (n) and the number of max iteration on the PenglungEW dataset.
Table 4.
Average accuracy results, Time, Average fitness and Average number of feature according to different combinations of αandβon the PenglungEW dataset.
Table 6.
Comparison between different variations of BGBO on the mean of fitness values (as Mean) and the standard deviation of fitness values (as S.D).
Table 7.
Comparison between different variations of BGBO on the mean of accuracy values (M.Acc) and the standard deviation of accuracy values (as S.D).
Table 8.
Comparison between different variations of BGBO on the mean number of the selected features (M.NF) and the standard deviation of the mean number of the selected features (as S.D).
Table 10.
Comparison between BGBO_V3 and state-of-art methods based on the mean of fitness values (Mean) and the standard deviation of fitness values (as S.D).
Table 11.
Comparison between BGBO_V3 and state-of-art methods based on the mean of accuracy values (M.Acc) and the standard deviation of fitness values (as S.D).
Table 12.
Comparison between BGBO_V3 and state-of-art methods based on the mean number of the selected features (M.NF) and the standard deviation of the mean number of the selected features (asS.D).
Table 15.
Comparison between different variations of BGBO on the mean of fitness values (as Mean) and the standard deviation of fitness values (as S.D).
Table 16.
Comparison between BGBO_V3 and state-of-art methods based on the mean of accuracy values (M.Acc) and the standard deviation of fitness values (as S.D).
Table 17.
Comparison between BGBO_V3 and state-of-art methods based on the mean number of the selected features (M.NF) and the standard deviation of the mean number of the selected features (as S.D).
1.Initialization 2. Assign values for parameters pr,εandM 3. Generate an initial population X0=[x0,1,x0,2,⋯,x0,D] 4. Evaluate the objective function value f(X0),n=1,2,..,N. 5. Specify the best and worst solutions Xmbest and Xmworst 6.While (m < M) 7.forn = 1 : N 8.fori = 1 : D 9. Select randomly r1≠r2≠r3≠r4≠n in the range of [1, N] 10. Calculate the position xm+1n,i using Eq (12) 11.end for 12.Local escaping operator 13.ifrand < pr 14. Calculate the position XmLEO using Eq (14) or Eq (15) 15.Xm+1n=xmLEO 16.end 17. Update the positionsXmbest and Xmworst 18.end for 19.m = m + 1 20.end 21. Return Xmbest
S-shaped Transfer function
V-shaped Transfer function
Name
Mathematical formulation
Name
Mathematical formulation
S1
T(x)=11+e−2x
V1
T(x)=|erf(√π2x)|
S2
T(x)=11+e−x
V2
T(x)=|tanh(x)|
S3
T(x)=11+e(−x/2)
V3
T(x)=|(x)/√1+x2|
S4
T(x)=11+e(−x/3)
V4
T(x)=|2πarctan(π2x)|
Algorithm 2 Pseudo code of the BGBO algorithm
1. Initialization 2. Assign values for parameters pr,εandM 3. Generate an initial population X0=[x0,1,x0,2,⋯,x0,D] 4. Evaluate the objective function value f(X0),n=1,2,..,N 5. Specify the best and worst solutions Xmbest and Xmworst 6. while (m < M)
7. forn = 1 : N 8. fori = 1 : D 9. Select randomly r1≠r2≠r3≠r4≠n in the range of [1, N]
10. Calculate the position xm+1n,iusing Eq (12)
11. end for 12. Local escaping operator 13. ifr and < pr 14. Calculate the position XmLEO using Eq (14) or Eq (15)
15. Xm+1n=xmLEO 16. end 17. The probability of converting a population to a binary space 0 or 1. Using Eq (26) or Eq (28)
18. Compute the fitness of all populations 19. Update the positions Xmbest and Xmworst 20. end for 21. m = m + 1 22. end 23. Return Xmbest