
Artificial neural networks (ANNs) emulate the information processing of the human brain through large numbers of interconnected neurons and efficient network learning algorithms [1]. Over the past few decades, ANNs have been widely applied in many fields owing to their powerful nonlinear mapping and parallel computing capabilities [2].
So far, many neural network learning algorithms have been proposed and improved; Jian et al. summarized the well-known ones [3]. Among them, the backpropagation (BP) algorithm is one of the most mature neural network learning algorithms and the best-known representative of iterative gradient descent algorithms for supervised learning in neural networks. It was first proposed by Paul Werbos in the 1970s [4] and became widely used after it was rediscovered by Rumelhart and McClelland in 1986 [5]. However, the BP algorithm is not perfect: it tends to fall into local minima during training, and its convergence is slow. In response to these shortcomings, researchers have proposed many improvements to the backpropagation technique [6], such as optimizing the connection weights of the BP network with a genetic algorithm [7], adding white noise to the weighted sum of the BP network [8], adding a momentum term, a regularization operator and the AdaBoost ensemble algorithm [9], and dynamically changing the learning rate according to the change of the mean square error [10]. Nevertheless, these problems still restrict the use of the BP algorithm in many application fields, especially because its learning speed cannot meet practical needs. Recently, Huang et al. proposed the extreme learning machine (ELM), a simple and efficient learning algorithm for single hidden layer feedforward neural networks (SLFNs) [11]. The core idea of ELM is that the input weights and hidden layer biases of the network are randomly selected and kept unchanged during training, and the output weights are then obtained directly through the Moore-Penrose generalized inverse. The advantages of ELM are that only the number of hidden layer neurons needs to be tuned, requiring little human intervention; it avoids the iterative parameter optimization of traditional SLFN training algorithms, which greatly reduces training time; and the resulting solution is the unique optimal solution, which guarantees the generalization performance of the network [12]. Consequently, ELM has been widely used in many fields, such as disease diagnosis [13,14,15], traffic sign recognition [16,17], and prediction [18,19].
In recent years, the significant advantages of ELM have attracted a large number of researchers in academia and industry, and research on the algorithm and its models has produced fruitful results [20,21]. The learning process of the standard ELM algorithm can be regarded as empirical risk minimization, which tends to produce overfitted models. In addition, since ELM does not account for heteroskedasticity in practical applications, its generalization ability and robustness degrade greatly when the training samples contain many outliers. To overcome these problems, regularization methods have been applied to ELM [22,23,24]. Lu et al. proposed a probabilistic RELM method to reduce the influence of noisy data and outliers on the model [25]. Yıldırım and Revan Özkale combined the advantages of the Liu estimator and the Lasso regression method to address the instability and poor generalization of traditional ELM [26]. Huang et al. [27] introduced kernel functions into ELM and proposed a general framework, kernel ELM (KELM), applicable to regression, binary classification and multi-class classification, which effectively alleviates the loss of generalization and stability caused by random parameters. However, kernel selection is a crucial part of KELM, and choosing a kernel empirically is not always appropriate. Liu and Wang [28] proposed a multiple kernel extreme learning machine (MK-ELM) to address this issue, but MK-ELM cannot handle large-scale datasets effectively because it must optimize more parameters, which makes the algorithm computationally expensive. To meet the needs of online real-time applications, the online sequential extreme learning machine (OS-ELM) was proposed [29] and subsequently improved [30,31,32]. The online sequential class-specific extreme learning machine (OSCSELM) supports both chunk-by-chunk and one-by-one online learning modes and is used to address the class imbalance problem in small and large datasets [33]. Lu et al. trained OS-ELM with a Kalman filter to improve its stability [34], and OS-ELM has been extended to handle an increasing number of classes [35].
Before training starts, the user must specify the number of hidden layer neurons for ELM. However, how to choose an appropriate number of hidden layer neurons for different applications has long been a difficult and active topic in neural network research. At present, the main approaches for adjusting the hidden layer structure of ELM include swarm intelligence optimization [36,37,38,39,40], incremental methods [41,42,43,44,45], pruning [46,47,48,49,50] and adaptive methods [51,52,53]. With the advent of the big data era, storing and processing large-scale data has become an urgent need for enterprises, and the ensemble and parallelization of ELM have therefore become research hotspots [54,55,56,57]. Lam and Wunsch learned features with an unsupervised feature learning (UFL) algorithm and then trained them with a fast radial basis function (RBF) extreme learning machine, improving both the accuracy and speed of the algorithm [58]. Yao and Ge proposed the distributed parallel extreme learning machine (dp-ELM) and a hierarchical extreme learning machine [59]. Duan et al. proposed an efficient ELM with three parallel sub-algorithms based on the Spark framework (SELM) for big data classification [60]. Many researchers have also turned their attention to deep ELM and carried out innovative work [61,62]. Dai et al. proposed a multilayer one-class extreme learning machine (OC-ELM) [63]. Zhang et al. proposed a multi-layer extreme learning machine (ML-ELM) [64]. Yahia et al. proposed a new structure based on an extreme learning machine auto-encoder with a deep learning structure and a composite wavelet activation function for the hidden nodes [65].
Inappropriate initial parameters of the hidden layer (input weights, hidden layer biases and the number of nodes) in the original ELM lead to poor classification results [21]. Although the improved algorithms mentioned above increase the generalization performance of the original ELM, they also greatly increase the computational complexity. Therefore, a network learning algorithm with fast learning speed and higher generalization performance is needed.
In this paper, we propose a new regression and classification model without iterative parameter optimization, in the spirit of extreme learning, called the functional extreme learning machine (FELM). FELM uses the functional neuron (FN) model as its basic unit and uses functional equation-solving theory to guide its modeling process [66,67,68,69]. Like ELM, the FELM parameter matrix is obtained from the generalized inverse of the hidden layer neuron output matrix. FELM is a generalization of ELM: its distinctive network structure and simple, efficient learning algorithm enable it to solve not only the problems that ELM can solve but also many that ELM cannot. FELM also differs from ELM. The activation function of ELM is fixed, and ELM has weights and biases; the neuron functions of FELM are not fixed, and there are no weights or biases, only parameters (coefficients), so FELM avoids the influence of random parameters (input weights, hidden layer biases) on the generalization performance and stability of the ELM model. Its neuron functions are linear combinations of given basic functions, and the basic functions are selected according to the problem to be solved without specifying their number in advance. The essence of FELM learning is parameter learning, and the parameter learning algorithm proposed in this paper requires no iterative computation and achieves high accuracy. FELM is compared with other popular techniques in terms of generalization performance and training time on several artificial datasets and benchmark regression and classification datasets. The results show that FELM is not only fast but also generalizes well.
The rest of this paper is organized as follows: Section 2 provides an overview of FN and ELM. In Section 3, the topology of functional extreme learning machine, the theory of structural simplification, and the parameter learning algorithm are described. Section 4 presents the performance comparison results of FELM, classical ELM, OP-ELM, classical SVM and LSSVM on regression and classification problems. Section 5 draws conclusions and discusses future research directions.
Functional neuron model and extreme learning machine (ELM) will be briefly discussed in the following section.
The functional neuron was proposed by Enrique Castillo [66]. Figure 1(a) shows a functional neuron model, Figure 1(b) its expanded model, and Figure 1(c) the expansion of the green part of Figure 1(b). The mathematical expression of the functional neuron is:
$ O = f\left(X\right) $ | (1) |
where $ X = \{{x}_{1}, {x}_{2}, ..., {x}_{m}\} $ and $ O = \{{o}_{1}, {o}_{2}, ..., {o}_{m}\} $ are the input and output of the functional neuron, respectively, and $ f(\cdot) $ is the functional neuron function. The functional neuron function can be expressed as a linear combination of basic functions:
$ f\left(X\right) = {\sum }_{j = 1}^{n}{a}_{ij}{\varphi }_{ij}\left(X\right) = {a}_{i}^{T}{\varphi }_{i}\left(X\right) $ | (2) |
where $ \left\{{\varphi }_{i}\right(X) = ({\varphi }_{i1}\left(X\right), {\varphi }_{i2}\left(X\right), ..., {\varphi }_{in}\left(X\right)\left)\right|i = 1, 2, ..., m\} $ is any given basic function family, and different function families can be selected according to the specific problem and data. $ {\varphi }_{1}\left(X\right), ..., {\varphi }_{m}\left(X\right) $ are pairwise independent basic function families. Commonly used basic functions include the trigonometric function family and the Fourier family. $ \{{a}_{i} = ({a}_{i1}, {a}_{i2}, ..., {a}_{in}{)}^{T}|i = 1, 2, ..., m\} $ is a learnable set of parameters.
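To make Eqs (1) and (2) concrete, the short NumPy sketch below builds a small Fourier-type basis family and evaluates a functional neuron output as a linear combination of those basis functions. This is only an illustrative example; the basis family and the coefficient values are hypothetical, not taken from the paper.

```python
import numpy as np

# A small Fourier-type basis family {1, sin(x), cos(x), sin(2x), cos(2x)}.
basis = [
    lambda x: np.ones_like(x),
    np.sin,
    np.cos,
    lambda x: np.sin(2 * x),
    lambda x: np.cos(2 * x),
]

def functional_neuron(x, a):
    """Evaluate f(x) = sum_j a_j * phi_j(x), i.e., Eq (2), for a batch of inputs x."""
    Phi = np.column_stack([phi(x) for phi in basis])  # shape (n_samples, n_basis)
    return Phi @ a

x = np.linspace(-np.pi, np.pi, 5)
a = np.array([0.5, 1.0, -0.3, 0.2, 0.1])   # hypothetical learnable coefficients
print(functional_neuron(x, a))
```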
Based on the generalized inverse matrix theory, Huang et al. proposed a new type of single hidden layer feedforward neural network algorithm with excellent performance-extreme learning machine (ELM) [11]. The extreme learning machine network structure is shown in Figure 2.
Given $ N $ distinct training samples $ \left\{{x}_{i}, {t}_{i}\mid {x}_{i}\in {R}^{D}, {t}_{i}\in {R}^{m}, i = 1, 2, \dots, N\right\} $, where $ {x}_{i} = {\left[{x}_{i1}, {x}_{i2}, \cdots, {x}_{iD}\right]}^{T} $ is the input vector and $ {t}_{i} = {\left[{t}_{i1}, {t}_{i2}, \cdots, {t}_{im}\right]}^{T} $ is the corresponding expected output. $ g\left(x\right) $ is an activation function, a nonlinear piecewise continuous function that satisfies the ELM universal approximation theorem; commonly used choices are the sigmoid function and the Gaussian function. The mathematical model in Figure 2 is then expressed as follows:
$ H\beta = T $ | (3) |
where $ H = \left[\begin{array}{ccc} {h}_{1}\left({x}_{1}\right) & \cdots & {h}_{L}\left({x}_{1}\right) \\ \vdots & \ddots & \vdots \\ {h}_{1}\left({x}_{N}\right) & \cdots & {h}_{L}\left({x}_{N}\right) \end{array}\right]_{N\times L} $ is the hidden layer output matrix, with $ {h}_{i}\left({x}_{j}\right) = g\left({\omega }_{i}\cdot {x}_{j}+{b}_{i}\right) $, and $ T = {\left[{t}_{1}, {t}_{2}, \dots, {t}_{N}\right]}^{T} $ is the matrix of expected outputs.
In ELM, $ H $ is called a random feature mapping matrix, $ {\omega }_{i} = [{\omega }_{i1}, {\omega }_{i2}, ..., {\omega }_{iD}] $ represents the input weight that connects the $ i $th hidden layer neuron and the input layer neuron, $ {b}_{i} $ represents the bias of the $ i $th hidden layer neuron, and $ \beta = {\left[{\beta }_{1}, \dots, {\beta }_{L}\right]}^{T} $ represents the weight matrix between the output layer and the hidden layer. Hidden layer node parameters $ ({\omega }_{i}, {b}_{i}) $ are randomly generated and remain unchanged.
Calculate the output weight:
$ \beta = {H}^{+}T $ | (4) |
where, $ {H}^{+} $ represents the Moore-Penrose generalized inverse of the hidden layer output matrix $ H $.
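To illustrate Eqs (3) and (4), the following NumPy sketch implements a basic sigmoid ELM: random input weights and biases are drawn once and kept fixed, and the output weights are computed with the Moore-Penrose pseudoinverse. This is a minimal reconstruction of the standard ELM procedure, not the authors' MATLAB code; the toy data and the number of hidden nodes are placeholders.

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Standard ELM: random (w, b) stay fixed; output weights beta = H^+ T (Eq (4))."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    W = rng.standard_normal((L, D))           # input weights, one row per hidden node
    b = rng.standard_normal(L)                # hidden layer biases
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))  # sigmoid hidden layer output matrix (N x L)
    beta = np.linalg.pinv(H) @ T              # Moore-Penrose generalized inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Toy usage: fit y = sin(x)/x on synthetic data.
X = np.linspace(-4 * np.pi, 4 * np.pi, 500).reshape(-1, 1)
T = np.sinc(X / np.pi)                        # np.sinc(z) = sin(pi*z)/(pi*z), so this is sin(x)/x
W, b, beta = elm_train(X, T, L=20)
print(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```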
According to the example of functional extreme learning machine in Figure 3(a), it can be seen that a functional extreme learning machine network consists of the following elements:
1) Several layers of storage units: one layer of input units ($ \{{x}_{1}, {x}_{2}, {x}_{3}\} $), one layer of output units ($ \left\{d\right\} $), and several intermediate storage units ($ \left\{G\right({x}_{1}, {x}_{2}), {x}_{3}\} $), which store the intermediate information produced by functional neurons. Storage units are represented by solid circles with their corresponding names.
2) One or more layers of processing units (i.e., functional neurons): each functional neuron is a computing unit that processes the input values from the input units or the previous layer of functional neurons and provides input data to the next layer of neurons or the output units, e.g., $ \{G, I, F\} $ in Figure 3(a).
3) A set of directed links: They connect the input units to the first layer of processing units, one layer of processing units to the next layer of processing units, and the last layer of computing units to the output units. The arrows indicate the direction in which information flows. Information flows only from the input layer to the output layer.
All these elements together constitute the network architecture of the functional extreme learning machine (FELM). The network architecture corresponds one-to-one to a functional equation, and the functional equation is the key to the FELM learning process. Therefore, once the network structure is determined, the generalization ability of the FELM is also determined.
Note the following differences between standard neural networks and FELM networks:
1) The functional neuron shown in Figure 1(a) is the basic computing unit of FELM. Unlike the M-P neuron (Figure 4), it has no weights $ \left\{{w}_{k}\right\} $ and no biases, only parameters, and it can have multiple outputs $ \{{O}_{1}, {O}_{2}, ..., {O}_{m}\} $.
2) In standard neural networks, the functions are given and the weights must be learned. In FELM networks, the functional neuron functions can be linear combinations of any linearly independent basic function families, $ f({x}_{1}, {x}_{2}, \dots, {x}_{k}) = {\sum }_{j = 1}^{n}{a}_{j}{\varphi }_{j}({x}_{1}, {x}_{2}, \dots, {x}_{k}) $, where $ \left\{{\varphi }_{j}\right({x}_{1}, {x}_{2}, ..., {x}_{k}\left)\right|j = 1, 2, ..., n\} $ is a given appropriate basic function family, which means that FELM can choose different basic function families for its functional neurons depending on the specific problem and data. $ n $ is the number of basic functions, and $ \left\{{a}_{j}\right|j = 1, 2, ..., n\} $ is the learnable parameter set. Some commonly used function families are the polynomial family $ \{1, x, {x}^{2}, ..., {x}^{m}\} $, the Fourier family $ \{1, \mathit{sin}(x), \mathit{cos}(x), ..., \mathit{sin}(mx), \mathit{cos}(mx\left)\right\} $ and the exponential family $ \{1, {e}^{x}, {e}^{-x}, ..., {e}^{mx}, {e}^{-mx}\} $.
Structural simplification: Each initial network structure corresponds to a set of functional equations; solving this set with functional equation-solving methods simplifies the initial structure into an equivalent, optimal FELM. The functional equation corresponding to Figure 3(a) is:
$ d = F\left(G\right({x}_{1}, {x}_{2}), {x}_{3}) $ | (5) |
Theorem 1 of [66]: Consider the functional equation $ F\left[G\right(x, y), z] = K[x, N(y, z\left)\right] $, where $ G $ is invertible in both variables, $ F $ is invertible in its first variable for any fixed value of the second, and $ K $ and $ N $ are invertible in their second variables for any fixed value of the first. Its general solution, continuous on a real rectangle, is:
$ F\left(x, y\right) = k\left[f\left(x\right)+g\left(y\right)\right], \quad G\left(x, y\right) = {f}^{-1}\left[p\left(x\right)+q\left(y\right)\right], \quad K\left(x, y\right) = k\left[p\left(x\right)+n\left(y\right)\right], \quad N\left(x, y\right) = {n}^{-1}\left[q\left(x\right)+g\left(y\right)\right] $ | (6) |
where, $ f, k, n, p, q $ and $ g $ are arbitrary continuous and strictly monotonic functions. Therefore, according to Theorem 1 of [66], the general solution of functional Eq (5) is Eq (7):
$ F({x}_{1}, {x}_{2}) = k\left[f\right({x}_{1})+r({x}_{2}\left)\right], G({x}_{1}, {x}_{2}) = {f}^{-1}\left[p\right({x}_{1})+q({x}_{2}\left)\right] $ | (7) |
According to Eq (7), Eq (5) can be written as:
$ d = F\left(G\right({x}_{1}, {x}_{2}), {x}_{3}) = k\left[p\right({x}_{1})+q({x}_{2})+r({x}_{3}\left)\right] $ | (8) |
According to Eq (8), the corresponding topological structure can be drawn, as shown in Figure 3(b). Figures 3(a) and 3(b) are equivalent, meaning that they produce the same output for the same input. In the initial FELM structure the functional neuron functions are multivariate, whereas in the simplified FELM structure they are univariate.
Expression uniqueness of FELM: After structural simplification, the functional equation corresponding to the simplified functional network is $ d = k\left[p\right({x}_{1})+q({x}_{2})+r({x}_{3}\left)\right] $, but it must be verified whether this expression is unique. For the verification, assume that there are two sets of functional neuron functions $ \{{k}_{1}, {p}_{1}, {q}_{1}, {r}_{1}\} $ and $ \{{k}_{2}, {p}_{2}, {q}_{2}, {r}_{2}\} $ such that:
$ {k}_{1}\left[{p}_{1}\right({x}_{1})+{q}_{1}({x}_{2})+{r}_{1}({x}_{3}\left)\right] = {k}_{2}\left[{p}_{2}\right({x}_{1})+{q}_{2}({x}_{2})+{r}_{2}({x}_{3}\left)\right].\forall {x}_{1}, {x}_{2}, {x}_{3} $ | (9) |
Let $ {k}_{2}\left(v\right) = {k}_{1}\left(\frac{v-b-c-d}{a}\right) $, then $ v = {p}_{2}\left({x}_{1}\right)+{q}_{2}\left({x}_{2}\right)+{r}_{2}\left({x}_{3}\right) $, $ \frac{v-b-c-d}{a} = {p}_{1}\left({x}_{1}\right)+{q}_{1}\left({x}_{2}\right)+{r}_{1}\left({x}_{3}\right) $, and $ v = a{p}_{1}\left({x}_{1}\right)+a{q}_{1}\left({x}_{2}\right)+a{r}_{1}\left({x}_{3}\right)+b+c+d $. So the solution of the functional equation is:
$ {p}_{2}\left({x}_{1}\right) = a{p}_{1}\left({x}_{1}\right)+b;\text{}{q}_{2}\left({x}_{2}\right) = a{q}_{1}\left({x}_{2}\right)+c;\text{}{r}_{2}\left({x}_{3}\right) = a{r}_{1}\left({x}_{3}\right)+d $ | (10) |
where $ a, b, c, d $ are arbitrary constants. For any values of $ (a, b, c, d) $, substituting Eq (10) into the right-hand side of Eq (9) yields the following result:
$ {k}_{2}\left[{p}_{2}\right({x}_{1})+{q}_{2}({x}_{2})+{r}_{2}({x}_{3}\left)\right] = {k}_{2}\left[a{p}_{1}\right({x}_{1})+a{q}_{1}({x}_{2})+a{r}_{1}({x}_{3})+b+c+d] = {k}_{1}\left[{p}_{1}\right({x}_{1})+{q}_{1}({x}_{2})+{r}_{1}({x}_{3}\left)\right] $ | (11) |
Therefore, the expression of Eq (8) is unique.
The FELM in Figure 3(b) is taken as an example to illustrate its parameter learning process. Write Eq (8) as follows
$ {k}^{-1}\left({x}_{4}\right) = p\left({x}_{1}\right)+q\left({x}_{2}\right)+r\left({x}_{3}\right), $ | (12) |
where $ {x}_{4} $ represents $ d $.
Each neuron function is a linear combination of given linearly independent basic functions, that is,
$ p\left({x}_{1}\right) = {\sum }_{j = 1}^{{m}_{1}}{a}_{1j}{\varphi }_{1j}\left({x}_{1}\right); \quad q\left({x}_{2}\right) = {\sum }_{j = 1}^{{m}_{2}}{a}_{2j}{\varphi }_{2j}\left({x}_{2}\right); \quad r\left({x}_{3}\right) = {\sum }_{j = 1}^{{m}_{3}}{a}_{3j}{\varphi }_{3j}\left({x}_{3}\right); \quad {k}^{-1}\left({x}_{4}\right) = {\sum }_{j = 1}^{{m}_{4}}{a}_{4j}{\varphi }_{4j}\left({x}_{4}\right) $ | (13) |
where $ {m}_{1} $, $ {m}_{2} $, $ {m}_{3} $ and $ {m}_{4} $ are the numbers of basic functions of $ p $, $ q $, $ r $ and $ k $ respectively, and $ {a}_{ij} $ is the parameter coefficient of FELM.
Let $ {a}_{i} = \left[{a}_{i1}, {a}_{i2}, \dots, {a}_{i{m}_{i}}\right] $, $ i = 1, 2, 3, 4 $, and let $ A = {\left[{a}_{1}, {a}_{2}, {a}_{3}, {a}_{4}\right]}^{T} $ be the parameters to be optimized. Define $ f{f}_{ir} = \left[{\varphi }_{i1}\left({x}_{ir}\right), {\varphi }_{i2}\left({x}_{ir}\right), \dots, {\varphi }_{i{m}_{i}}\left({x}_{ir}\right)\right] $, $ F{f}_{r} = \left[f{f}_{1r}, f{f}_{2r}, f{f}_{3r}, f{f}_{4r}\right] $ for $ r = 1, 2, \dots, n $, $ Ff = \left[F{f}_{1};F{f}_{2};\dots; F{f}_{n}\right] $ and $ B = {\left[{k}^{-1}\left({x}_{41}\right), {k}^{-1}\left({x}_{42}\right), \dots, {k}^{-1}\left({x}_{4n}\right)\right]}^{T} $, where $ n $ is the number of observed samples. Then Eq (14) is obtained:
$ Ff•A = B $ | (14) |
The parameters of FELM can be obtained by Eq (15).
$ A = F{f}^{+}B $ | (15) |
where $ F{f}^{+} $ is the generalized inverse of $ Ff $.
The above example illustrates the process of model learning. The steps of constructing and simplifying the FELM network and then performing parameter learning are as follows:
Step 1: Establish the initial network model based on the characteristics of the problem to be solved;
Step 2: Write the functional equation corresponding to the initial network model;
Step 3: Solve the functional equation with the functional equation-solving method to obtain the general solution expression;
Step 4: Based on the general solution expression and its one-to-one correspondence with the FELM, redraw the corresponding FELM network (the simplified FELM);
Step 5: Use the FELM learning algorithm to obtain the optimal parameters of the model.
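Following Steps 1–5 for the simplified network of Figure 3(b), the sketch below shows how the design matrix $Ff$ and the parameter vector $A = Ff^{+}B$ of Eqs (13)–(15) can be computed with NumPy. It is only an illustrative sketch under simplifying assumptions: $k$ is taken as the identity, so that $B$ reduces to the observed outputs $d$, and the basis families and synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical basis families for the neurons p, q, r of the simplified FELM in Figure 3(b).
# k is assumed to be the identity here, so B is simply the vector of observed outputs d.
bases = {
    "p": [lambda x: np.ones_like(x), np.sin, np.cos],
    "q": [lambda x: x, lambda x: x ** 2],
    "r": [np.exp, lambda x: np.exp(-x)],
}

def design_matrix(X):
    """Build Ff: concatenate the basis evaluations of p(x1), q(x2), r(x3) for every sample."""
    cols = []
    for i, name in enumerate(["p", "q", "r"]):
        cols += [phi(X[:, i]) for phi in bases[name]]
    return np.column_stack(cols)                # shape (n_samples, m1 + m2 + m3)

def felm_fit(X, d):
    Ff = design_matrix(X)
    return np.linalg.pinv(Ff) @ d               # A = Ff^+ B  (Eq (15)), no iteration needed

def felm_predict(X, A):
    return design_matrix(X) @ A

# Toy usage on synthetic data generated from a separable target.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
d = np.sin(X[:, 0]) + X[:, 1] ** 2 + np.exp(-X[:, 2])
A = felm_fit(X, d)
print(np.sqrt(np.mean((felm_predict(X, A) - d) ** 2)))
```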
In this section, the performance of the proposed FELM learning algorithm is compared with commonly used network algorithms (ELM, OP-ELM, SVM, LS-SVM) on two artificial datasets, 20 benchmark datasets (16 regression, 4 classification) and the XOR classification problem, covering function approximation and classification, to verify the effectiveness and superiority of FELM. Experimental environment for FELM and the comparison algorithms: 11th Gen Intel(R) Core(TM) i5-11320H @ 3.20 GHz, 16 GB RAM and MATLAB 2019b. ELM source code used in all experiments: http://www.ntu.edu.sg/home/egbhuang/, OP-ELM source code: https://research.cs.aalto.fi//aml/software.shtml, SVM source code: http://www.csie.ntu.edu.tw/cjlin/libsvm/, and the most popular LS-SVM implementation: http://www.esat.kuleuven.ac.be/sista/lssvmlab/. The sigmoidal activation function is used for ELM, the Gaussian kernel function for OP-ELM, and the radial basis function for SVM and LS-SVM. The basic functions of the proposed FELM algorithm are set according to the specific problem to be solved, as described below. In the two regression experiment subsections below, FELM adopts the network structure of Figure 3(b).
It is well known that the performance of SVM is sensitive to the combination of $ (C, \gamma) $, and the generalization performance of LS-SVM likewise depends closely on $ (C, \gamma) $. Therefore, to achieve good generalization performance, appropriate cost parameter $ C $ and kernel parameter $ \gamma $ must be selected for SVM and LS-SVM on each dataset. For each dataset we tried 17 different values of $ C $ and 17 different values of $ \gamma $, i.e., 289 $ (C, \gamma) $ pairs in total. Each problem is tested 50 times, with the training and test sets randomly drawn from the entire dataset: two-thirds of the data form the training set and the rest the test set. This section reports the simulation results, including the average training and test accuracy, the corresponding standard deviation (Dev), and the training time. In all experiments, all inputs (attributes) and outputs (targets) are normalized into [-1, 1].
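For the SVR baselines, the protocol above (a 17 × 17 grid of $(C, \gamma)$ values, 50 random two-thirds/one-third splits, and normalization of all attributes and targets into [-1, 1]) could be reproduced along the lines of the sketch below. It uses scikit-learn purely for illustration; the paper's experiments were run with the MATLAB LIBSVM and LS-SVMlab toolboxes, and the exact candidate values of $C$ and $\gamma$ are an assumption (powers of two from $2^{-8}$ to $2^{8}$, consistent with the parameter combinations reported in the tables).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

# Assumed candidate grids: 17 values each for C and gamma, i.e., 289 (C, gamma) pairs.
C_grid = 2.0 ** np.arange(-8, 9)
gamma_grid = 2.0 ** np.arange(-8, 9)

def normalize(A):
    """Scale each column into [-1, 1], as done for all inputs and targets in the experiments."""
    lo, hi = A.min(axis=0), A.max(axis=0)
    return 2.0 * (A - lo) / (hi - lo) - 1.0

def evaluate(X, y, C, gamma, trials=50):
    """Average test RMSE of an RBF-kernel SVR over random 2/3 train / 1/3 test splits."""
    rmses = []
    for seed in range(trials):
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=1 / 3, random_state=seed)
        model = SVR(C=C, gamma=gamma).fit(Xtr, ytr)
        rmses.append(np.sqrt(np.mean((model.predict(Xte) - yte) ** 2)))
    return np.mean(rmses), np.std(rmses)

# The full grid search would call evaluate(normalize(X), normalize(y), C, g)
# for every (C, g) in the 289 pairs and keep the best-performing combination.
```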
To test the performance of FELM on regression problems, we first use the 'sinc' objective function, defined as:
$ y = \mathrm{sinc}\left(x\right) = \frac{\mathit{sin}x}{x}, x\in [-4\pi , 4\pi ]. $ |
To better assess the performance of our algorithm, zero-mean Gaussian noise of different forms is added to the training data points. In particular, we use the following training samples $ ({x}_{i}, {y}_{i}), i = 1, 2, ..., l $:
(Type A) $ {y}_{i} = \frac{\mathit{sin}{x}_{i}}{{x}_{i}}+{\xi }_{i}, \quad {x}_{i} \sim U[-4\pi, 4\pi], \quad {\xi }_{i} \sim N(0, {0.1}^{2}) $
(Type B) $ {y}_{i} = \frac{\mathit{sin}{x}_{i}}{{x}_{i}}+{\xi }_{i}, \quad {x}_{i} \sim U[-4\pi, 4\pi], \quad {\xi }_{i} \sim N(0, {0.2}^{2}) $
Next, we compare the performance of the proposed FELM with the other algorithms on two further synthetic datasets generated from the following function:
$ g\left(x\right) = \left|\frac{x-1}{4}\right|+\left|\mathit{sin}\left(\pi \left(1+\frac{x-1}{4}\right)\right)\right|+1, -10\le x\le 10. $ |
Again, all training data points are perturbed by adding Gaussian noise of the forms defined below.
(Type C) $ {y}_{i} = g\left({x}_{i}\right)+{\xi }_{i}, \quad {x}_{i} \sim U[-10, 10], \quad {\xi }_{i} \sim N(0, {0.2}^{2}) $
(Type D) $ {y}_{i} = g\left({x}_{i}\right)+{\xi }_{i}, \quad {x}_{i} \sim U[-10, 10], \quad {\xi }_{i} \sim N(0, {0.4}^{2}) $
where $ U[a, b] $ and $ N(c, {d}^{2}) $ denote a uniform random variable on $ [a, b] $ and a Gaussian random variable with mean $ c $ and variance $ {d}^{2} $, respectively. The training set and the test set each contain 5000 points drawn uniformly at random from the interval $ [a, b] $. To make the regression problems realistic, the Gaussian noise of Types A–D is added to all training samples, while the test data remain noise-free.
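For concreteness, the four noisy training sets described above can be generated with a short NumPy sketch like the following (the sample size follows the text; the random seed is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000

def sinc(x):
    return np.sin(x) / x       # x = 0 occurs with probability zero under the uniform draw

def g(x):
    return np.abs((x - 1) / 4) + np.abs(np.sin(np.pi * (1 + (x - 1) / 4))) + 1

def make_training_set(func, a, b, sigma):
    """Inputs ~ U[a, b]; targets are func(x) plus zero-mean Gaussian noise N(0, sigma^2)."""
    x = rng.uniform(a, b, N)
    return x, func(x) + rng.normal(0.0, sigma, N)

type_A = make_training_set(sinc, -4 * np.pi, 4 * np.pi, 0.1)
type_B = make_training_set(sinc, -4 * np.pi, 4 * np.pi, 0.2)
type_C = make_training_set(g, -10, 10, 0.2)
type_D = make_training_set(g, -10, 10, 0.4)

# Noise-free test set for the sinc problem.
x_test = rng.uniform(-4 * np.pi, 4 * np.pi, N)
y_test = sinc(x_test)
```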
As shown in Table 1, appropriate basic functions are assigned to our FELM algorithm on the four synthetic datasets. The initial number of nodes of the ELM algorithm is 5, the optimal number of hidden layer nodes is searched in steps of 5 within 5–100, and the initial maximum number of neurons for OP-ELM is 100. The results of 50 runs of all algorithms are shown in Table 2, where bold indicates the best test accuracy. Figure 5 plots the fitted curves of a single run of FELM and the other regressors on these synthetic datasets with different noise types. Table 2 shows that the proposed FELM learning algorithm achieves the highest test accuracy (root mean square error, RMS) on the artificial datasets with noise types A, B and D. On the dataset with noise type C, FELM is superior to ELM, OP-ELM and LSSVR and only slightly worse than SVR. Table 2 also reports the optimal parameter combinations and required support vectors (SVs) of SVR and LSSVR on these synthetic datasets, and the network complexity (nodes) of FELM, ELM and OP-ELM. In addition, Table 2 compares the training and testing times of the five methods. The proposed FELM is the fastest learning method: it is several times to dozens of times faster than ELM, and hundreds of times faster than OP-ELM, SVR and LSSVR. Compared with ELM, SVR and LSSVR, FELM has the smallest network complexity and therefore requires less learning time; compared with OP-ELM, FELM does not need to prune redundant nodes and therefore requires less training time. Since the numbers of support vectors required by SVR and LSSVR are much larger than the network complexity of FELM, they also take more test time than FELM, at least 60 times as much, which means that a trained FELM deployed in practice can respond to new, unseen data much faster than SVM and LS-SVM. In short, the proposed FELM outperforms the other four comparison algorithms in approximating the four artificial datasets with different types of noise.
Datasets | Basic functions |
Types A and B | $ \{1, {x}^{2}, {x}^{4}, {x}^{6}, {x}^{8}, {x}^{10}, \mathit{cos}(x), \mathit{cos}(3x), \mathit{cos}(5x)\} $ |
Type C | $ \{x, {x}^{2}, {x}^{3}, \mathit{sin}(\frac{x}{2}), \mathit{cos}(\frac{x}{2}), \mathit{sin}(x), \mathit{cos}(x), \mathit{cos}(2x), {e}^{\frac{x}{3}}, {e}^{-\frac{x}{3}}, {e}^{\frac{x}{5}}, \mathit{cos}(\frac{\pi }{2}(1+x)), $ $ \mathit{cos}(\pi (1+x)), \mathit{cos}(2\pi (1+x\left)\right), \mathit{cos}(3\pi (1+x)), \mathit{arctan}\left(\frac{x}{7}\right), \mathit{arctan}(\frac{x}{6}), $ $ \mathit{arctan}(\frac{x}{5}), \mathit{arctan}(\frac{x}{4}), \mathit{arctan}(\frac{x}{3}), \mathit{arctan}(\frac{x}{2}), \mathit{arctan}(x), \mathit{arctan}(2x), $ $ \mathit{arctan}(3x), \mathit{arctan}(4x), \mathit{arctan}(5x\left)\right\} $ |
Type D | $ \{x, \mathit{sin}(\frac{x}{2}), \mathit{cos}(\frac{x}{2}), \mathit{cos}(\frac{x}{3}), \mathit{sin}(x), \mathit{sin}(2x), \mathit{cos}(x), \mathit{cos}(2x), \mathit{cos}(3x), {e}^{\frac{x}{3}}, {e}^{-\frac{x}{3}}, {e}^{\frac{x}{5}}, $ $ \mathit{cos}(\frac{\pi }{2}(1+x)), \mathit{cos}(\pi (1+x\left)\right), \mathit{cos}(2\pi (1+x)), \mathit{cos}(3\pi (1+x\left)\right)\} $ |
Noise | Regressor (C, γ) | Training time (s) | Testing time (s) | Testing RMS | Testing DEV | SVs/nodes
Type A | FELM | 0.0024 | 0.0014 | 0.0065 | 0.0006 | 9
Type A | ELM | 0.0141 | 0.0047 | 0.0065 | 0.0012 | 20
Type A | OP-ELM | 0.8194 | 0.0025 | 0.0060 | 0.0010 | 15.50
Type A | SVR ($2^{7}$, $2^{-2}$) | 2.2711 | 0.3150 | 0.0145 | 0.0020 | 1613.46
Type A | LSSVR ($2^{8}$, $2^{0}$) | 2.0607 | 0.3824 | 0.0087 | 0.0010 | 5000
Type B | FELM | 0.0027 | 0.0014 | 0.0098 | 0.0015 | 9
Type B | ELM | 0.0138 | 0.0084 | 0.0127 | 0.0021 | 20
Type B | OP-ELM | 0.7905 | 0.0024 | 0.0115 | 0.0024 | 14.70
Type B | SVR ($2^{2}$, $2^{-2}$) | 1.7248 | 0.6016 | 0.0188 | 0.0026 | 3089.38
Type B | LSSVR ($2^{8}$, $2^{0}$) | 1.9938 | 0.3710 | 0.0177 | 0.0018 | 5000
Type C | FELM | 0.0053 | 0.0024 | 0.0290 | 0.0011 | 26
Type C | ELM | 0.4159 | 0.0194 | 0.0328 | 0.0012 | 90
Type C | OP-ELM | 0.8390 | 0.0031 | 0.0686 | 0.0029 | 19.20
Type C | SVR ($2^{8}$, $2^{-1}$) | 35.1289 | 0.5969 | 0.0283 | 0.0014 | 3107.00
Type C | LSSVR ($2^{8}$, $2^{-8}$) | 2.2814 | 0.3572 | 0.0601 | 0.0022 | 5000
Type D | FELM | 0.0042 | 0.0015 | 0.0384 | 0.0019 | 16
Type D | ELM | 0.4656 | 0.0209 | 0.0429 | 0.0032 | 100
Type D | OP-ELM | 0.7586 | 0.0027 | 0.0725 | 0.0046 | 18.10
Type D | SVR ($2^{7}$, $2^{-1}$) | 15.4804 | 0.8234 | 0.0434 | 0.0040 | 4020.82
Type D | LSSVR ($2^{8}$, $2^{0}$) | 1.9812 | 0.3792 | 0.0396 | 0.0037 | 5000
For further evaluation, 16 different regression datasets are selected. These datasets are usually used to test machine learning algorithms, mainly from UCI Machine Learning repository [70] and StatLib [71]. The different attributes of 16 datasets are summarized in Table 3.
Datasets | #Train | #Test | #Total | #Features |
Abalone | 2784 | 1393 | 4177 | 8 |
Mpg | 261 | 131 | 392 | 7 |
Autoprice | 106 | 53 | 159 | 15 |
Balloon | 1334 | 667 | 2001 | 2 |
Baskball | 64 | 32 | 96 | 4 |
Cleveland | 202 | 101 | 303 | 13 |
Cloud | 72 | 36 | 108 | 9 |
Concrete CS | 686 | 344 | 1030 | 8 |
Diabetes | 28 | 15 | 43 | 2 |
Housing | 337 | 169 | 506 | 13 |
Machine CPU | 139 | 70 | 209 | 6 |
Mg | 923 | 462 | 1385 | 6 |
Quake | 1452 | 726 | 2178 | 3 |
Servo | 111 | 56 | 167 | 4 |
Strike | 416 | 209 | 625 | 6 |
Wisconsin B.C. | 129 | 65 | 194 | 32 |
As shown in Table 4, we assign appropriate basic functions to our FELM algorithm on the 16 datasets. These basic function families are relatively small, indicating that the structural complexity of the networks is low. The initial number of nodes in ELM is 5, and the optimal number of hidden layer nodes is searched in steps of 5 within 5–100; the optimal number of nodes obtained by ELM on each dataset is shown in Table 5. The table also shows the best parameter combinations and numbers of support vectors of SVR and LSSVR on each dataset, as well as the initial maximum number of neurons and the number of neurons after pruning for OP-ELM.
Datasets | Basic functions | Datasets | Basic functions |
Abalone | $ \{{e}^{-x}, {e}^{x}\} $ | Diabetes | $ \{{e}^{-x}, {e}^{x}\} $ |
Mpg | $ \{1, {e}^{-\frac{x}{2}}, {e}^{-\frac{x}{3}}, {e}^{-\frac{x}{4}}\} $ | Housing | $ \{1, {e}^{-x}, {e}^{x}\} $ |
Autoprice | $ \{1, {e}^{\frac{x}{4}}\} $ | Machine CPU | $ \{1, {e}^{-x}, {e}^{x}\} $ |
Balloon | $ \{{e}^{-x}, {e}^{x}, {e}^{-2x}, {e}^{2x}, 1, x, x, {x}^{2}, {x}^{3}\} $ | Mg | $ \left\{\mathit{sin}(3x\right), \mathit{sin}(5x), {x}^{3}, {e}^{\frac{x}{2}}, {e}^{-\frac{x}{2}}, {e}^{\frac{x}{5}}, {e}^{-\frac{x}{5}}\} $ |
Baskball | $ \{{e}^{-\frac{x}{5}}, {e}^{-\frac{x}{6}}\} $ | Quake | $ \{1, \mathit{sin}(x), \mathit{sin}(3x), \mathit{sin}(5x), x, {x}^{3}, {x}^{5}, {e}^{-x}, {e}^{x}\} $ |
Cleveland | $ \{1, {e}^{-\frac{x}{2}}\} $ | Servo | $ \{1, {e}^{-3x}, {e}^{3x}, \mathit{sin}(\frac{x}{4}), \mathit{cos}(\frac{x}{4}), \mathit{sin}(\frac{x}{6}), \mathit{cos}(\frac{x}{6}\left)\right\} $ |
Cloud | $ \{1, {e}^{-\frac{x}{5}}\} $ | Strike | $ \{1, \mathit{sin}(\frac{x}{4})\} $ |
Concrete CS | $ \{1, \mathit{sin}(x), \mathit{sin}(3x), \mathit{sin}(5x), x, {x}^{2}, {x}^{3}, {e}^{-x}, {e}^{x}\} $ | Wisconsin B.C. | $ \left\{\mathit{sin}(\frac{x}{4}\right), \mathit{cos}(\frac{x}{4}\left)\right\} $ |
Datasets | SVR $ \varepsilon $ | SVR $ (C, \gamma) $ | SVR # SVs | LSSVR $ (C, \gamma) $ | LSSVR # SVs | ELM # nodes | OP-ELM init | OP-ELM final
Abalone | $ {2^{ - 5}} $ | $ ({2}^{1}, {2}^{0}) $ | 1051.28 | $ ({2}^{5}, {2}^{1}) $ | 2784 | 35 | 100 | 33 |
Mpg | $ {2^{ - 8}} $ | $ ({2}^{0}, {2}^{-1}) $ | 104.08 | $ ({2}^{2}, {2}^{1}) $ | 261 | 30 | 100 | 36 |
Autoprice | $ {2^{ - 6}} $ | $ ({2}^{3}, {2}^{-3}) $ | 42.66 | $ ({2}^{6}, {2}^{2}) $ | 106 | 15 | 100 | 14 |
Balloon | $ {2^{ - 3}} $ | $ ({2}^{7}, {2}^{-2}) $ | 6 | $ ({2}^{8}, {2}^{\text{-}1}) $ | 1334 | 20 | 100 | 41 |
Baskball | $ {2^{ - 8}} $ | $ ({2}^{3}, {2}^{-4}) $ | 44.38 | $ ({2}^{2}, {2}^{2}) $ | 64 | 10 | 62 | 7 |
Cleveland | $ {2^{ - 6}} $ | $ ({2}^{1}, {2}^{-7}) $ | 142.72 | $ ({2}^{4}, {2}^{8}) $ | 198 | 20 | 100 | 9 |
Cloud | $ {2^{ - 6}} $ | $ ({2}^{7}, {2}^{-8}) $ | 20.62 | $ ({2}^{8}, {2}^{6}) $ | 72 | 15 | 70 | 20 |
Concrete CS | $ {2^{ - 8}} $ | $ ({2}^{4}, {2}^{-1}) $ | 174.42 | $ ({2}^{7}, {2}^{\text{-1}}) $ | 686 | 90 | 100 | 87 |
Diabetes | $ {2^{ - 5}} $ | $ ({2}^{1}, {2}^{-2}) $ | 21.12 | $ ({2}^{2}, {2}^{0}) $ | 28 | 5 | 26 | 6 |
Housing | $ {2^{ - 8}} $ | $ ({2}^{3}, {2}^{-2}) $ | 123.42 | $ ({2}^{8}, {2}^{2}) $ | 337 | 80 | 100 | 58 |
Machine CPU | $ {2^{ - 7}} $ | $ ({2}^{8}, {2}^{-6}) $ | 8.74 | $ ({2}^{7}, {2}^{3}) $ | 139 | 30 | 100 | 15 |
Mg | $ {2^{ - 6}} $ | $ ({2}^{1}, {2}^{0}) $ | 592.84 | $ ({2}^{1}, {2}^{\text{-2}}) $ | 923 | 70 | 100 | 84 |
Quake | $ {2^{ - 1}} $ | $ ({2}^{-4}, {2}^{7}) $ | 493.28 | $ ({2}^{0}, {2}^{8}) $ | 1452 | 30 | 100 | 11 |
Servo | $ {2^{ - 7}} $ | $ ({2}^{5}, {2}^{-3}) $ | 46.36 | $ ({2}^{6}, {2}^{1}) $ | 111 | 25 | 100 | 41 |
Strike | $ {2^{ - 7}} $ | $ ({2}^{8}, {2}^{-8}) $ | 88.74 | $ ({2}^{-2}, {2}^{2}) $ | 416 | 10 | 100 | 11 |
Wisconsin B.C. | $ {2^{ - 8}} $ | $ ({2}^{7}, {2}^{-8}) $ | 14.92 | $ ({2}^{8}, {2}^{6}) $ | 129 | 75 | 100 | 51 |
The results of 50 trials of the proposed FELM and the other comparison algorithms on the 16 datasets are shown in Tables 6–8; bold in Table 6 indicates the best test accuracy. Table 6 compares FELM with the other four algorithms in terms of testing RMSE: FELM obtains the minimum test RMSE on 10 datasets (Autoprice, Balloon, Baskball, Cleveland, Cloud, Diabetes, Machine CPU, Servo, Strike and Wisconsin B.C.). On the other datasets, although the accuracy of our algorithm is lower than that of SVR and LSSVR, it is higher than that of ELM and OP-ELM. Table 7 compares the five algorithms in training and testing time: FELM spends training and testing time similar to ELM but much less than OP-ELM, SVR and LSSVR. Table 8 compares FELM with the other algorithms on the standard deviation of the testing RMSE, which shows that FELM is a stable learning method. Figure 6 shows the test RMSE of FELM and the other comparison algorithms over 50 runs on four datasets (Autoprice, Cleveland, Abalone and Quake). In short, Tables 6–8 and Figure 6 show that the proposed FELM not only has good versatility and stability but also trains quickly.
Datasets | ELM | OP-ELM | SVR | LSSVR | FELM |
Abalone | 0.1566 | 0.2186 | 0.1513 | 0.1501 | 0.1525 |
Mpg | 0.1575 | 0.1568 | 0.1403 | 0.1389 | 0.1415 |
Autoprice | 0.1927 | 0.1804 | 0.1712 | 0.1669 | 0.1600 |
Balloon | 0.0157 | 0.0175 | 0.0420 | 0.0097 | 0.0072 |
Baskball | 0.2631 | 0.2669 | 0.2623 | 0.2576 | 0.2477 |
Cleveland | 0.4538 | 0.4359 | 0.4370 | 0.4281 | 0.4204 |
Cloud | 0.1458 | 0.1761 | 0.1240 | 0.1234 | 0.1168 |
Concrete CS | 0.1249 | 0.1196 | 0.1008 | 0.0792 | 0.1122 |
Diabetes | 0.3787 | 0.4635 | 0.3456 | 0.3242 | 0.2938 |
Housing | 0.1972 | 0.2305 | 0.1458 | 0.1478 | 0.1754 |
Machine CPU | 0.0474 | 0.0933 | 0.0731 | 0.0398 | 0.0395 |
Mg | 0.2748 | 0.2722 | 0.2719 | 0.2657 | 0.2678 |
Quake | 0.3475 | 0.3466 | 0.3420 | 0.3441 | 0.3436 |
Servo | 0.2300 | 0.1979 | 0.1819 | 0.1785 | 0.1694 |
Strike | 0.1517 | 0.1614 | 0.1541 | 0.1423 | 0.1320 |
Wisconsin B.C. | 0.0782 | 0.0269 | 0.0670 | 0.0264 | 0.0210 |
Datasets | ELM Train (s) | ELM Test (s) | OP-ELM Train (s) | OP-ELM Test (s) | SVR Train (s) | SVR Test (s) | LSSVR Train (s) | LSSVR Test (s) | FELM Train (s) | FELM Test (s)
Abalone | 0.0020 | 0.0008 | 0.5504 | 0.0019 | 0.3782 | 0.0756 | 0.5977 | 0.0791 | 0.0021 | 0.0006 |
Mpg | 0.0004 | 0.0004 | 0.0405 | 0.0008 | 0.0040 | 0.0007 | 0.0066 | 0.0017 | 0.0006 | 0.0002 |
Autoprice | 0.0002 | 0.0004 | 0.0248 | 0.0003 | 0.0010 | 0.0002 | 0.0037 | 0.0012 | 0.0007 | 0.0004 |
Balloon | 0.0006 | 0.0005 | 0.1419 | 0.0014 | 0.0015 | 0.0003 | 0.1311 | 0.0203 | 0.0011 | 0.0003 |
Baskball | 0.0001 | 0.0004 | 0.0108 | 0.0002 | 0.0006 | 0.0001 | 0.0027 | 0.0012 | 0.0001 | 0.0001 |
Cleveland | 0.0003 | 0.0007 | 0.0325 | 0.0003 | 0.0047 | 0.0010 | 0.0052 | 0.0018 | 0.0004 | 0.0002 |
Cloud | 0.0001 | 0.0004 | 0.0136 | 0.0004 | 0.0005 | 0.0001 | 0.0027 | 0.0011 | 0.0002 | 0.0001 |
Concrete CS | 0.0024 | 0.0008 | 0.1046 | 0.0025 | 0.0427 | 0.0028 | 0.0355 | 0.0077 | 0.0033 | 0.0009 |
Diabetes | 0.0001 | 0.0004 | 0.0048 | 0.0002 | 0.0002 | 0.0000 | 0.0024 | 0.0011 | 0.0001 | 0.0000 |
Housing | 0.0013 | 0.0005 | 0.0516 | 0.0015 | 0.0099 | 0.0012 | 0.0073 | 0.0029 | 0.0011 | 0.0003 |
Machine CPU | 0.0003 | 0.0004 | 0.0260 | 0.0004 | 0.0004 | 0.0001 | 0.0038 | 0.0020 | 0.0002 | 0.0001 |
Mg | 0.0016 | 0.0006 | 0.1538 | 0.0029 | 0.0628 | 0.0114 | 0.0634 | 0.0089 | 0.0038 | 0.0016 |
Quake | 0.0010 | 0.0005 | 0.2497 | 0.0005 | 0.0656 | 0.0130 | 0.1651 | 0.0186 | 0.0023 | 0.0006 |
Servo | 0.0002 | 0.0004 | 0.0268 | 0.0008 | 0.0027 | 0.0002 | 0.0035 | 0.0011 | 0.0004 | 0.0002 |
Strike | 0.0002 | 0.0003 | 0.0577 | 0.0003 | 0.0139 | 0.0012 | 0.0125 | 0.0039 | 0.0004 | 0.0001 |
Wisconsin B.C. | 0.0012 | 0.0004 | 0.0289 | 0.0010 | 0.0007 | 0.0001 | 0.0041 | 0.0019 | 0.0014 | 0.0006 |
Datasets | ELM | OP-ELM | SVR | LSSVR | FELM | |||||
Abalone | 0.0068 | 0.2685 | 0.0040 | 0.0043 | 0.0043 | |||||
Mpg | 0.0134 | 0.0176 | 0.0135 | 0.0130 | 0.0123 | |||||
Autoprice | 0.0340 | 0.0556 | 0.0364 | 0.0397 | 0.0240 | |||||
Balloon | 0.0102 | 0.0121 | 0.0121 | 0.0019 | 0.0005 | |||||
Baskball | 0.0285 | 0.0288 | 0.0293 | 0.0272 | 0.0278 | |||||
Cleveland | 0.0312 | 0.0356 | 0.0351 | 0.0343 | 0.0316 | |||||
Cloud | 0.0309 | 0.0672 | 0.0444 | 0.0363 | 0.0317 | |||||
Concrete CS | 0.0083 | 0.0085 | 0.0085 | 0.0084 | 0.0050 | |||||
Diabetes | 0.1095 | 0.3457 | 0.0575 | 0.0524 | 0.0443 | |||||
Housing | 0.0356 | 0.1718 | 0.0192 | 0.0189 | 0.0197 | |||||
Machine CPU | 0.0620 | 0.0623 | 0.0191 | 0.0249 | 0.0221 | |||||
Mg | 0.0100 | 0.0088 | 0.0105 | 0.0079 | 0.0083 | |||||
Quake | 0.0088 | 0.0093 | 0.0112 | 0.0090 | 0.0085 | |||||
Servo | 0.0407 | 0.0442 | 0.0584 | 0.0405 | 0.0423 | |||||
Strike | 0.0383 | 0.0312 | 0.0305 | 0.0380 | 0.0331 | |||||
Wisconsin B.C. | 0.0183 | 0.0091 | 0.0097 | 0.0030 | 0.0041 |
The XOR dataset used here consists of 1000 randomly generated samples, with 478 samples in one class and 522 in the other; this binary problem is not linearly separable. On this problem, if FELM adopts the structure shown in Figure 3(b), it is not easy to find suitable basic function families for its neurons. Therefore, we adopt the multi-input single-output FELM structure shown in Figure 7 and achieve better generalization performance by increasing the number of hidden layer nodes. On this problem FELM uses 73 hidden layer nodes, with the following node functions:
{$ \mathit{sin}({x}_{2}) $; $ \mathit{sin}(3{x}_{2}) $; $ \mathit{sin}(5{x}_{2}) $; $ {x}_{2} $; $ {e}^{2{x}_{2}} $; $ \mathit{sin}({x}_{1})\mathit{sin}({x}_{2}) $; $ \mathit{sin}({x}_{1})\mathit{sin}(3{x}_{2}) $; $ \mathit{sin}({x}_{1})\mathit{sin}(5{x}_{2}) $; $ \mathit{sin}({x}_{1}){x}_{2} $; $ \mathit{sin}({x}_{1}){x}_{2}^{3} $; $ \mathit{sin}({x}_{1}){e}^{{x}_{2}} $; $ \mathit{sin}({x}_{1}){e}^{{x}_{2}} $; $ \mathit{sin}(3{x}_{1})\mathit{sin}(5{x}_{2}) $; $ \mathit{sin}(3{x}_{1}){x}_{2} $; $ \mathit{sin}(3{x}_{1}){x}_{2}^{2} $; $ \mathit{sin}(3{x}_{1}){e}^{-2{x}_{2}} $; $ \mathit{sin}(3{x}_{1}){e}^{2{x}_{2}} $; $ \mathit{sin}(5{x}_{1})\mathit{sin}({x}_{2}) $; $ \mathit{sin}(5{x}_{1})\mathit{sin}(5{x}_{2}) $; $ \mathit{sin}(5{x}_{1}){x}_{2} $; $ \mathit{sin}(5{x}_{1}){x}_{2}^{2} $; $ \mathit{sin}(5{x}_{1}){e}^{-{x}_{2}} $; $ \mathit{sin}(5{x}_{1}){e}^{2{x}_{2}} $; $ \mathit{sin}(5{x}_{1}){e}^{2{x}_{2}} $; $ {x}_{1} $; $ {x}_{1}\mathit{sin}({x}_{2}) $; $ {x}_{1}\mathit{sin}(3{x}_{2}) $; $ {x}_{1}\mathit{sin}(5{x}_{2}) $; $ {x}_{1}{x}_{2} $; …}
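To make the multi-input single-output FELM of Figure 7 concrete, the sketch below fits an XOR-style problem with only a small illustrative subset of the product-type node functions listed above (six terms instead of 73); the way the XOR data are generated is also an assumption, since only the class sizes are stated in the text. As in the regression case, the parameters are obtained directly from the generalized inverse of the node output matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR-style data (assumed construction): label +1 when the two inputs have the same sign.
X = rng.uniform(-1, 1, size=(1000, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)

# A small illustrative subset of the node functions above (the paper uses 73 such terms).
nodes = [
    lambda x1, x2: np.sin(x2),
    lambda x1, x2: x2,
    lambda x1, x2: x1,
    lambda x1, x2: np.sin(x1) * np.sin(x2),
    lambda x1, x2: np.sin(x1) * x2,
    lambda x1, x2: x1 * x2,
]

def node_matrix(X):
    return np.column_stack([f(X[:, 0], X[:, 1]) for f in nodes])

A = np.linalg.pinv(node_matrix(X)) @ y          # parameters via the generalized inverse
pred = np.sign(node_matrix(X) @ A)
print("training accuracy:", np.mean(pred == y))
```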
The initial maximum number of neurons for OP-ELM is 100. This subsection also adds a comparison with an ELM using the RBF activation function. The initial number of nodes of ELM is 5, and the optimal number of hidden layer nodes is searched in steps of 10 within 5–1000; the optimal number of nodes obtained is shown in Table 9. The table also shows the optimal parameter combinations and numbers of support vectors of SVM and LSSVM on this problem, and the final number of neurons of OP-ELM.
Classifier (C, γ) | Training time (s) | Testing time (s) | Testing rate (%) | Testing DEV (%) | SVs/nodes
FELM | 0.0704 | 0.0017 | 98.82 | 0.62 | 73
ELM (sig) | 0.0054 | 0.0013 | 97.29 | 0.86 | 155
ELM (RBF) | 0.0020 | 0.0013 | 97.38 | 0.90 | 75
OP-ELM | 0.0704 | 0.0015 | 97.24 | 1.06 | 49
SVM ($2^{4}$, $2^{6}$) | 0.0188 | 0.0033 | 97.29 | 0.87 | 269.86
LSSVM ($2^{3}$, $2^{-4}$) | 0.0219 | 0.0062 | 97.89 | 0.78 | 666
The average results of 50 trials conducted by FELM and other models on the XOR dataset are shown in Table 9. The data in the table show that the performance of FELM is better than ELM, OP-ELM, SVM and LSSVM. Figure 8 shows the boundaries of different classifiers on the XOR problem. It can be seen that, similar to ELM, OP-ELM, SVM and LS-SVM, FELM can solve the XOR problem well.
The newly proposed FELM algorithm is compared with four other popular algorithms (ELM, OP-ELM, SVM and LSSVM) on four classification problems: Iris, WDBC, Diabetes and Wine. These four datasets are from the UCI Machine Learning repository [70]; the numbers of samples, attributes and classes are shown in Table 10. The ELM algorithm sets the initial number of nodes to 5 and searches for the optimal number of hidden layer nodes in steps of 10 within 5–1000. Table 11 lists the basic functions assigned to these datasets for our FELM algorithm and the optimal number of nodes obtained by ELM on each dataset. The table also shows the optimal parameter combinations and numbers of support vectors of SVM and LSSVM, as well as the initial maximum number of neurons and the number of neurons after pruning for OP-ELM.
Datasets | #Train | #Test | #Total | #(Features/Classes)
Iris | 100 | 50 | 150 | 4/3 |
WDBC | 379 | 190 | 569 | 30/2 |
Diabetes | 512 | 256 | 768 | 8/2 |
Wine | 118 | 60 | 178 | 13/3 |
Datasets | FELM basic functions | SVM $ (C, \gamma) $ | SVM # SVs | LSSVM $ (C, \gamma) $ | LSSVM # SVs | ELM # nodes | OP-ELM init | OP-ELM final
Iris | $ \{1, {e}^{-x}, {e}^{x}, {e}^{-2x}, {e}^{2x}\} $ | ($2^{6}$, $2^{-5}$) | 20.48 | ($2^{-4}$, $2^{3}$) | 100 | 15 | 90 | 16.80
WDBC | $ \{1, {e}^{-3x}, {e}^{-5x}\} $ | ($2^{2}$, $2^{-3}$) | 55.04 | ($2^{2}$, $2^{5}$) | 379 | 65 | 90 | 32.90
Diabetes | $ \{{e}^{-x}, {e}^{x}\} $ | ($2^{4}$, $2^{-5}$) | 281.7 | ($2^{4}$, $2^{3}$) | 512 | 15 | 90 | 19.90
Wine | $ \{{e}^{-x}, {e}^{-2x}\} $ | ($2^{-1}$, $2^{-2}$) | 63.64 | ($2^{-3}$, $2^{3}$) | 118 | 15 | 90 | 33.40
The performance comparison of all algorithms is shown in Tables 12–14, where better test results among the five algorithms are given in bold. Table 12 compares FELM with the other four algorithms on the test classification rate, and FELM achieves the highest rate on every dataset. Table 13 compares the training and testing times of the five algorithms: the learning speed of FELM is similar to that of ELM and much faster than those of OP-ELM, SVM and LSSVM. Table 14 compares FELM with the other algorithms on the standard deviation of the test classification rate, and the results show that FELM has good stability. Figure 9 shows the classification rates of FELM and the other four algorithms over 50 runs on the four classification datasets: FELM achieves the highest classification rate most often, and its curve fluctuates less than those of the other algorithms, indicating better stability. In summary, Tables 12–14 and Figure 9 show that FELM not only maintains a fast learning speed in all cases but also achieves better generalization performance.
Datasets | ELM | OP-ELM | SVM | LSSVM | FELM |
Iris | 95.36 | 95.36 | 96.40 | 91.32 | 97.76 |
WDBC | 95.16 | 95.83 | 97.78 | 95.01 | 97.81 |
Diabetes | 76.83 | 76.89 | 77.59 | 73.70 | 77.85 |
Wine | 96.03 | 95.73 | 97.90 | 94.50 | 98.87 |
Datasets | ELM Train (s) | ELM Test (s) | OP-ELM Train (s) | OP-ELM Test (s) | SVM Train (s) | SVM Test (s) | LSSVM Train (s) | LSSVM Test (s) | FELM Train (s) | FELM Test (s)
Iris | 0.0002 | 0.0005 | 0.0217 | 0.0004 | 0.0004 | 0.0001 | 0.0022 | 0.0036 | 0.0002 | 0.0001 |
WDBC | 0.0013 | 0.0009 | 0.0519 | 0.0010 | 0.0047 | 0.0010 | 0.0079 | 0.0161 | 0.0043 | 0.0010 |
Diabetes | 0.0004 | 0.0007 | 0.0716 | 0.0006 | 0.0199 | 0.0044 | 0.0120 | 0.0078 | 0.0005 | 0.0002 |
Wine | 0.0002 | 0.0005 | 0.0255 | 0.0007 | 0.0011 | 0.0003 | 0.0024 | 0.0076 | 0.0004 | 0.0002 |
Datasets | ELM | OP-ELM | SVM | LSSVM | FELM |
Iris | 3.19 | 2.95 | 2.62 | 3.89 | 1.84 |
WDBC | 1.63 | 1.45 | 0.91 | 1.56 | 1.00 |
Diabetes | 2.21 | 2.15 | 2.15 | 3.15 | 2.17 |
Wine | 3.31 | 2.43 | 1.64 | 2.82 | 1.41 |
In this paper, we propose a new method for data regression and classification called the functional extreme learning machine (FELM). Different from traditional ELM, FELM is problem-driven rather than model-driven and has no concept of weights or biases. It uses the functional neuron as its basic unit and uses functional equation-solving theory to guide its modeling process. The functional neurons of the learning machine are represented by linear combinations of arbitrary linearly independent basic functions, and the desired accuracy is approached by adjusting the coefficients of the basic functions in the functional neurons. In addition, the fast parameter learning algorithm proposed in this paper requires no iteration and achieves high accuracy. Its learning process differs from that of the ELM in current use and is expected to fundamentally overcome the shortcoming that the random initial hidden layer parameters (connection weights, bias values, number of nodes) in current ELM theory significantly affect classification accuracy. Like ELM, FELM requires little human intervention: it only needs appropriate basic functions matched to the problem, and the optimal parameters are then obtained by the parameter learning algorithm without iteration. Simulation results show that, compared with ELM, FELM has better performance and a similar learning speed in regression and classification; compared with SVM and LS-SVM, FELM runs stably with a much faster learning speed (up to several hundred times) while guaranteeing generalization performance. The proposed FELM theory provides a new way to tap the potential of extreme learning and to broaden its applications, and has important theoretical significance and broad application prospects. In future work, we will use parameter screening algorithms to further improve the generalization ability and stability of FELM and to broaden its range of practical applications.
This research is funded by the National Natural Science Foundation of China (Grant Nos. 62066005 and U21A20464) and by the Guangxi Science and Technology Project under Grant No. 2019KY0185.
The authors declare there is no conflict of interest.
Datasets | Basic functions |
Types A and B | $ \{1, {x}^{2}, {x}^{4}, {x}^{6}, {x}^{8}, {x}^{10}, \mathit{cos}(x), \mathit{cos}(3x), \mathit{cos}(5x)\} $ |
Type C | $ \{x, {x}^{2}, {x}^{3}, \mathit{sin}(\frac{x}{2}), \mathit{cos}(\frac{x}{2}), \mathit{sin}(x), \mathit{cos}(x), \mathit{cos}(2x), {e}^{\frac{x}{3}}, {e}^{-\frac{x}{3}}, {e}^{\frac{x}{5}}, \mathit{cos}(\frac{\pi }{2}(1+x)), $ $ \mathit{cos}(\pi (1+x)), \mathit{cos}(2\pi (1+x\left)\right), \mathit{cos}(3\pi (1+x)), \mathit{arctan}\left(\frac{x}{7}\right), \mathit{arctan}(\frac{x}{6}), $ $ \mathit{arctan}(\frac{x}{5}), \mathit{arctan}(\frac{x}{4}), \mathit{arctan}(\frac{x}{3}), \mathit{arctan}(\frac{x}{2}), \mathit{arctan}(x), \mathit{arctan}(2x), $ $ \mathit{arctan}(3x), \mathit{arctan}(4x), \mathit{arctan}(5x\left)\right\} $ |
Type D | $ \{x, \mathit{sin}(\frac{x}{2}), \mathit{cos}(\frac{x}{2}), \mathit{cos}(\frac{x}{3}), \mathit{sin}(x), \mathit{sin}(2x), \mathit{cos}(x), \mathit{cos}(2x), \mathit{cos}(3x), {e}^{\frac{x}{3}}, {e}^{-\frac{x}{3}}, {e}^{\frac{x}{5}}, $ $ \mathit{cos}(\frac{\pi }{2}(1+x)), \mathit{cos}(\pi (1+x\left)\right), \mathit{cos}(2\pi (1+x)), \mathit{cos}(3\pi (1+x\left)\right)\} $ |
Noise | Regressor | Time(s) | Testing | SVs/nodes | ||
(C, γ) | Training | Testing | RMS | DEV | ||
Type A | FELM | 0.0024 | 0.0014 | 0.0065 | 0.0006 | 9 |
ELM | 0.0141 | 0.0047 | 0.0065 | 0.0012 | 20 | |
OP-ELM | 0.8194 | 0.0025 | 0.0060 | 0.0010 | 15.50 | |
SVR $ (2^{7}, 2^{-2}) $ | 2.2711 | 0.3150 | 0.0145 | 0.0020 | 1613.46 |
LSSVR $ (2^{8}, 2^{0}) $ | 2.0607 | 0.3824 | 0.0087 | 0.0010 | 5000 |
Type B | FELM | 0.0027 | 0.0014 | 0.0098 | 0.0015 | 9 |
ELM | 0.0138 | 0.0084 | 0.0127 | 0.0021 | 20 | |
OP-ELM | 0.7905 | 0.0024 | 0.0115 | 0.0024 | 14.70 | |
SVR $ (2^{2}, 2^{-2}) $ | 1.7248 | 0.6016 | 0.0188 | 0.0026 | 3089.38 |
LSSVR $ (2^{8}, 2^{0}) $ | 1.9938 | 0.3710 | 0.0177 | 0.0018 | 5000 |
Type C | FELM | 0.0053 | 0.0024 | 0.0290 | 0.0011 | 26 |
ELM | 0.4159 | 0.0194 | 0.0328 | 0.0012 | 90 | |
OP-ELM | 0.8390 | 0.0031 | 0.0686 | 0.0029 | 19.20 | |
SVR $ (2^{8}, 2^{-1}) $ | 35.1289 | 0.5969 | 0.0283 | 0.0014 | 3107.00 |
LSSVR $ (2^{8}, 2^{-8}) $ | 2.2814 | 0.3572 | 0.0601 | 0.0022 | 5000 |
Type D | FELM | 0.0042 | 0.0015 | 0.0384 | 0.0019 | 16 |
ELM | 0.4656 | 0.0209 | 0.0429 | 0.0032 | 100 | |
OP-ELM | 0.7586 | 0.0027 | 0.0725 | 0.0046 | 18.10 | |
SVR $ (2^{7}, 2^{-1}) $ | 15.4804 | 0.8234 | 0.0434 | 0.0040 | 4020.82 |
LSSVR $ (2^{8}, 2^{0}) $ | 1.9812 | 0.3792 | 0.0396 | 0.0037 | 5000
Datasets | #Train | #Test | #Total | #Features |
Abalone | 2784 | 1393 | 4177 | 8 |
Mpg | 261 | 131 | 392 | 7 |
Autoprice | 106 | 53 | 159 | 15 |
Balloon | 1334 | 667 | 2001 | 2 |
Baskball | 64 | 32 | 96 | 4 |
Cleveland | 202 | 101 | 303 | 13 |
Cloud | 72 | 36 | 108 | 9 |
Concrete CS | 686 | 344 | 1030 | 8 |
Diabetes | 28 | 15 | 43 | 2 |
Housing | 337 | 169 | 506 | 13 |
Machine CPU | 139 | 70 | 209 | 6 |
Mg | 923 | 462 | 1385 | 6 |
Quake | 1452 | 726 | 2178 | 3 |
Servo | 111 | 56 | 167 | 4 |
Strike | 416 | 209 | 625 | 6 |
Wisconsin B.C. | 129 | 65 | 194 | 32 |
Datasets | Basic functions | Datasets | Basic functions |
Abalone | $ \{{e}^{-x}, {e}^{x}\} $ | Diabetes | $ \{{e}^{-x}, {e}^{x}\} $ |
Mpg | $ \{1, {e}^{-\frac{x}{2}}, {e}^{-\frac{x}{3}}, {e}^{-\frac{x}{4}}\} $ | Housing | $ \{1, {e}^{-x}, {e}^{x}\} $ |
Autoprice | $ \{1, {e}^{\frac{x}{4}}\} $ | Machine CPU | $ \{1, {e}^{-x}, {e}^{x}\} $ |
Balloon | $ \{{e}^{-x}, {e}^{x}, {e}^{-2x}, {e}^{2x}, 1, x, x, {x}^{2}, {x}^{3}\} $ | Mg | $ \left\{\mathit{sin}(3x\right), \mathit{sin}(5x), {x}^{3}, {e}^{\frac{x}{2}}, {e}^{-\frac{x}{2}}, {e}^{\frac{x}{5}}, {e}^{-\frac{x}{5}}\} $ |
Baskball | $ \{{e}^{-\frac{x}{5}}, {e}^{-\frac{x}{6}}\} $ | Quake | $ \{1, \mathit{sin}(x), \mathit{sin}(3x), \mathit{sin}(5x), x, {x}^{3}, {x}^{5}, {e}^{-x}, {e}^{x}\} $ |
Cleveland | $ \{1, {e}^{-\frac{x}{2}}\} $ | Servo | $ \{1, {e}^{-3x}, {e}^{3x}, \mathit{sin}(\frac{x}{4}), \mathit{cos}(\frac{x}{4}), \mathit{sin}(\frac{x}{6}), \mathit{cos}(\frac{x}{6}\left)\right\} $ |
Cloud | $ \{1, {e}^{-\frac{x}{5}}\} $ | Strike | $ \{1, \mathit{sin}(\frac{x}{4})\} $ |
Concrete CS | $ \{1, \mathit{sin}(x), \mathit{sin}(3x), \mathit{sin}(5x), x, {x}^{2}, {x}^{3}, {e}^{-x}, {e}^{x}\} $ | Wisconsin B.C. | $ \left\{\mathit{sin}(\frac{x}{4}\right), \mathit{cos}(\frac{x}{4}\left)\right\} $ |
Datasets | SVR $ \varepsilon $ | SVR $ (C, \gamma) $ | SVR # SVs | LSSVR $ (C, \gamma) $ | LSSVR # SVs | ELM # nodes | OP-ELM init nodes | OP-ELM final nodes
Abalone | $ {2^{ - 5}} $ | $ ({2}^{1}, {2}^{0}) $ | 1051.28 | $ ({2}^{5}, {2}^{1}) $ | 2784 | 35 | 100 | 33 |
Mpg | $ {2^{ - 8}} $ | $ ({2}^{0}, {2}^{-1}) $ | 104.08 | $ ({2}^{2}, {2}^{1}) $ | 261 | 30 | 100 | 36 |
Autoprice | $ {2^{ - 6}} $ | $ ({2}^{3}, {2}^{-3}) $ | 42.66 | $ ({2}^{6}, {2}^{2}) $ | 106 | 15 | 100 | 14 |
Balloon | $ {2^{ - 3}} $ | $ ({2}^{7}, {2}^{-2}) $ | 6 | $ ({2}^{8}, {2}^{-1}) $ | 1334 | 20 | 100 | 41
Baskball | $ {2^{ - 8}} $ | $ ({2}^{3}, {2}^{-4}) $ | 44.38 | $ ({2}^{2}, {2}^{2}) $ | 64 | 10 | 62 | 7 |
Cleveland | $ {2^{ - 6}} $ | $ ({2}^{1}, {2}^{-7}) $ | 142.72 | $ ({2}^{4}, {2}^{8}) $ | 198 | 20 | 100 | 9 |
Cloud | $ {2^{ - 6}} $ | $ ({2}^{7}, {2}^{-8}) $ | 20.62 | $ ({2}^{8}, {2}^{6}) $ | 72 | 15 | 70 | 20 |
Concrete CS | $ {2^{ - 8}} $ | $ ({2}^{4}, {2}^{-1}) $ | 174.42 | $ ({2}^{7}, {2}^{-1}) $ | 686 | 90 | 100 | 87
Diabetes | $ {2^{ - 5}} $ | $ ({2}^{1}, {2}^{-2}) $ | 21.12 | $ ({2}^{2}, {2}^{0}) $ | 28 | 5 | 26 | 6 |
Housing | $ {2^{ - 8}} $ | $ ({2}^{3}, {2}^{-2}) $ | 123.42 | $ ({2}^{8}, {2}^{2}) $ | 337 | 80 | 100 | 58 |
Machine CPU | $ {2^{ - 7}} $ | $ ({2}^{8}, {2}^{-6}) $ | 8.74 | $ ({2}^{7}, {2}^{3}) $ | 139 | 30 | 100 | 15 |
Mg | $ {2^{ - 6}} $ | $ ({2}^{1}, {2}^{0}) $ | 592.84 | $ ({2}^{1}, {2}^{-2}) $ | 923 | 70 | 100 | 84
Quake | $ {2^{ - 1}} $ | $ ({2}^{-4}, {2}^{7}) $ | 493.28 | $ ({2}^{0}, {2}^{8}) $ | 1452 | 30 | 100 | 11 |
Servo | $ {2^{ - 7}} $ | $ ({2}^{5}, {2}^{-3}) $ | 46.36 | $ ({2}^{6}, {2}^{1}) $ | 111 | 25 | 100 | 41 |
Strike | $ {2^{ - 7}} $ | $ ({2}^{8}, {2}^{-8}) $ | 88.74 | $ ({2}^{-2}, {2}^{2}) $ | 416 | 10 | 100 | 11 |
Wisconsin B.C. | $ {2^{ - 8}} $ | $ ({2}^{7}, {2}^{-8}) $ | 14.92 | $ ({2}^{8}, {2}^{6}) $ | 129 | 75 | 100 | 51 |
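All (C, γ) pairs in this table are powers of two, the standard log2 grid for kernel methods. The snippet below illustrates how such a grid could be searched with cross-validation using scikit-learn's SVR; the grid bounds, the RBF kernel, the fixed ε value, the synthetic data, and the 5-fold split are assumptions made for illustration rather than the authors' recorded protocol.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Power-of-two grid in the same range as the table: C, gamma in {2^-8, ..., 2^8}.
param_grid = {
    "C": 2.0 ** np.arange(-8, 9),
    "gamma": 2.0 ** np.arange(-8, 9),
}

# Toy data standing in for one of the normalized benchmark sets (an assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 6))
y = np.sin(X.sum(axis=1)) + rng.normal(0, 0.05, 200)

search = GridSearchCV(
    SVR(kernel="rbf", epsilon=2**-6),  # epsilon held fixed, in the spirit of the per-dataset values above
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```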
Datasets | ELM | OP-ELM | SVR | LSSVR | FELM |
Abalone | 0.1566 | 0.2186 | 0.1513 | 0.1501 | 0.1525 |
Mpg | 0.1575 | 0.1568 | 0.1403 | 0.1389 | 0.1415 |
Autoprice | 0.1927 | 0.1804 | 0.1712 | 0.1669 | 0.1600 |
Balloon | 0.0157 | 0.0175 | 0.0420 | 0.0097 | 0.0072 |
Baskball | 0.2631 | 0.2669 | 0.2623 | 0.2576 | 0.2477 |
Cleveland | 0.4538 | 0.4359 | 0.4370 | 0.4281 | 0.4204 |
Cloud | 0.1458 | 0.1761 | 0.1240 | 0.1234 | 0.1168 |
Concrete CS | 0.1249 | 0.1196 | 0.1008 | 0.0792 | 0.1122 |
Diabetes | 0.3787 | 0.4635 | 0.3456 | 0.3242 | 0.2938 |
Housing | 0.1972 | 0.2305 | 0.1458 | 0.1478 | 0.1754 |
Machine CPU | 0.0474 | 0.0933 | 0.0731 | 0.0398 | 0.0395 |
Mg | 0.2748 | 0.2722 | 0.2719 | 0.2657 | 0.2678 |
Quake | 0.3475 | 0.3466 | 0.3420 | 0.3441 | 0.3436 |
Servo | 0.2300 | 0.1979 | 0.1819 | 0.1785 | 0.1694 |
Strike | 0.1517 | 0.1614 | 0.1541 | 0.1423 | 0.1320 |
Wisconsin B.C. | 0.0782 | 0.0269 | 0.0670 | 0.0264 | 0.0210 |
Datasets | ELM train (s) | ELM test (s) | OP-ELM train (s) | OP-ELM test (s) | SVR train (s) | SVR test (s) | LSSVR train (s) | LSSVR test (s) | FELM train (s) | FELM test (s)
Abalone | 0.0020 | 0.0008 | 0.5504 | 0.0019 | 0.3782 | 0.0756 | 0.5977 | 0.0791 | 0.0021 | 0.0006 |
Mpg | 0.0004 | 0.0004 | 0.0405 | 0.0008 | 0.0040 | 0.0007 | 0.0066 | 0.0017 | 0.0006 | 0.0002 |
Autoprice | 0.0002 | 0.0004 | 0.0248 | 0.0003 | 0.0010 | 0.0002 | 0.0037 | 0.0012 | 0.0007 | 0.0004 |
Balloon | 0.0006 | 0.0005 | 0.1419 | 0.0014 | 0.0015 | 0.0003 | 0.1311 | 0.0203 | 0.0011 | 0.0003 |
Baskball | 0.0001 | 0.0004 | 0.0108 | 0.0002 | 0.0006 | 0.0001 | 0.0027 | 0.0012 | 0.0001 | 0.0001 |
Cleveland | 0.0003 | 0.0007 | 0.0325 | 0.0003 | 0.0047 | 0.0010 | 0.0052 | 0.0018 | 0.0004 | 0.0002 |
Cloud | 0.0001 | 0.0004 | 0.0136 | 0.0004 | 0.0005 | 0.0001 | 0.0027 | 0.0011 | 0.0002 | 0.0001 |
Concrete CS | 0.0024 | 0.0008 | 0.1046 | 0.0025 | 0.0427 | 0.0028 | 0.0355 | 0.0077 | 0.0033 | 0.0009 |
Diabetes | 0.0001 | 0.0004 | 0.0048 | 0.0002 | 0.0002 | 0.0000 | 0.0024 | 0.0011 | 0.0001 | 0.0000 |
Housing | 0.0013 | 0.0005 | 0.0516 | 0.0015 | 0.0099 | 0.0012 | 0.0073 | 0.0029 | 0.0011 | 0.0003 |
Machine CPU | 0.0003 | 0.0004 | 0.0260 | 0.0004 | 0.0004 | 0.0001 | 0.0038 | 0.0020 | 0.0002 | 0.0001 |
Mg | 0.0016 | 0.0006 | 0.1538 | 0.0029 | 0.0628 | 0.0114 | 0.0634 | 0.0089 | 0.0038 | 0.0016 |
Quake | 0.0010 | 0.0005 | 0.2497 | 0.0005 | 0.0656 | 0.0130 | 0.1651 | 0.0186 | 0.0023 | 0.0006 |
Servo | 0.0002 | 0.0004 | 0.0268 | 0.0008 | 0.0027 | 0.0002 | 0.0035 | 0.0011 | 0.0004 | 0.0002 |
Strike | 0.0002 | 0.0003 | 0.0577 | 0.0003 | 0.0139 | 0.0012 | 0.0125 | 0.0039 | 0.0004 | 0.0001 |
Wisconsin B.C. | 0.0012 | 0.0004 | 0.0289 | 0.0010 | 0.0007 | 0.0001 | 0.0041 | 0.0019 | 0.0014 | 0.0006 |
Datasets | ELM | OP-ELM | SVR | LSSVR | FELM
Abalone | 0.0068 | 0.2685 | 0.0040 | 0.0043 | 0.0043
Mpg | 0.0134 | 0.0176 | 0.0135 | 0.0130 | 0.0123
Autoprice | 0.0340 | 0.0556 | 0.0364 | 0.0397 | 0.0240
Balloon | 0.0102 | 0.0121 | 0.0121 | 0.0019 | 0.0005
Baskball | 0.0285 | 0.0288 | 0.0293 | 0.0272 | 0.0278
Cleveland | 0.0312 | 0.0356 | 0.0351 | 0.0343 | 0.0316
Cloud | 0.0309 | 0.0672 | 0.0444 | 0.0363 | 0.0317
Concrete CS | 0.0083 | 0.0085 | 0.0085 | 0.0084 | 0.0050
Diabetes | 0.1095 | 0.3457 | 0.0575 | 0.0524 | 0.0443
Housing | 0.0356 | 0.1718 | 0.0192 | 0.0189 | 0.0197
Machine CPU | 0.0620 | 0.0623 | 0.0191 | 0.0249 | 0.0221
Mg | 0.0100 | 0.0088 | 0.0105 | 0.0079 | 0.0083
Quake | 0.0088 | 0.0093 | 0.0112 | 0.0090 | 0.0085
Servo | 0.0407 | 0.0442 | 0.0584 | 0.0405 | 0.0423
Strike | 0.0383 | 0.0312 | 0.0305 | 0.0380 | 0.0331
Wisconsin B.C. | 0.0183 | 0.0091 | 0.0097 | 0.0030 | 0.0041
Regressor (C, γ) | Training time (s) | Testing time (s) | Testing rate (%) | Testing DEV (%) | SVs/nodes
FELM | 0.0704 | 0.0017 | 98.82 | 0.62 | 73 |
ELM (sig) | 0.0054 | 0.0013 | 97.29 | 0.86 | 155 |
ELM (RBF) | 0.0020 | 0.0013 | 97.38 | 0.90 | 75 |
OP-ELM | 0.0704 | 0.0015 | 97.24 | 1.06 | 49 |
SVM $ (2^{4}, 2^{6}) $ | 0.0188 | 0.0033 | 97.29 | 0.87 | 269.86
LSSVM $ (2^{3}, 2^{-4}) $ | 0.0219 | 0.0062 | 97.89 | 0.78 | 666
Datasets | #Train | #Test | #Total | #(Features/Classes)
Iris | 100 | 50 | 150 | 4/3 |
WDBC | 379 | 190 | 569 | 30/2 |
Diabetes | 512 | 256 | 768 | 8/2 |
Wine | 118 | 60 | 178 | 13/3 |
Datasets | FELM basic functions | SVM (C, γ) | SVM # SVs | LSSVM (C, γ) | LSSVM # SVs | ELM # nodes | OP-ELM init nodes | OP-ELM final nodes
Iris | $ \{1, {e}^{-x}, {e}^{x}, {e}^{-2x}, {e}^{2x}\} $ | $ (2^{6}, 2^{-5}) $ | 20.48 | $ (2^{-4}, 2^{3}) $ | 100 | 15 | 90 | 16.80
WDBC | $ \{1, {e}^{-3x}, {e}^{-5x}\} $ | $ (2^{2}, 2^{-3}) $ | 55.04 | $ (2^{2}, 2^{5}) $ | 379 | 65 | 90 | 32.90
Diabetes | $ \{{e}^{-x}, {e}^{x}\} $ | $ (2^{4}, 2^{-5}) $ | 281.7 | $ (2^{4}, 2^{3}) $ | 512 | 15 | 90 | 19.90
Wine | $ \{{e}^{-x}, {e}^{-2x}\} $ | $ (2^{-1}, 2^{-2}) $ | 63.64 | $ (2^{-3}, 2^{3}) $ | 118 | 15 | 90 | 33.40
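For the classification experiments, the same least-squares idea can be applied once the class labels are one-hot encoded and the predicted class is taken as the argmax over the network outputs. The sketch below illustrates that decision rule; the element-wise basis, the ridge term, and the synthetic two-class data are assumptions made for illustration and are not the paper's exact multi-class FELM.

```python
import numpy as np

# Hypothetical element-wise basis, in the spirit of the Iris row above (an assumption).
BASIS = [lambda x: np.ones_like(x), lambda x: np.exp(-x), lambda x: np.exp(x)]

def hidden_matrix(X, basis=BASIS):
    """Apply each basis function to every (normalized) feature and concatenate the results."""
    X = np.asarray(X, dtype=float)
    return np.hstack([f(X) for f in basis])

def fit_classifier(X, labels, n_classes, ridge=1e-6):
    """One-hot encode the labels and solve the output weights by regularized least squares."""
    H = hidden_matrix(X)
    T = np.eye(n_classes)[np.asarray(labels, dtype=int)]  # one-hot targets
    HtH = H.T @ H + ridge * np.eye(H.shape[1])
    return np.linalg.solve(HtH, H.T @ T)

def predict_labels(X, beta):
    """Predicted class = argmax of the linear outputs."""
    return np.argmax(hidden_matrix(X) @ beta, axis=1)

# Toy usage with random features standing in for a normalized dataset.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(90, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # two synthetic classes
beta = fit_classifier(X, y, n_classes=2)
print((predict_labels(X, beta) == y).mean())
```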
Datasets | ELM | OP-ELM | SVM | LSSVM | FELM |
Iris | 95.36 | 95.36 | 96.40 | 91.32 | 97.76 |
WDBC | 95.16 | 95.83 | 97.78 | 95.01 | 97.81 |
Diabetes | 76.83 | 76.89 | 77.59 | 73.70 | 77.85 |
Wine | 96.03 | 95.73 | 97.90 | 94.50 | 98.87 |
Datasets | ELM train (s) | ELM test (s) | OP-ELM train (s) | OP-ELM test (s) | SVM train (s) | SVM test (s) | LSSVM train (s) | LSSVM test (s) | FELM train (s) | FELM test (s)
Iris | 0.0002 | 0.0005 | 0.0217 | 0.0004 | 0.0004 | 0.0001 | 0.0022 | 0.0036 | 0.0002 | 0.0001 |
WDBC | 0.0013 | 0.0009 | 0.0519 | 0.0010 | 0.0047 | 0.0010 | 0.0079 | 0.0161 | 0.0043 | 0.0010 |
Diabetes | 0.0004 | 0.0007 | 0.0716 | 0.0006 | 0.0199 | 0.0044 | 0.0120 | 0.0078 | 0.0005 | 0.0002 |
Wine | 0.0002 | 0.0005 | 0.0255 | 0.0007 | 0.0011 | 0.0003 | 0.0024 | 0.0076 | 0.0004 | 0.0002 |
Datasets | ELM | OP-ELM | SVM | LSSVM | FELM |
Iris | 3.19 | 2.95 | 2.62 | 3.89 | 1.84 |
WDBC | 1.63 | 1.45 | 0.91 | 1.56 | 1.00 |
Diabetes | 2.21 | 2.15 | 2.15 | 3.15 | 2.17 |
Wine | 3.31 | 2.43 | 1.64 | 2.82 | 1.41 |