Research article

Hybrid feature selection techniques using automatic modification

  • Published: 27 February 2026
  • MSC : 62R07, 68T05, 68T09, 68Q32, 94A16

  • Feature selection (FS) in large data sets is a critical aspect of machine learning that involves choosing the most relevant features. It plays a significant role in improving a model's performance, reducing overfitting, and enhancing interpretability. In this paper, we construct an automatic modification of basic FS techniques such as Lasso, Deep Neural Network (DNN), Random Forest (RF), and Principal Component Analysis (PCA), based on the K-means clustering method and the Silhouette score, instead of visualization- or threshold-based criteria that rely on background knowledge. Additionally, we propose two hybrid methods designed to exploit the advantages of several feature selection methods: the first is a score method that leverages multiple types of methods, and the second is a refinement method that enhances the outcomes of one method by adapting them to another. Moreover, to evaluate the efficiency of each FS method, both a linear regression and a nonlinear DNN regression are employed, minimizing the dependency of the evaluation on the choice of regression model. Through numerical tests, we show that the automatic modification of the conventional methods provides a convenient way to set the selection criterion. Furthermore, based on the results derived from both the linear regression and the DNN regression, the hybrid FS techniques perform more accurately in both linear and nonlinear regressions, without dependency on the data.
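    The paper's exact automatic-modification procedure is not reproduced on this page; the following is a minimal NumPy sketch of the general idea described in the abstract: cluster the one-dimensional feature-importance scores (e.g., Lasso coefficients or RF importances) with K-means, choose the number of clusters by the mean silhouette coefficient, and keep the features in the highest-centroid cluster. The function names, quantile initialization, and toy scores are illustrative assumptions, not the authors' implementation.

    ```python
    import numpy as np

    def kmeans_1d(x, k, iters=100):
        """Plain 1-D k-means with deterministic quantile initialization."""
        centers = np.quantile(x, np.linspace(0.0, 1.0, k))
        for _ in range(iters):
            labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
            new = np.array([x[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        return labels, centers

    def silhouette(x, labels):
        """Mean silhouette coefficient for 1-D data."""
        n = len(x)
        d = np.abs(x[:, None] - x[None, :])
        s = np.zeros(n)
        for i in range(n):
            same = labels == labels[i]
            # a: mean distance to own cluster, b: mean distance to nearest other
            a = d[i, same & (np.arange(n) != i)].mean() if same.sum() > 1 else 0.0
            b = min(d[i, labels == c].mean() for c in set(labels) if c != labels[i])
            s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
        return s.mean()

    def auto_select(scores, ks=(2, 3, 4)):
        """Pick k by silhouette score; keep features in the highest-centroid cluster."""
        best_k = max(ks, key=lambda k: silhouette(scores, kmeans_1d(scores, k)[0]))
        labels, centers = kmeans_1d(scores, best_k)
        return np.where(labels == np.argmax(centers))[0]

    # Toy importance scores: three clearly dominant features
    scores = np.array([0.02, 0.03, 0.01, 0.85, 0.9, 0.88, 0.15, 0.12])
    print(auto_select(scores))  # → [3 4 5]
    ```

    This replaces a hand-tuned importance cutoff with a data-driven one: the silhouette score selects the clustering granularity, and the top cluster defines the retained feature subset, which matches the abstract's stated goal of avoiding visualization- or threshold-based criteria.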

    Citation: Sunyoung Bu, Inmi Kim. Hybrid feature selection techniques using automatic modification[J]. AIMS Mathematics, 2026, 11(2): 5152-5171. doi: 10.3934/math.2026210



  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

Figures and Tables

Figures(7)  /  Tables(4)
