Research article

Assessing agreement between permutation and dropout variable importance methods for regression and random forest models

  • Received: 31 December 2023; Revised: 01 June 2024; Accepted: 08 July 2024; Published: 22 July 2024
  • Permutation techniques are used extensively in machine learning to evaluate variable importance. In ordinary regression, however, variables are often removed to gauge their importance. In this paper, we compare the results of permuting variables with those of removing variables in regression to assess the relationship between the two approaches. Specifically, we compare permute-and-predict (PaP) methods with leave-one-covariate-out (LOCO) techniques, and we compare both with conventional metrics such as regression coefficient estimates, t-statistics, and random forest out-of-bag (OOB) PaP importance. Our results indicate that, in a regression setting, permutation importance metrics are practically equivalent to those obtained by removing variables. We demonstrate a strong association between the PaP metrics, the true coefficients, and the regression-estimated coefficients, and a strong relation between the LOCO metrics and the regression t-statistics. Finally, we illustrate that manual PaP methods are not equivalent to the OOB PaP technique, and we suggest prioritizing manual PaP methods on validation data.
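The two importance measures contrasted in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' code: the simulated data, the true coefficients (3, 1, 0), the sample sizes, and the helper names `fit_ols`, `pap_importance`, and `loco_importance` are all assumptions made for illustration. PaP shuffles one column of the validation data and re-predicts with the already-fitted model; LOCO refits the model without that column.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def drop_col(X, j):
    return [r[:j] + r[j + 1:] for r in X]

def pap_importance(beta, X, y, j, rng, repeats=20):
    """Mean increase in MSE after shuffling column j; the model is NOT refit."""
    base = mse(y, predict(beta, X))
    total = 0.0
    for _ in range(repeats):
        col = [r[j] for r in X]
        rng.shuffle(col)
        X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
        total += mse(y, predict(beta, X_perm)) - base
    return total / repeats

def loco_importance(X_tr, y_tr, X_va, y_va, j):
    """Increase in validation MSE after refitting WITHOUT column j."""
    full = fit_ols(X_tr, y_tr)
    reduced = fit_ols(drop_col(X_tr, j), y_tr)
    return (mse(y_va, predict(reduced, drop_col(X_va, j)))
            - mse(y_va, predict(full, X_va)))

def simulate(n, rng, true_beta=(3.0, 1.0, 0.0)):
    """Independent standard-normal predictors with unit-variance noise (assumed setup)."""
    X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(n)]
    y = [sum(b * v for b, v in zip(true_beta, r)) + rng.gauss(0, 1) for r in X]
    return X, y

rng = random.Random(0)
X_tr, y_tr = simulate(500, rng)   # training data
X_va, y_va = simulate(500, rng)   # held-out validation data
beta = fit_ols(X_tr, y_tr)
pap = [pap_importance(beta, X_va, y_va, j, rng) for j in range(3)]
loco = [loco_importance(X_tr, y_tr, X_va, y_va, j) for j in range(3)]
for j in range(3):
    print(f"x{j}: coef={beta[j + 1]:.2f}  PaP={pap[j]:.2f}  LOCO={loco[j]:.2f}")
```

Under the assumptions of this toy setup (independent, unit-variance predictors), the PaP increase for a variable should be roughly twice its squared coefficient and the LOCO increase roughly the squared coefficient, so the two scores differ in scale but order the variables the same way. Correlated predictors would need a separate check. For the in-sample version of LOCO in ordinary least squares, the increase in the residual sum of squares from dropping one variable equals that variable's squared t-statistic times the residual variance estimate.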

    Citation: Kelvyn Bladen, D. Richard Cutler. Assessing agreement between permutation and dropout variable importance methods for regression and random forest models[J]. Electronic Research Archive, 2024, 32(7): 4495-4514. doi: 10.3934/era.2024203




    As we step into 2023, the Editorial Office of AIMS Biophysics and I wish to express our sincere gratitude to all authors, editorial board members, and reviewers for their contributions to AIMS Biophysics in 2022. We hope to cooperate with you even more this year.

    AIMS Biophysics is an international Open Access journal founded in 2014 and devoted to publishing peer-reviewed, high-quality, original papers in the field of biophysics.

    The journal's statistics and metrics have improved. The following achievements are notable:

    - About 30 publications in 2022 (3 review papers, 23 research articles, 4 editorials);

    - Four special issues were launched in 2022, and we hope they will attract more contributions from authors in 2023. Four special issues have each reached more than five papers. In particular, the new special-issue topics proposed in 2022 have fostered interplay among scholars from different research fields. AIMS Biophysics invited nine experts to join the editorial board in 2022.

    In 2023, we hope to increase the quantity and quality of papers submitted to AIMS Biophysics and to keep seeking scholars with strong backgrounds to join the editorial board. We also aim to shorten the article processing cycle and improve efficiency. We will strive to establish special issues on topical subjects that attract more relevant manuscripts, increase citations per paper and total citations, and improve the academic ranking of AIMS Biophysics.

    Finally, we would like to thank all the editorial board members again. The development and progress of the journal depend on your strong support and your time. In 2023, we look forward to further strengthening the journal through continued cooperation.

    Prof. Carlo Bianca, co-Editor in Chief

    Prof. Lombardo Domenico, co-Editor in Chief

    AIMS Biophysics

    Manuscript statistics (2022)

    Reject rate: 45.3%

    Publication time (median time from submission to online): 76 days

    Type              Number
    Review            3
    Research article  23
    Editorial         4


    The 10 most-cited articles of the past five years:

    Title Citations
    Recent progress in Monte Carlo simulation on gold nanoparticle radiosensitization 21
    Charged amino acids may promote coronavirus SARS-CoV-2 fusion with the host cell 16
    Intrinsic blue-green fluorescence in amyloyd fibrils 13
    Interdisciplinary approaches to the study of biological membranes 12
    Functional characterizations of polyethylene terephthalate-degrading cutinase-like enzyme Cut190 mutants using bis(2-hydroxyethyl) terephthalate as the model substrate 10
    Macromolecular sizes of serum albumins in its aqueous solutions 8
    Biochemical and biophysical mechanisms underlying the heart and the brain dialog 6
    Nanoparticle-based delivery platforms for mRNA vaccine development 6
    Thermodynamic, kinetic and docking studies of some unsaturated fatty acids-quercetin derivatives as inhibitors of mushroom tyrosinase 6
    A machine learning algorithm for identifying and tracking bacteria in three dimensions using Digital Holographic Microscopy 6


    The 10 most-viewed articles of the past two years:

    Title Viewed
    Toxicity associated with gadolinium-based contrast-enhanced examinations 5184
    An efficient method of detection of COVID-19 using Mask R-CNN on chest X-Ray images 3621
    Effects of magnetic field treated water on some growth parameters of corn (Zea mays) plants 3076
    A basic introduction to single particles cryo-electron microscopy 1874
    Screening coronavirus and human proteins for sialic acid binding sites using a docking approach 1831
    Sequence–function correlation of the transmembrane domains in NS4B of HCV using a computational approach 1724
    Radioprotective effect of nanoceria and magnetic flower-like iron oxide microparticles on gamma radiation-induced damage in BSA protein 1701
    Chest X-Ray image and pathological data based artificial intelligence enabled dual diagnostic method for multi-stage classification of COVID-19 patients 1631
    Tumor treating fields (TTFs) using uninsulated electrodes induce cell death in human non-small cell lung carcinoma (NSCLC) cells 1452
    Evaluation of dose enhancement with gold nanoparticles in kilovoltage radiotherapy using the new EGS geometry library in Monte Carlo simulation 1396


    • Importance of modelling and simulation in biophysical applications;
    • Electromagnetic waves and biology;
    • Scientific advances in complex systems of biophysical interest;
    • Scientific Advance in Biomembranes and Biomimetic Membranes of Biophysical Interest

    Scientific advances in complex systems of biophysical interest

    https://www.aimspress.com/aimsbpoa/article/6201/special-articles

    Interplay and Multiscale Modeling of Biological Complex Systems

    https://www.aimspress.com/aimsbpoa/article/6057/special-articles

    Methodological trends in structural biology 2021

    https://www.aimspress.com/aimsbpoa/article/5840/special-articles

    Applications of artificial intelligence, mathematical modeling and simulation in medical biophysics

    https://www.aimspress.com/aimsbpoa/article/5637/special-articles

    AIMS Biophysics has a total of 43 editors, 9 of whom were newly invited in 2022.

    In the past year, we published 30 articles, created 4 special issues, and invited 9 new editorial board members. Articles and special issues are developing steadily and in step with each other.

    We will strive to speed up manuscript processing, with the aim that next year's median time from submission to online publication is stable and under 50 days. At the same time, both the invitation and the processing of manuscripts should strictly follow the journal's standards, so that the journal's accumulated standing attracts high-quality manuscripts. Only with a solid foundation of quality will the journal's chances of inclusion in leading databases increase, thereby raising its visibility. Our ultimate goal is to be indexed by more databases in 2023.



  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)