
Permutation techniques have been used extensively in machine learning algorithms for evaluating variable importance. In ordinary regression, however, variables are often removed to gauge their importance. In this paper, we compared the results for permuting variables to removing variables in regression to assess relations between these two methods. We compared permute-and-predict (PaP) methods with leave-one-covariate-out (LOCO) techniques. We also compared these results with conventional metrics such as regression coefficient estimates, t-statistics, and random forest out-of-bag (OOB) PaP importance. Our results indicate that permutation importance metrics are practically equivalent to those obtained from removing variables in a regression setting. We demonstrate a strong association between the PaP metrics, true coefficients, and regression-estimated coefficients. We also show a strong relation between the LOCO metrics and the regression t-statistics. Finally, we illustrate that manual PaP methods are not equivalent to the OOB PaP technique and suggest prioritizing the use of manual PaP methods on validation data.
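The two families of metrics compared above can be illustrated with a minimal sketch. It assumes a linear model fitted by ordinary least squares, simulated independent predictors, and mean squared error on held-out validation data as the loss. The function names (`pap_importance`, `loco_importance`), the simulated coefficients, and the sample sizes are hypothetical choices for illustration only; they are not the authors' code or simulation design.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def pap_importance(X_tr, y_tr, X_val, y_val, rng, n_repeats=20):
    """Permute-and-predict: fit once, then shuffle one validation column
    at a time and record the average increase in validation MSE."""
    beta = fit_ols(X_tr, y_tr)
    base = mse(y_val, predict_ols(beta, X_val))
    scores = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += mse(y_val, predict_ols(beta, X_perm)) - base
    return scores / n_repeats

def loco_importance(X_tr, y_tr, X_val, y_val):
    """Leave-one-covariate-out: refit without column j and record the
    increase in validation MSE relative to the full model."""
    p = X_tr.shape[1]
    base = mse(y_val, predict_ols(fit_ols(X_tr, y_tr), X_val))
    scores = np.zeros(p)
    for j in range(p):
        keep = [k for k in range(p) if k != j]
        beta_j = fit_ols(X_tr[:, keep], y_tr)
        scores[j] = mse(y_val, predict_ols(beta_j, X_val[:, keep])) - base
    return scores

# Hypothetical simulation: three independent predictors, one of them pure noise.
rng = np.random.default_rng(0)
true_beta = np.array([3.0, 1.0, 0.0])
X = rng.normal(size=(4000, 3))
y = X @ true_beta + rng.normal(size=4000)
X_tr, X_val, y_tr, y_val = X[:2000], X[2000:], y[:2000], y[2000:]

pap = pap_importance(X_tr, y_tr, X_val, y_val, rng)
loco = loco_importance(X_tr, y_tr, X_val, y_val)
print("PaP :", np.round(pap, 3))
print("LOCO:", np.round(loco, 3))
```

In this toy setting both metrics are computed on validation data, and comparing `np.argsort(-pap)` with `np.argsort(-loco)` shows whether the two methods rank the predictors in the same order.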
Citation: Kelvyn Bladen, D. Richard Cutler. Assessing agreement between permutation and dropout variable importance methods for regression and random forest models[J]. Electronic Research Archive, 2024, 32(7): 4495-4514. doi: 10.3934/era.2024203
At the beginning of 2023, together with the Editorial Office of AIMS Biophysics, we wish to express our sincere gratitude to all authors, editorial board members, and reviewers for their contributions to AIMS Biophysics in 2022. We hope to cooperate with you even more this year.
AIMS Biophysics is an international Open Access journal founded in 2014 and devoted to publishing peer-reviewed, high-quality, original papers in the field of biophysics.
The journal's statistics and metrics have improved; the following achievements are notable:
- 30 publications in 2022 (3 review papers, 23 research articles, 4 editorials);
- Four special issues were launched in 2022, and we hope they will attract more contributions from authors in 2023. Four special issues have reached more than 5 papers. In particular, the new topics proposed for the 2022 special issues have fostered interplay between scholars from different research fields;
- AIMS Biophysics invited nine experts to join the editorial board in 2022.
In 2023, we hope to increase the quantity and quality of papers submitted to AIMS Biophysics and to keep recruiting scholars with strong backgrounds to the editorial board. We will shorten the article processing cycle and improve efficiency. We will also strive to establish special issues on topical subjects, attract more relevant manuscripts, increase citations per paper and total citations, and improve the academic ranking of AIMS Biophysics.
Finally, we would like to thank all the editorial board members again. The development and progress of the journal depend on your strong support and your time. In 2023, we look forward to further strengthening the journal through continued cooperation.
Prof. Carlo Bianca co-Editor in Chief
Prof. Lombardo Domenico co-Editor in Chief
AIMS Biophysics
Manuscript statistics (2022)
Rejection rate: 45.3%
Publication time (median time from submission to online): 76 days
Type | Number |
Review | 3 |
Research article | 23 |
Editorial | 4 |
The 10 most-cited articles of the past five years:
Title | Citations |
Recent progress in Monte Carlo simulation on gold nanoparticle radiosensitization | 21 |
Charged amino acids may promote coronavirus SARS-CoV-2 fusion with the host cell | 16 |
Intrinsic blue-green fluorescence in amyloyd fibrils | 13 |
Interdisciplinary approaches to the study of biological membranes | 12 |
Functional characterizations of polyethylene terephthalate-degrading cutinase-like enzyme Cut190 mutants using bis(2-hydroxyethyl) terephthalate as the model substrate | 10 |
Macromolecular sizes of serum albumins in its aqueous solutions | 8 |
Biochemical and biophysical mechanisms underlying the heart and the brain dialog | 6 |
Nanoparticle-based delivery platforms for mRNA vaccine development | 6 |
Thermodynamic, kinetic and docking studies of some unsaturated fatty acids-quercetin derivatives as inhibitors of mushroom tyrosinase | 6 |
A machine learning algorithm for identifying and tracking bacteria in three dimensions using Digital Holographic Microscopy | 6 |
The 10 most-viewed articles of the past two years:
Title | Viewed |
Toxicity associated with gadolinium-based contrast-enhanced examinations | 5184 |
An efficient method of detection of COVID-19 using Mask R-CNN on chest X-Ray images | 3621 |
Effects of magnetic field treated water on some growth parameters of corn (Zea mays) plants | 3076 |
A basic introduction to single particles cryo-electron microscopy | 1874 |
Screening coronavirus and human proteins for sialic acid binding sites using a docking approach | 1831 |
Sequence–function correlation of the transmembrane domains in NS4B of HCV using a computational approach | 1724 |
Radioprotective effect of nanoceria and magnetic flower-like iron oxide microparticles on gamma radiation-induced damage in BSA protein | 1701 |
Chest X-Ray image and pathological data based artificial intelligence enabled dual diagnostic method for multi-stage classification of COVID-19 patients | 1631 |
Tumor treating fields (TTFs) using uninsulated electrodes induce cell death in human non-small cell lung carcinoma (NSCLC) cells | 1452 |
Evaluation of dose enhancement with gold nanoparticles in kilovoltage radiotherapy using the new EGS geometry library in Monte Carlo simulation | 1396 |
Scientific advances in complex systems of biophysical interest
https://www.aimspress.com/aimsbpoa/article/6201/special-articles
Interplay and Multiscale Modeling of Biological Complex Systems
https://www.aimspress.com/aimsbpoa/article/6057/special-articles
Methodological trends in structural biology 2021
https://www.aimspress.com/aimsbpoa/article/5840/special-articles
Applications of artificial intelligence, mathematical modeling and simulation in medical biophysics
https://www.aimspress.com/aimsbpoa/article/5637/special-articles
AIMS Biophysics has a total of 43 editors, 9 of whom were newly invited in 2022.
In the past year, we published 30 articles, created 4 special issues, and invited 9 new editorial board members. The output of articles and special issues has been stable, and all aspects of the journal have progressed together.
We will strive to speed up manuscript processing, and hope that next year the median time from receipt to online publication will be stable and under 50 days. At the same time, both the solicitation and the processing of manuscripts should strictly follow the journal's standards, so that the journal's accumulated reputation attracts high-quality manuscripts. Only a solid foundation of quality will increase the likelihood of the journal being included in leading databases and thereby raise its visibility. Our ultimate goal is to be indexed by more databases in 2023.