The spatial auto-regressive (SAR) model is a foundational tool for investigating spatial dependence across many scientific fields. In this paper, we proposed a sequential variable selection approach for the SAR model, named the profiled variable selection (PVS) procedure. The PVS procedure was designed to handle high-dimensional cases efficiently and to remain scalable even when the number of potential covariates $ p $ grows exponentially with the sample size $ n $. By contrast, existing penalization methods are inadequate for variable selection in the SAR model when $ p > n $, because they require a consistent initial estimator. The selection consistency of the PVS procedure was established under mild conditions. Numerical simulations showed that the PVS procedure performed well in variable selection for SAR models. Additionally, a real data analysis of housing prices across Chinese cities illustrated the practical utility of our method.
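For orientation, the SAR model referred to above is usually written in the following standard form. This is a sketch under assumed notation: the abstract does not show the paper's own symbols, so $ y $, $ W $, $ X $, $ \rho $, $ \beta $, $ \varepsilon $, and the active set $ \mathcal{S} $ are illustrative choices rather than the authors' definitions.

```latex
% Standard SAR specification (notation assumed, not taken from the paper).
% y: n x 1 response, W: n x n known spatial weight matrix,
% X: n x p covariate matrix, rho: spatial autoregressive parameter,
% beta: p x 1 coefficient vector, varepsilon: n x 1 error vector.
\[
  y = \rho W y + X \beta + \varepsilon .
\]
% Reduced form, valid whenever I_n - rho W is invertible:
\[
  y = (I_n - \rho W)^{-1} (X \beta + \varepsilon) .
\]
% Variable selection targets the set of truly active covariates:
\[
  \mathcal{S} = \{\, j \in \{1, \dots, p\} : \beta_j \neq 0 \,\} .
\]
```

In this notation, the high-dimensional setting in the abstract corresponds to $ p > n $ with $ \mathcal{S} $ much smaller than $ p $, and selection consistency means that the selected set equals $ \mathcal{S} $ with probability tending to one.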
Citation: Zengchao Xu, Yu Liu. The profiled variable selection for high-dimensional spatial auto-regressive model[J]. AIMS Mathematics, 2026, 11(5): 13981-13998. doi: 10.3934/math.2026575