Partially linear models (PLMs) are widely employed in scientific research to analyze hybrid parametric-nonparametric relationships. However, their conventional reliance on symmetric error distributions severely limits applicability to real-world phenomena characterized by pronounced asymmetry and heavy-tailed behavior. To address this gap, we propose a novel PLM framework incorporating skewed generalized normal (SGN) distributed errors, which simultaneously accommodates extreme skewness and heavy-tailed attributes beyond the capabilities of symmetric or skew-normal (SN) specifications. Methodologically, we develop a penalized expectation-maximization (EM) algorithm with provable convergence guarantees and integrated adaptive smoothing selection, effectively resolving optimization instability in high-dimensional settings. Furthermore, we establish a unified diagnostic system that synergizes geometric leverage calculus with local influence analysis to systematically evaluate model robustness against perturbations and outliers. Extensive simulation studies demonstrate the framework's superior estimation accuracy compared to conventional symmetric and SN-based alternatives. Empirical validations based on real-world datasets reveal statistically significant improvements in model fit while maintaining interpretability.
Citation: Xue Wang, Weihu Cheng, Clécio S. Ferreira. Estimation and diagnostic for a skewed generalized normal partially linear models[J]. AIMS Mathematics, 2025, 10(7): 15698-15719. doi: 10.3934/math.2025703