Research article

Outlier detection in gamma regression using Pearson residuals: Simulation and an application

  • Received: 05 January 2022; Revised: 28 April 2022; Accepted: 11 May 2022; Published: 17 June 2022
  • MSC: 62J12, 62J20

  • In data analysis, choosing an appropriate regression model and detecting outliers are both essential for obtaining reliable results. Gamma regression (GR) is used when the dependent variable follows a gamma distribution. In this work, we derive new methods for outlier detection in GR, based on the adjusted and standardized Pearson residuals. We then compare the available and proposed methods using a simulation study and a real-life data set. Both the simulation and the real-life application show that the outlier detection approach based on the adjusted Pearson residual performs better.

    Citation: Muhammad Amin, Saima Afzal, Muhammad Nauman Akram, Abdisalam Hassan Muse, Ahlam H. Tolba, Tahani A. Abushal. Outlier detection in gamma regression using Pearson residuals: Simulation and an application[J]. AIMS Mathematics, 2022, 7(8): 15331-15347. doi: 10.3934/math.2022840
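
    As a rough illustration of the standardized Pearson residual named in the abstract, the sketch below fits a gamma regression and flags observations with large residuals. It is not the authors' procedure: the log link, the helper names `fit_gamma_log_irls` and `standardized_pearson_residuals`, the synthetic data, and the cutoff of 3 are all assumptions made for this example, and the adjusted Pearson residual proposed in the paper is not reproduced here.

```python
import numpy as np


def fit_gamma_log_irls(X, y, n_iter=100, tol=1e-10):
    """Fit a gamma GLM with log link by IRLS (hypothetical helper)."""
    # With a log link and variance function V(mu) = mu^2 the IRLS weights
    # are all 1, so each step is an ordinary least-squares fit to the
    # working response z.
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        beta_new = np.linalg.lstsq(X, z, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def standardized_pearson_residuals(X, y, beta):
    """Pearson residuals scaled by dispersion and leverage (hypothetical helper)."""
    n, p = X.shape
    mu = np.exp(X @ beta)
    pearson = (y - mu) / mu                 # (y - mu) / sqrt(V(mu)), V(mu) = mu^2
    phi = np.sum(pearson**2) / (n - p)      # Pearson estimate of the dispersion
    # Unit weights mean the hat matrix is X (X'X)^{-1} X'; take its diagonal
    # from the thin QR factorization of X.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    return pearson / np.sqrt(phi * (1.0 - leverage))


# Synthetic data (assumed for illustration): one covariate, gamma response.
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(1.0 + 0.5 * x)
shape = 10.0
y = rng.gamma(shape, mu_true / shape)
y[0] *= 10.0                                # plant one gross outlier

beta_hat = fit_gamma_log_irls(X, y)
r_std = standardized_pearson_residuals(X, y, beta_hat)
cutoff = 3.0                                # illustrative threshold, not from the paper
flagged = np.flatnonzero(np.abs(r_std) > cutoff)
print("flagged observations:", flagged)
```

    The unit IRLS weights are specific to the gamma family with a log link; other links would need the general weighted hat matrix.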

  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)