1. Department of Statistics, University of Sargodha, Sargodha, Pakistan
2. Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan
3. Department of Mathematics (Statistics Option), Pan African University, Institute for Basic Sciences, Technology and Innovation (PAUSTI), Nairobi, 62000-00200, Kenya
4. Mathematics Department, Faculty of Science, Mansoura University, Mansoura 35516, Egypt
5. Department of Mathematical Science, Faculty of Applied Science, Umm Al-Qura University, Makkah, 21961, Saudi Arabia
Received: 05 January 2022
Revised: 28 April 2022
Accepted: 11 May 2022
Published: 17 June 2022
Citation: Muhammad Amin, Saima Afzal, Muhammad Nauman Akram, Abdisalam Hassan Muse, Ahlam H. Tolba, Tahani A. Abushal. Outlier detection in gamma regression using Pearson residuals: Simulation and an application[J]. AIMS Mathematics, 2022, 7(8): 15331-15347. doi: 10.3934/math.2022840
Abstract
In data analysis, the choice of an appropriate regression model and outlier detection are both very important in obtaining reliable results. Gamma regression (GR) is employed when the distribution of the dependent variable is gamma. In this work, we derived new methods for outlier detection in GR. The proposed methods are based upon the adjusted and standardized Pearson residuals. Furthermore, a comparison of available and proposed methods is made using a simulation study and a real-life data set. The results of the simulation and the real-life application provide evidence of the better performance of the adjusted Pearson residual-based outlier detection approach.
1. Introduction
Regression analysis is the main tool to study the relationship between one or more dependent variables and one or more independent variables. These relationships can be observed in every research area. Regression analysis has a wide variety of applications in different fields [1,2,3,4] and Sarhan et al. [5]. Regression results are reliable only if the quality of the data is good and the selected regression model is correct. If the quality of the data and the regression model are not appropriate, then the results of measuring such relationships are incorrect. The quality of data refers to an outlier-free data set [2,6]. The correct regression model refers to identifying the distribution of the response variable.
An outlier is a point that is far from the rest of the data points [7,8,9]. In the regression context, Desgagné [10] stated that observations with more distant regression errors conflict with the majority of the errors originating from the assumed normal distribution. An outlier may or may not affect the regression inferences, and may occur in one or multiple variables. Outlier detection in univariate analysis has been studied by many researchers [7,8,9,11,12]. Outlier diagnostics in the linear regression model (LRM) has also gained much attention from researchers [13,14]. Balasooriya et al. [15] compared some well-known outlier detection methods by using the LRM. They concluded that the methods do not all agree with each other on the detection of outliers.
Regression analysis is used to determine the model for forecasting/prediction purposes. There is a variety of regression models, e.g., LRMs, generalized linear models (GLMs) and non-linear models. Gamma regression (GR) is employed when the distribution of the dependent variable is gamma. GR has a variety of applications in the literature, with examples in the health sciences, industry and the environment; for more details, see [16,17,18,19,20,21,22,23].
Outlier detection using a univariate gamma response, without considering any independent variable, is also available in the literature [24,25,26,27,28]. Shayib and Young [29] first studied the extreme residuals in GR and proposed modified forms of the Pearson and Anscombe residuals; they concluded that the modified forms of these residuals do not perform well.
The detection of outliers in the GR model has not been addressed in the literature. This paper deals with outlier detection in the GR model by using a new approach, i.e., an adjusted Pearson residuals (PRs) approach. Outlier detection with an adjusted form of residuals (other than PRs) was first studied by Tiao and Guttman [30]. This approach was also proposed for some of the GLM responses. Cordeiro [31] introduced the adjusted PRs for the Poisson regression model. In addition, adjusted Wald residuals and PRs for beta regression have been suggested by various authors [32,33]. In these studies, most of the researchers focused on the probability distributions of the adjusted PRs. They observed that the adjusted residuals performed better than the others.
The main objectives of the current research were to propose some outlier diagnostics based upon Pearson (standardized and adjusted) residuals in GR, modify some available LRM-based outlier detection methods for GR and then make a comparison of these modified and proposed outlier detection methods with the help of simulations and a real data set.
2. Materials and methods
The probability density function of the gamma response variable $y$ is given by
$f(y;\nu,\tau)=\dfrac{1}{\tau^{\nu}\Gamma(\nu)}\,y^{\nu-1}e^{-y/\tau},\quad y>0,\ \nu>0,\ \tau>0.$
(2.1)
The mean and variance of Eq (2.1) are respectively given as $E(y)=\nu\tau$ and $\mathrm{Var}(y)=\nu\tau^{2}$. According to Hardin and Hilbe [34], Eq (2.1) can be transformed with parameters $\nu=\phi^{-1}$ and $\tau=\mu\phi$. Given these parameters, the gamma density for $y$ is given by
$f(y;\mu,\phi)=\dfrac{1}{\Gamma(1/\phi)}\left(\dfrac{1}{\mu\phi}\right)^{1/\phi}y^{1/\phi-1}e^{-y/(\mu\phi)},\quad y>0,\ \mu>0,\ \phi>0.$
(2.2)
The mean and variance of Eq (2.2) are respectively given as $E(y)=\mu$ and $\mathrm{Var}(y)=\phi\mu^{2}$. For the $i$th observation, let $x_{i1},\ldots,x_{ip}$ represent the $p$ independent variables. Then, the GR for the mean of the response variable $y$ is given by
$g(\mu_i)=\eta_i=x_i^{T}\beta,\quad i=1,\ldots,n,$
where $x_i^{T}=(1,x_{i1},\ldots,x_{ip})$, $\beta=(\beta_0,\beta_1,\ldots,\beta_p)^{T}$ is the vector of regression coefficients including the intercept and $g(\cdot)$ is the link function. The link function in the GR can be the reciprocal or the log.
Let $l$ be the log-likelihood function of the response variable in Eq (2.2), which is mathematically defined by
$l(\beta,\phi)=\displaystyle\sum_{i=1}^{n}\left[-\frac{1}{\phi}\ln(\mu_i\phi)-\ln\Gamma\!\left(\frac{1}{\phi}\right)+\left(\frac{1}{\phi}-1\right)\ln y_i-\frac{y_i}{\mu_i\phi}\right].$
(2.3)
Let $\hat\beta$, $\hat\mu$ and $\hat\phi$ be the maximum likelihood estimates (MLEs), which are obtained by maximizing the log-likelihood of Eq (2.3) using the Newton-Raphson iterative method. The MLE of $\beta$ is computed by solving a system of equations. For this purpose, we equate the first derivative of Eq (2.3) to zero; then, we have
$U(\beta)=\dfrac{\partial l}{\partial\beta}=-\dfrac{1}{\phi}\left(y-(X\beta)^{-1}\right)^{T}X=0,$
(2.4)
where $U(\beta)$ is the score vector of order $(p+1)\times 1$. Since Eq (2.4) is nonlinear in $\beta$, the Newton-Raphson method can be employed for the estimation of $\beta$ [34]. Suppose $\beta^{(m)}$ is the approximated MLE of $\beta$ at the $m$th iteration; then, the iteratively reweighted method [35] gives the following expression:
$\beta^{(m+1)}=\beta^{(m)}+\{I(\beta^{(m)})\}^{-1}U(\beta^{(m)}),$
(2.5)
where $I(\beta^{(m)})$ is the $(p+1)\times(p+1)$ Fisher information matrix at the $m$th iteration. Applying convergence in deviance to Eq (2.5), the unknown parameters can be computed as
$\hat\beta=(X^{T}\hat{W}X)^{-1}X^{T}\hat{W}\hat{z},$
(2.6)
where $\hat{z}_i=\hat\eta_i+\dfrac{y_i-\hat\mu_i}{\hat\mu_i^{2}}$ is the adjusted response variable, $\hat{W}=\mathrm{diag}(\hat\mu_1^{2},\ldots,\hat\mu_n^{2})$ and $\hat\mu_i=\dfrac{1}{x_i^{T}\hat\beta}$.
Several types of GLM residuals are available in the literature [33], but we consider the most popular, the PR, for the detection of an outlier.
The PRs for GR are defined by
$\chi_i=\dfrac{y_i-\mu_i}{\sqrt{V(\mu_i)}}=\dfrac{y_i-\mu_i}{\mu_i}.$
(2.7)
The standardized PRs are characterized as
$\chi'_i=\dfrac{\chi_i}{\sqrt{\phi(1-h_{ii})}},$
(2.8)
where $h_{ii}$ is the $i$th diagonal element of $H=\hat{W}^{1/2}X(X^{T}\hat{W}X)^{-1}X^{T}\hat{W}^{1/2}$.
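The quantities in Eqs (2.7)-(2.8) can be computed directly from a fitted model. The following Python/NumPy sketch is our illustration (not the authors' code); it assumes the reciprocal link, so that $\hat\mu_i=1/x_i^{T}\hat\beta$ and $\hat W^{1/2}=\mathrm{diag}(\hat\mu_i)$.

```python
import numpy as np

def gr_pearson_residuals(X, y, beta_hat, phi_hat):
    """Pearson residuals (Eq 2.7), standardized PRs (Eq 2.8) and the
    diagonal of H = W^{1/2} X (X'WX)^{-1} X' W^{1/2} for a fitted GR."""
    mu = 1.0 / (X @ beta_hat)
    chi = (y - mu) / mu                      # (y - mu)/sqrt(V(mu)), V(mu) = mu^2
    Xw = X * mu[:, None]                     # W^{1/2} X, since W^{1/2} = diag(mu)
    # h_ii = row_i(Xw) . column_i((Xw'Xw)^{-1} Xw'), i.e. diag(H)
    h = np.einsum("ij,ij->i", Xw, np.linalg.solve(Xw.T @ Xw, Xw.T).T)
    chi_std = chi / np.sqrt(phi_hat * (1.0 - h))   # Eq (2.8)
    return chi, chi_std, h
```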
Generally, $E(\chi_i)$ to order $O(n^{-1})$ does not converge to zero and $\mathrm{Var}(\chi_i)$ does not tend to one, where $n$ is the sample size. To handle such a situation, we require some adjustments to these residuals. To do this, Cox and Snell [36] obtained some matrix formulae for the adjusted residuals.
Various criteria are available in the literature [2,12] for testing the quality of regression models. These criteria include the mean quadratic error of prediction (MEP), the Akaike information criterion (AIC), standard errors and coefficients of determination. As our study is concerned with the GR model, we consider some different criteria for testing the goodness of the GR model after diagnosing the outlying points. These include the Pearson chi-square statistic ($\chi^2$), the MEP, the AIC, Efron's pseudo r-squared criterion ($R^2_{Efron}$) and the dispersion parameter ($\hat\phi$); these criteria are computed according to the following relations:
$\chi^2=\sum_{i=1}^{n}\chi_i^{2}$, where $\chi_i$ is the $i$th PR defined in Eq (2.7).
$\mathrm{MEP}=\dfrac{\sum_{i=1}^{n}V^{1/2}(\mu_i)\,\chi_i^{2}(1-h_{ii})}{n}$, where $V(\mu_i)=\mu_i^{2}$ is the variance function of the GR model.
$\mathrm{AIC}=-2\,l+2p$, where $l$ is the log-likelihood function of the GR model defined in Eq (2.3).
$R^2_{Efron}=1-\dfrac{\sum_{i=1}^{n}(y_i-\hat\mu_i)^{2}}{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}$.
$\hat\phi=\dfrac{\chi^{2}}{n-p-1}$.
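The criteria listed above translate directly into code. A minimal Python sketch of ours (the AIC term is omitted since it only needs the log-likelihood value from the fitting routine):

```python
import numpy as np

def gr_criteria(y, mu_hat, chi, h, p):
    """Pearson chi-square, MEP, Efron's pseudo R^2 and dispersion estimate
    for a fitted GR model, following the relations listed in Section 2."""
    n = len(y)
    chi2 = np.sum(chi ** 2)                                       # Pearson chi-square
    mep = np.sum(np.sqrt(mu_hat ** 2) * chi ** 2 * (1 - h)) / n   # MEP, V(mu) = mu^2
    r2_efron = 1.0 - np.sum((y - mu_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    phi_hat = chi2 / (n - p - 1)                                  # dispersion parameter
    return chi2, mep, r2_efron, phi_hat
```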
3. Outlier detection methods
This section comprises two subsections. In the first subsection, the proposed outlier detection methods based on the PRs of the GR model are presented. The second one comprises a review of some existing outlier detection methods.
3.1. Proposed outlier detection methods
The PRs make a significant contribution to regression diagnostics. Here, we propose outlier detection methods based upon the PRs.
3.1.1. Standardized PRs
In regression analysis, the standardized residuals are generally used for the detection of outliers. So, here we consider the standardized PRs, which were defined in Section 2 by Eq (2.8) as
$\chi'_i=\dfrac{\chi_i}{\sqrt{\phi(1-h_{ii})}}.$
(3.1)
On the basis of the standardized PRs, the $i$th point is considered to be an outlier if $|\chi'_i|>3$.
3.1.2. Jackknife PRs
There are some analytical methods that are used for the detection of outliers. One of these analytical methods is the use of jackknife residuals. Cook and Weisberg [37] suggested that outliers can be detected with the help of jackknife residuals. They defined the jackknife residuals for LRMs as
$e_{Ji}=r_i\sqrt{\dfrac{n-p-1}{n-p-r_i^{2}}},$
(3.2)
where $r_i$ is the standardized residual of the LRM. The decision rule for outlier detection is that if $|e_{Ji}|>t_{\alpha/(2n)}$ with $n-p-1$ degrees of freedom, then there is an indication of the existence of outliers. The application of these residuals for the detection of outliers in chemometrics with reference to an LRM has been studied by Meloun and Militky [2]. Now, we modify Eq (3.2) for the GR by following Amin et al. [1], obtaining
$\chi_{Ji}=\chi'_i\sqrt{\dfrac{n-p-1}{n-p-\chi'^{2}_i}}.$
(3.3)
To identify outliers in the GR, we propose the following cut-off point for the jackknife PR: if $|\chi_{Ji}|>t_{(1-\alpha)}(n-p-1)$, then the $i$th observation is declared an outlier, where $t$ is the Student's t-distribution with $(n-p-1)$ degrees of freedom.
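Eq (3.3) can be sketched as follows (our illustration; it requires $n-p>\chi'^{2}_i$ so that the square root is real, and the Student-t quantile cutoff is passed in as a precomputed argument):

```python
import numpy as np

def jackknife_pr(chi_std, n, p):
    """Jackknife PRs of Eq (3.3): chi'_i * sqrt((n-p-1)/(n-p-chi'_i^2))."""
    chi_std = np.asarray(chi_std, dtype=float)
    return chi_std * np.sqrt((n - p - 1) / (n - p - chi_std ** 2))

def flag_jackknife(chi_std, n, p, t_cutoff):
    """Declare the i-th point an outlier when |chi_Ji| exceeds the
    Student-t cutoff with n - p - 1 degrees of freedom."""
    return np.abs(jackknife_pr(chi_std, n, p)) > t_cutoff
```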
3.1.3. Adjusted PRs
Amin et al. [1] proposed the adjusted PRs in inverse Gaussian regression for the detection of single influential points in chemometrics. These residuals are proposed for the GR as follows:
$\chi_{Ai}=\dfrac{\chi_i-r_i}{\sqrt{v_i}},$
(3.4)
where $r_i=(E(R_i))^{T}=-\frac{\sqrt{\phi}}{2}(I-H)Jz$ with $H=W^{1/2}X(X^{T}WX)^{-1}X^{T}W^{1/2}$, $J=\mathrm{diag}(2\mu^{2})$ and $z=(z_{11},\ldots,z_{nn})^{T}$ a vector; also, $v_i=(\mathrm{Var}(R_i))^{T}=1+\frac{\phi}{2}(QHJ-T)z$, where $Q=\mathrm{diag}(2)$ and $T=\mathrm{diag}((2\phi^{-1}+6)\mu^{2})$.
Note that $\hat{r}_i$ and $\hat{v}_i$ are computed by using $\hat\mu_i$ instead of $\mu_i$.
Amin et al. [1] stated that the adjusted residuals can be used for the detection of an outlier. The decision rule to declare the $i$th observation an outlier is $|\chi_{Ai}|>2$.
3.2. Available outlier detection methods
In the literature, numerous methods have been recommended for the detection of outliers. We consider a few of them for comparison with the adjusted PR for the GR model.
3.2.1. Z-method
For the identification of outliers in univariate cases, the Z-score is defined based on the median and inter-quartile range (IQR) as
$z_i=\dfrac{y_i-\mathrm{Median}(y)}{\mathrm{IQR}(y)}.$
(3.5)
The Z-score method declares the $i$th observation an outlier if $|z_i|>3$.
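A minimal sketch of Eq (3.5) in Python (our illustration; note the statistic is undefined when the IQR is zero):

```python
import numpy as np

def z_method(y, cutoff=3.0):
    """Z-score of Eq (3.5): (y_i - median)/IQR; flag |z_i| > 3."""
    y = np.asarray(y, dtype=float)
    q1, q3 = np.percentile(y, [25, 75])   # quartiles for the IQR
    z = (y - np.median(y)) / (q3 - q1)
    return z, np.abs(z) > cutoff
```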
3.2.2. Modified Z-method
The modified Z-statistic (MZS) has been proposed for the detection of outliers based on the median [38]; the MZS is defined as
$Z^{*}_i=\dfrac{y_i-\mathrm{Median}(y)}{\mathrm{Median}|y_i-\mathrm{Median}(y)|},$
(3.6)
where $\mathrm{Median}|\cdot|$ represents the median absolute deviation from the median. One can conclude that the $i$th observation is an outlier if $|Z^{*}_i|>3.50$.
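Eq (3.6) can be sketched similarly (our illustration; it follows the text's form, which divides by the raw median absolute deviation without the 0.6745 consistency factor that some presentations of the MZS include):

```python
import numpy as np

def modified_z_method(y, cutoff=3.5):
    """Modified Z-statistic of Eq (3.6): (y_i - median)/MAD; flag |Z*_i| > 3.5."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    mad = np.median(np.abs(y - med))   # median absolute deviation from the median
    z_star = (y - med) / mad
    return z_star, np.abs(z_star) > cutoff
```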
3.2.3. Grubbs' test
This test was introduced by Grubbs [13] for the detection of a single outlier in the univariate response variable. The Grubbs (G) test detects only a single outlier at a time; it is therefore applied repeatedly when more observations are suspected to be outliers. The G statistic can be defined as $G=\dfrac{\max_i|y_i-\bar{y}|}{s}$, where $s$ is the sample standard deviation. For our assumed model, the G statistic is modified as
$G=\dfrac{|y_i-\bar{y}|}{s}.$
(3.7)
The decision rule of the G statistic for detecting outliers is $G>\dfrac{n-1}{\sqrt{n}}\sqrt{\dfrac{t^{2}_{\alpha/(2n),\,n-2}}{n-2+t^{2}_{\alpha/(2n),\,n-2}}}$, where $\alpha$ is the level of significance and $n$ represents the sample size. The above decision rule may be unable to find the appropriate outlier. So, we propose another decision rule for Grubbs' method to diagnose an outlier in a GR model: for the $i$th data point, if $G\geq 2$, then this data point is declared an outlier.
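The per-observation form of Eq (3.7), with the $G\geq 2$ cut-off proposed above, can be sketched as follows (our illustration):

```python
import numpy as np

def grubbs_gr(y, cutoff=2.0):
    """Per-observation Grubbs-type statistic of Eq (3.7): |y_i - ybar|/s,
    flagging points with G >= 2 as in the rule proposed for the GR model."""
    y = np.asarray(y, dtype=float)
    g = np.abs(y - y.mean()) / y.std(ddof=1)   # s = sample standard deviation
    return g, g >= cutoff
```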
4. Results and discussion
A comparison of the proposed methods of outlier detection with already available methods through the use of simulations and a real-life data set is presented in this section.
4.1. Simulation study
This section explains the simulation experiment conducted to study the performance of the two types of GR residuals, i.e., the standardized and adjusted residuals, in detecting outliers. We generated the dependent variable of the GR model with the reciprocal link function as $y_i\sim\mathrm{Gamma}(\mu_i,\phi)$, $i=1,\ldots,n$, where $\mu_i=(\beta_0+\beta_1x_{i1}+\ldots+\beta_4x_{i4})^{-1}$ and the $x_i$'s were generated from two probability distributions, i.e., the uniform (0, 1) and the standard normal distributions. The true values of the regression parameter vector $\beta$ were selected as the normalized eigenvector corresponding to the largest eigenvalue of the $X^{T}\hat{W}X$ matrix such that $\beta^{T}\beta=1$ [39]. The sample sizes were taken as $n=25,50,75,100,125,150,175$ and $200$, and we considered the dispersion values $\phi=0.33,0.67$ and $2$. A single outlier was generated in the dependent variable, i.e., the 8th observation was replaced by $y_8=y_8+a_0$, where $a_0=\bar{y}+3\sqrt{V(y)}$. Multiple outliers in the dependent variable were generated as $y_{ii}=y_{ii}+a_0$, where $ii=8,15,20$ (three outliers). The performance of these diagnostics was assessed by identifying the generated outlier(s) in the gamma-generated samples. The simulated results were computed with the help of the R statistical language. The simulation study was replicated 1000 times to find the outlier detection percentages.
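One replication of this contamination scheme can be sketched in Python (our illustration of the uniform-x case, since the paper's own code was written in R; the unit-norm $\beta$ here is a fixed vector rather than the data-dependent eigenvector of $X^{T}\hat W X$, and the outlier shift takes $a_0=\bar y+3\sqrt{V(y)}$):

```python
import numpy as np

def simulate_gr_outlier(n=50, phi=0.67, seed=123):
    """Generate one GR sample with reciprocal link and plant a single
    outlier at the 8th observation, as described in Section 4.1."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((n, 1)), rng.uniform(0.0, 1.0, (n, 4))])
    beta = np.full(5, 1.0 / np.sqrt(5.0))     # any unit vector: beta'beta = 1
    mu = 1.0 / (X @ beta)                     # reciprocal link; eta > 0 here
    y = rng.gamma(shape=1.0 / phi, scale=mu * phi, size=n)  # E(y)=mu, Var=phi*mu^2
    a0 = y.mean() + 3.0 * y.std(ddof=1)       # shift: ybar + 3*sqrt(V(y))
    y[7] += a0                                # contaminate the 8th observation
    return X, y
```

The multiple-outlier setting simply repeats the final shift at indices 7, 14 and 19 (the 8th, 15th and 20th observations).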
In Table 1, the performance of the proposed methods for the detection of outliers is gauged based on standard normal generated x's, using PRs under different dispersions and sample sizes. It can be observed that for $\phi<1$, the performance of the $\chi_{Ai}$ method was better than that of the other methods in diagnosing the generated single outlier. For this dispersion, when the sample size was increased to 100, the performance of the $\chi'_i$, $\chi_{Ai}$ and $\chi_{Ji}$ outlier detection methods improved. The performance of the $Z$, $Z^{*}$ and $G$ outlier diagnostic methods was not affected by the increase in sample size. It can also be observed that the performance of all outlier diagnostic methods increased with increasing dispersion. Moreover, when $\phi=2$ and $n>50$, the performance of all the diagnostic methods seems to have been identical. This indicates that sample size and dispersion have a significant effect on our proposed method $\chi_{Ai}$ in the detection of a single outlying point. The comparison of the outlier diagnostic methods revealed that $\chi_{Ai}$ is better than the other methods. It can be seen that the detection was better with the proposed method than with all of the other diagnostic methods when the values of $X$ were generated from $U(0,1)$; for further details, see Table 2. It can be seen that the performances of the available methods were not as good as the performance of our proposed method.
Table 1.
Comparison of outlier detection methods for single-outlier detection when $x$'s $\sim N(0,1)$.
When we applied these methods for the detection of multiple outlying points with standard normally generated independent variables, the performance of our proposed method was better and more consistent than those of the other methods. The detection performance of the other diagnostic methods was reduced to 50%. On the other hand, the outlier diagnostic performance of all methods was reduced to some extent when the independent variables were generated from the uniform distribution. The performance of all outlier diagnostic methods increased rapidly as the dispersion crossed 1.0 (see Tables 1–4). We found that our proposed method performed better than the other methods regardless of whether the x's were generated from the standard normal or the uniform distribution.
Table 3.
Comparison of outlier detection methods for multiple-outlier detection when $x$'s $\sim N(0,1)$.
Based on our findings (Tables 1–4), we can rank the adjusted PR as the best and the Grubbs method as the second-best method for outlier detection. Moreover, the results show that outlier detection performance increases with an increase in sample size. In studying the effect of dispersion on the outlier detection methods, we found that the methods are affected directly by the dispersion. This means that upon increasing the dispersion, the outlier detection efficiencies of the different methods increase. The maximum number of outliers was detected for the larger dispersions.
4.2. Application: ARDENNES dataset
Now, we evaluate the performance of the proposed methods with the help of a real application. For this purpose, we applied the ARDENNES data taken from Barnard et al. [40]. The main use of this data set was to model the first etch biopsy, i.e., the beginning of a layer of extracted incisor enamel (y), based on two explanatory variables for data on 55 children. These explanatory variables included the etched depth (x1), which was estimated from the amount of calcium removed during the etch biopsy, as the first explanatory variable. The age of the child (x2), which had been transformed to the decimal system from years and months, was considered as the second explanatory variable. Mallet et al. [41] applied linear regression and other models to this data set using log(y) as the response variable. After fitting the LRM, they explored the outlier detection analysis of these data and found that the 23rd, 48th and 52nd points were outliers. However, this data set is not well fitted by the normal distribution, since the distribution of the dependent variable is positively skewed. From the distribution fitting test, we observed that the GR model fits this data set well; the results are reported in Table 5.
Table 5.
Fitting the probability distributions of the response variable.
In the fitted model, the square brackets contain the standard errors of the estimated parameters; the letter N represents the non-significance and S the significance of the regression coefficients. The fitted GR model achieved the following results: MEP = 267.08, AIC = 847.19 and Pearson chi-square statistic = 11.61. After fitting the GR model, we next computed the outlier statistics, which are plotted in Figure 1.
Figure 1.
Index plots for outlier detection methods.
From Figure 1, we can observe that the proposed outlier detection methods detected the 9th, 23rd, 48th and 52nd points as outliers. The 9th point was not detected as an outlier in the original work due to misidentification of the regression model, but this did not affect the GR estimates. Hossain and Naik [42] also indicated that an outlier may or may not affect the regression estimates. So, in this case, this outlying point had a minimal effect on the GR estimates, while $R^2_{Efron}$ also decreased because outliers are related to the response variable. $R^2_{Efron}$ only improves upon the elimination of influential points, which are related to the explanatory variables; for more details, see [5,43,44]. All detected outliers indicated that these children had high lead levels in their enamel, but there were no extreme points in the explanatory variables. Moreover, these outliers were also identified successfully by using the jackknife residuals and adjusted PRs.
After deleting the identified outliers, the refitted GR model achieved MEP = 169.85, AIC = 756.15 and Pearson chi-square statistic = 6.89. These results indicate how much the variation decreased after deleting the identified outliers; e.g., the Pearson chi-square statistic (a variation measure) was reduced to 59% of its original value (see Table 6). Another noticeable point is that the two independent variables were found to be significant after deleting the identified outliers. From Table 6, it can be observed that the individually most influential outlier is the 48th point, which affected the fitted GR results. Collectively, all detected outlying points affected the GR estimates of $\beta_0$ and $\beta_2$, respectively.
Table 6.
Change (%) in the GR results after deleting the outlying points.
5. Conclusions
Outlier detection in regression models is an important step in getting reliable and valid results. These detection methods are based on some diagnostic test statistics, which can be calculated from the regression results. To detect outliers, the first and most important step is the choice of an appropriate regression model because, sometimes, outliers may arise due to an inappropriate regression model. For the selection of an appropriate regression model, one should test the distribution of the response variable. If the probability distribution of the dependent variable is gamma, the appropriate choice of model is the GR model. In this paper, we proposed outlier diagnostics based on the use of the Pearson (standardized and adjusted) residuals in the GR model. Some available LRM-based outlier detection methods were modified for the GR model. These modified methods were compared with our proposed outlier detection methods with the help of simulations and a real data set. The results indicate that our proposed methods for the detection of outliers are better than the available methods in terms of improving the results of the selected model, enabling better decisions in statistics and other disciplines.
Acknowledgments
The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code 22UQU4310063DSR05.
Conflict of interest
The authors declare that they have no conflicts of interest regarding the publication of this article.
References
[1]
M. Amin, M. Amanullah, M. Aslam, Empirical evaluation of the inverse Gaussian regression residuals for the assessment of influential points, J. Chemometr., 30 (2016), 394–404. https://doi.org/10.1002/cem.2805 doi: 10.1002/cem.2805
[2]
M. Meloun, J. Militký, Detection of single influential points in OLS regression model building, Anal. Chim. Acta, 439 (2001), 169–191. https://doi.org/10.1016/S0003-2670(01)01040-6 doi: 10.1016/S0003-2670(01)01040-6
[3]
K. A. Mogaji, Geoelectrical parameter-based multivariate regression borehole yield model for predicting aquifer yield in managing groundwater resource sustainability, J. Taibah Univ. Sci., 10 (2016), 584–600. https://doi.org/10.1016/j.jtusci.2015.12.006 doi: 10.1016/j.jtusci.2015.12.006
[4]
O. S. Alshamrani, Construction cost prediction model for conventional and sustainable college buildings in North America, J. Taibah Univ. Sci., 11 (2017), 315–323. https://doi.org/10.1016/j.jtusci.2016.01.004 doi: 10.1016/j.jtusci.2016.01.004
[5]
A. M. Sarhan, A. I. El-Gohary, A. Mustafa, A. H. Tolba, Statistical analysis of regression competing risks model with covariates using Weibull sub-distributions, Int. J. Reliab. Appl., 20 (2019), 73–88.
[6]
J. Burger, P. Geladi, Hyperspectral NIR image regression part Ⅱ: Dataset preprocessing diagnostics, J. Chemometr., 20 (2006), 106–119. https://doi.org/10.1002/cem.986 doi: 10.1002/cem.986
[7]
D. L. Massart, L. Kaufman, P. J. Rousseeuw, A. Leroy, Least median of squares: A robust method for outlier and model error detection in regression and calibration, Anal. Chim. Acta, 187 (1986), 171–179. https://doi.org/10.1016/S0003-2670(00)82910-4 doi: 10.1016/S0003-2670(00)82910-4
[8]
E. Hund, D. L. Massart, J. Smeyers-Verbeke, Robust regression and outlier detection in the evaluation of robustness tests with different experimental designs, Anal. Chim. Acta, 463 (2002), 53–73. https://doi.org/10.1016/S0003-2670(02)00337-9 doi: 10.1016/S0003-2670(02)00337-9
[9]
P. J. Rousseeuw, M. Debruyne, S. Engelen, M. Hubert, Robustness and outlier detection in chemometrics, Crit. Rev. Anal. Chem., 36 (2006), 221–242. https://doi.org/10.1080/10408340600969403 doi: 10.1080/10408340600969403
[10]
A. Desgagné, Efficient and robust estimation of regression and scale parameters, with outlier detection, Comput. Stat. Data Anal., 155 (2021), 1–19. https://doi.org/10.1016/j.csda.2020.107114 doi: 10.1016/j.csda.2020.107114
[11]
V. Barnett, T. Lewis, Outliers in statistical data, Chichester, UK: Wiley, 1994.
[12]
W. J. Dixon, Analysis of extreme values, Ann. Math. Stat., 21 (1950), 488–506.
[13]
F. E. Grubbs, Procedures for detecting outlying observations in samples, Technometrics, 11 (1969), 1–21.
[14]
B. Rosner, Percentage points for a generalized ESD many-outlier procedure, Technometrics, 25 (1983), 165–172.
[15]
U. Balasooriya, Y. K. Tse, Y. S. Liew, An empirical comparison of some statistics for identifying outliers and influential observations in linear regression models, J. Appl. Stat., 14 (1987), 177–184. https://doi.org/10.1080/02664768700000022 doi: 10.1080/02664768700000022
[16]
J. F. Lawless, Statistical models and methods for life time data, New York: Wiley, 2003.
[17]
D. Jearkpaporn, D. C. Montgomery, G. C. Runger, C. M. Borror, Model based process monitoring using robust generalized linear models, Int. J. Prod. Res., 43 (2005), 1337–1354. https://doi.org/10.1080/00207540412331299693 doi: 10.1080/00207540412331299693
[18]
M. L. Segond, C. Onof, H. S. Wheater, Spatial temporal disaggregation of daily rainfall from a generalized linear model, J. Hydrol., 331 (2006), 674–689. https://doi.org/10.1016/j.jhydrol.2006.06.019 doi: 10.1016/j.jhydrol.2006.06.019
[19]
R. N. Das, J. Kim, GLM and joint GLM techniques in hydrogeology: An illustration, Int. J. Hydrol. Sci. Technol., 2 (2012), 185–201.
[20]
R. De Marco, F. Locatelli, I. Cerveri, M. Bugiani, A. Marinoni, G. Giammanco, Incidence and remission of asthma: A retrospective study on the natural history of asthma in Italy, J. Allergy Clin. Immun., 110 (2002), 228–235. https://doi.org/10.1067/mai.2002.125600 doi: 10.1067/mai.2002.125600
[21]
M. Faddy, N. Graves, A. Pettitt, Modeling length of stay in hospital and other right skewed data: Comparison of phase-type, Gamma and log-normal distributions, Value Health, 12 (2009), 309–314. https://doi.org/10.1111/j.1524-4733.2008.00421.x doi: 10.1111/j.1524-4733.2008.00421.x
[22]
Y. Murakami, T. Okamura, K. Nakamura, K. Miura, H. Ueshima, The clustering of cardiovascular disease risk factors and their impacts on annual medical expenditure in Japan: Community-based cost analysis using Gamma regression models, BMJ Open, 3 (2013), 1–6.
[23]
D. Griffie, L. James, S. Goetz, B. Balotti, Y. H. Shr, M. Corbin, et al., Outcomes and economic benefits of Penn State extension's dining with diabetes program, Prev. Chronic Dis., 15 (2018), 1–13. https://doi.org/10.5888/pcd15.170407 doi: 10.5888/pcd15.170407
[24]
N. Kumar, S. Lalitha, Testing for upper outliers in gamma sample, Commun. Stat.-Theory Methods, 41 (2012), 820–828. https://doi.org/10.1080/03610926.2010.531366 doi: 10.1080/03610926.2010.531366
[25]
M. J. Nooghabi, H. J. Nooghabi, P. Nasiri, Detecting outliers in gamma distribution, Commun. Stat. Theory Methods, 39 (2010), 698–706. https://doi.org/10.1080/03610920902783856 doi: 10.1080/03610920902783856
[26]
A. C. Kimber, Tests for a single outlier in a gamma sample with unknown shape and scale parameters, J. Roy. Stat. Soc. Ser. C, 28 (1979), 243–250. https://doi.org/10.2307/2347194 doi: 10.2307/2347194
[27]
A. C. Kimber, Discordancy testing in gamma samples with both parameters unknown, J. Roy. Stat. Soc. Ser. C, 32 (1983), 304–310. https://doi.org/10.2307/2347953 doi: 10.2307/2347953
[28]
T. Lewis, N. R. J. Fieller, A recursive algorithm for null distribution for outliers: I. Gamma samples, Technometrics, 21 (1979), 371–376.
[29]
M. A. Shayib, D. H. Young, The extreme residuals in gamma regression, Commun. Stat. Theory Methods, 20 (1991), 561–577. https://doi.org/10.1080/03610929108830515 doi: 10.1080/03610929108830515
[30]
G. C. Tiao, I. Guttman, Analysis of outliers with adjusted residuals, Technometrics, 9 (1967), 541–559.
[31]
G. M. Cordeiro, On Pearson's residuals in generalized linear models, Stat. Probabil. Lett., 66 (2004), 213–219. https://doi.org/10.1016/j.spl.2003.09.004 doi: 10.1016/j.spl.2003.09.004
[32]
M. R. Urbano, C. G. Demtrio, G. M. Cordeiro, On Wald residuals in generalized linear models, Commun. Stat. Theory Methods, 41 (2012), 741–758. https://doi.org/10.1080/03610926.2010.529537 doi: 10.1080/03610926.2010.529537
[33]
T. Anholeto, M. C. Sandoval, D. A. Botter, Adjusted Pearson residuals in beta regression models, J. Stat. Comput. Simul., 84 (2014), 999–1014. https://doi.org/10.1080/00949655.2012.736993 doi: 10.1080/00949655.2012.736993
[34]
J. W. Hardin, J. W. Hilbe, Generalized linear models and extensions, Stata Press Publication: Texas, 2012.
[35]
P. J. Green, Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives, J. Roy. Stat. Soc.: Ser. B, 46 (1984), 149–170. https://doi.org/10.1111/j.2517-6161.1984.tb01288.x doi: 10.1111/j.2517-6161.1984.tb01288.x
[36]
D. R. Cox, E. J. Snell, A general definition of residuals (with discussion), J. Roy. Stat. Soc.: Ser. B, 30 (1968), 248–275.
[37]
R. D. Cook, S. Weisberg, Residuals and influence in regression, Chapman Hall, New York, 1982.
[38]
B. Iglewicz, D. C. Hoaglin, How to detect and handle outliers, Milwaukee: ASQC Quality Press, 1993.
[39]
S. Ahmad, M. Aslam, Another proposal about the new two-parameter estimator for linear regression model with correlated regressors, Commun. Stat.-Simul. Comput., 51 (2022), 3054–3072. https://doi.org/10.1080/03610918.2019.1705975 doi: 10.1080/03610918.2019.1705975
[40]
T. E. Barnard, K. S. Booksh, R. G. Brereton, D. H. Coomans, S. N. Deming, Y. Hayashi, Chemometrics in environmental chemistry-statistical methods, Vol. 2, Springer-Verlag Berlin Heidelberg New York, 1995.
[41]
Y. L. Mallet, D. H. Coomans, O. Y. de Vel, Robust non-parametric methods in multiple regressions of environmental data, In: Chemometrics an environmental chemistry-statistical methods, 1995. https://doi.org/10.1007/978-3-540-49148-4_6
[42]
A. Hossain, D. N. Naik, A comparative study on detection of influential observations in linear regression, Stat. Pap., 32 (1991), 55–69. https://doi.org/10.1007/BF02925479 doi: 10.1007/BF02925479
[43]
T. A. Abushal, Parametric inference of Akash distribution with Type-Ⅱ censoring with analyzing of relief times of patients, AIMS Math., 6 (2021), 10789–10801. https://doi.org/10.3934/math.2021627 doi: 10.3934/math.2021627
[44]
T. A. Abushal, A. H. Abdel-Hamid, Inference on a new distribution under progressive-stress accelerated life tests and progressive type-Ⅱ censoring based on a series-parallel system, AIMS Math., 7 (2022), 425–454. https://doi.org/10.3934/math.2022028 doi: 10.3934/math.2022028