
We introduce the Gauss hypergeometric Gleser (GHG) distribution, a novel extension of the Gleser (G) distribution that unifies families of Gleser distributions. We study its representations and some basic properties and show that the GHG distribution is heavy-tailed. The maximum likelihood method is used for parameter estimation, and the Fisher information matrix is derived. We assess the performance of the maximum likelihood estimators via Monte Carlo simulations. Moreover, we present applications to two data sets in which the GHG distribution shows a better fit than other known distributions.
Citation: Neveka M. Olmos, Emilio Gómez-Déniz, Osvaldo Venegas. The Gauss hypergeometric Gleser distribution with applications to flood peaks exceedance and income data[J]. AIMS Mathematics, 2025, 10(6): 13575-13593. doi: 10.3934/math.2025611
Distributions with a heavy right tail are commonly used to model data in areas where extreme observations occur, including environmental studies, geosciences, hydrology, actuarial sciences, economics, finance, product reliability assessment, and income distribution analysis. Such datasets tend to be positive, positively skewed, and heavy in the right tail (see [1]), so distributions with positive support and these characteristics are usually used to model them. Heavy-tailed distributions are also applied in engineering and physics; for other fields, we refer the reader to [2] and its references.
A frequently used distribution in statistical modeling is the Pareto model (see the book by Arnold [3]), named after the economist Vilfredo Pareto. A random variable X follows a Pareto distribution when its probability density function (pdf) is given by:
$$f_X(x;\alpha,\beta)=\frac{\beta\alpha^{\beta}}{x^{\beta+1}},\quad x\ge\alpha, \tag{1.1}$$
where α>0 is the scale parameter and β>0 the shape parameter. We denote this by X∼Pareto(α,β). The corresponding cumulative distribution function (cdf) is:
$$F_X(x;\alpha,\beta)=1-\left(\frac{\alpha}{x}\right)^{\beta},\quad x\ge\alpha. \tag{1.2}$$
Johnson et al. [4] classify several types of Pareto distributions, referring to the above model as Pareto Type Ⅰ. Beyond this standard form, various extensions have been proposed in the literature. For instance, Pickands [5] examines the generalized Pareto distribution for making statistical inferences about the upper tail of a distribution. Choulakian and Stephens [6] develop goodness-of-fit tests for this generalized model, while Davison and Smith [7] apply it to hydrological data, analyzing river-flow exceedances over 35 years.
As a further extension, Gupta et al. [8] introduce a family of distributions by raising any cdf with positive support to a positive power. A special case arises when using the Pareto cdf (1.2), leading to the exponentiated Pareto (EP) distribution. A random variable X follows an EP distribution if its pdf is defined as:
$$f_X(x;\alpha,\beta,\gamma)=\frac{\beta\gamma\alpha^{\beta}}{x^{\beta+1}}\left(1-\left(\frac{\alpha}{x}\right)^{\beta}\right)^{\gamma-1},\quad x\ge\alpha, \tag{1.3}$$
where α>0 is the scale parameter and β,γ>0 are the shape parameters. This model, denoted X∼EP(α,β,γ), was first studied by Stoppa [9] (see also [10]), and is a particular case of the family studied by Gupta et al. [8]. Another generalization is the beta-Pareto distribution introduced by Akinsete et al. [11]; Boumaraf et al. [12] apply several optimization methods to estimation in the beta-Pareto distribution. For a comprehensive review of Pareto models, see Arnold's book [3].
Andrews and Mallows [13] conducted research on normal distribution scale mixtures, resulting in the creation of a class of heavy-tailed distributions (including notable examples like Student's t-distribution). These distribution families are extensively applied in robust statistical analysis for symmetrical datasets.
A scale mixture distribution is formed by combining a base probability density with a scaling distribution. The mathematical representation of this mixture takes the form:
$$f_Z(z)=\int f_{Z|U=u}(z;\kappa(u))\,f_U(u)\,du,$$
where $f_{Z|U=u}$ is the conditional density of the random variable Z given U=u, while κ(u) is a positive function of the random variable U, with density function $f_U$, which influences the scale of Z. A particularly noteworthy case is the Slash distribution (see [4]), where Z|U=u∼N(0,κ(u)), U∼Uniform(0,1) and $\kappa(u)=u^{1/q}$. This family's theoretical properties, inferential methods, and extensions have been extensively studied by Rogers and Tukey [14] and Mosteller and Tukey [15], while maximum likelihood (ML) estimation of its location and scale parameters is addressed in [16,17,18]. The same idea has been applied when the conditional distribution has positive support, for example, to extend the Birnbaum-Saunders distribution (see [19]) and the generalized half-normal distribution (see [20]).
The beta function (B) plays a fundamental role in this paper. It is defined as:
$$B(a,b)=\int_0^1 t^{a-1}(1-t)^{b-1}\,dt=\int_0^\infty\frac{t^{a-1}}{(1+t)^{a+b}}\,dt=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}, \tag{1.4}$$
where a>0, b>0, and Γ(⋅) denotes the gamma function. A random variable Y follows a beta distribution with parameters a and b if its pdf is given by:
$$f_Y(y;a,b)=\frac{1}{B(a,b)}\,y^{a-1}(1-y)^{b-1},\quad 0<y<1, \tag{1.5}$$
where a>0 and b>0.
The incomplete beta function, denoted by B(y;a,b), is defined as:
$$B(y;a,b)=\int_0^y t^{a-1}(1-t)^{b-1}\,dt,\quad 0<y<1, \tag{1.6}$$
where a>0 and b>0. A related function is the regularized incomplete beta function, $I_y(a,b)$, expressed as $I_y(a,b)=B(y;a,b)/B(a,b)$.
Also used in this paper is the Gauss hypergeometric function (see [21]); it is denoted by ${}_2F_1$, and is given by
$${}_2F_1(a,b,c;x)=\frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\int_0^1 v^{b-1}(1-v)^{c-b-1}(1-xv)^{-a}\,dv, \tag{1.7}$$
where a,b,c>0 and c>b.
The Lerch transcendent function (Φ) is also used in this paper and can be expressed as:
$$\Phi(-z,s,a)=\frac{1}{\Gamma(s)}\int_0^1\frac{x^{a-1}\log^{s-1}\left(\frac{1}{x}\right)}{1+zx}\,dx, \tag{1.8}$$
where a>0, s>0 and z>−1.
A random variable X follows a G distribution with parameters σ and α when its density function is expressed as:
$$f_X(x;\sigma,\alpha)=\frac{\sigma^{\alpha}x^{-\alpha}}{B(\bar\alpha,\alpha)(\sigma+x)},\quad x>0, \tag{1.9}$$
with scale parameter σ>0, shape parameter 0<α<1, and $\bar\alpha=1-\alpha$. The notation X∼G(σ,α) indicates this distribution, which possesses these key characteristics:
a) The G distribution is unimodal, with its mode located at zero.
b) For Y∼Beta(1−α,α), the transformed variable $\sigma Y/(1-Y)$ follows a G(σ,α) distribution.
c) The cdf of X is expressed as:
$$F_X(x;\sigma,\alpha)=I_y(\bar\alpha,\alpha),\quad x>0, \tag{1.10}$$
where $y=\frac{x}{\sigma+x}$ and $I_y(\cdot,\cdot)$ is the regularised incomplete beta function.
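Properties b) and c) can be checked against each other numerically. The following is a minimal sketch, not code from the paper: the function names are hypothetical, only the Python standard library is used, and the closed-form cdf relies on the standard identity $I_y(1/2,1/2)=\frac{2}{\pi}\arcsin\sqrt{y}$, so it is valid for α=1/2 only.

```python
import math
import random


def sample_gleser(sigma, alpha, rng):
    """Draw one value from G(sigma, alpha) using property b)."""
    y = rng.betavariate(1.0 - alpha, alpha)  # Y ~ Beta(1 - alpha, alpha)
    if y >= 1.0:  # guard against a draw rounded up to 1; treated as +infinity
        return math.inf
    return sigma * y / (1.0 - y)


def gleser_cdf_alpha_half(x, sigma):
    """cdf (1.10) for alpha = 1/2 only: I_y(1/2, 1/2) = (2/pi) * asin(sqrt(y))."""
    y = x / (sigma + x)
    return 2.0 / math.pi * math.asin(math.sqrt(y))


rng = random.Random(2025)  # fixed seed so the run is reproducible
draws = [sample_gleser(1.0, 0.5, rng) for _ in range(100_000)]
empirical = sum(d <= 1.0 for d in draws) / len(draws)
exact = gleser_cdf_alpha_half(1.0, 1.0)
print(f"empirical P(X <= 1) = {empirical:.3f}, closed form = {exact:.3f}")
```

For σ=1 and α=1/2 the point x=1 is the median, so both printed values should be close to 0.5.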
This distribution was first introduced by Gleser [22] and later studied by Olmos et al. [23]. More recently, Olmos et al. [24] proposed a scale mixture of the G and Beta distributions, naming it scale mixture of Gleser (SMG) distribution. They show that the SMG distribution exhibits a heavy right tail, making it a viable alternative to other heavy-tailed distributions such as the Pareto distribution.
A random variable X follows an SMG distribution with parameters σ and α when its density function is:
$$f_X(x;\sigma,\alpha)=\frac{\alpha\sigma^{\alpha}}{B(\bar\alpha,\alpha)}\,x^{-(\alpha+1)}\log\left(1+\frac{x}{\sigma}\right),\quad x>0, \tag{1.11}$$
where σ>0 represents the scale parameter and 0<α<1 the shape parameter. We denote this by X∼SMG(σ,α).
The main motivation of this work is to extend the study of the G distribution by pursuing the following objectives. First, we aim to introduce the GHG distribution, generated from a scale mixture of the G and Beta distributions, similar to how the SMG distribution was derived. Notably, both the SMG and G distributions emerge as special cases of the GHG distribution. Second, we conduct a comprehensive study of the GHG distribution, demonstrating its applicability as an alternative model for actuarial data (e.g., insurance claims, income, and expenses) and hydrological data (e.g., flood peak exceedances), among other fields.
The article is organized as follows: In Section 2, we give a representation and the density of the extension of the G distribution, and its basic properties. In Section 3, we carry out an inference by the ML method, with a simulation study and the Fisher information matrix. Section 4 contains two applications to real data. In Section 5, we offer some conclusions.
In this section, we present the mathematical representation, pdf, key properties, and graphical illustrations of the GHG distribution.
The pdf of the GHG distribution is formally defined below, including several special cases.
Definition 2.1. Let Z∼GHG(σ,α,β). Then, the pdf of Z is given by
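$$f_Z(z;\sigma,\alpha,\beta)=\frac{\beta B(\bar\alpha+\beta,1)}{\sigma^{\bar\alpha}B(\bar\alpha,\alpha)}\,z^{-\alpha}\,{}_2F_1\left(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-\frac{z}{\sigma}\right),\quad z>0, \tag{2.1}$$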
where σ>0 represents the scale parameter, 0<α<1 and β>0 are shape parameters and 2F1 is the Gauss hypergeometric function.
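The density (2.1) can be evaluated without a special-function library by using the integral representation (1.7) with a=1, b=ᾱ+β and c=ᾱ+β+1. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the midpoint rule with the substitution $s=v^{\kappa}$ (κ=ᾱ+β) is simply one convenient way to compute the integral.

```python
import math


def beta_fn(a, b):
    """Beta function via (1.4): Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def hyp2f1_1_k(kappa, x, n_steps=20_000):
    """2F1(1, kappa, kappa + 1; -x) for x >= 0, from (1.7) with a = 1.

    (1.7) gives kappa * int_0^1 v**(kappa - 1) / (1 + x * v) dv.
    With s = v**kappa this becomes int_0^1 ds / (1 + x * s**(1 / kappa)),
    a bounded integrand that the midpoint rule handles directly.
    """
    h = 1.0 / n_steps
    inv_k = 1.0 / kappa
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * h
        total += 1.0 / (1.0 + x * s ** inv_k)
    return total * h


def ghg_pdf(z, sigma, alpha, beta):
    """Density (2.1) of GHG(sigma, alpha, beta) at z > 0."""
    abar = 1.0 - alpha
    kappa = abar + beta
    const = beta * beta_fn(kappa, 1.0) / (sigma ** abar * beta_fn(abar, alpha))
    return const * z ** (-alpha) * hyp2f1_1_k(kappa, z / sigma)


# When beta equals alpha the density should coincide with the SMG pdf (1.11).
smg = 0.5 / beta_fn(0.5, 0.5) * 4.0 ** (-1.5) * math.log(1.0 + 4.0)
print(ghg_pdf(4.0, 1.0, 0.5, 0.5), smg)
```

The two printed numbers should agree to several decimals, which is a quick consistency check between (2.1) and (1.11).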
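Remark 1. The following can be verified:
(1) If β<α, then the pdf of the GHG distribution can be written as
$$f_Z(z;\sigma,\alpha,\beta)=\frac{\beta\sigma^{\beta}z^{-(\beta+1)}}{B(\bar\alpha,\alpha)}\,B\left(\frac{z}{\sigma+z};\bar\alpha+\beta,\alpha-\beta\right),\quad z>0. \tag{2.2}$$
(2) If β=α, the SMG(σ,α) distribution introduced by Olmos et al. [24] is obtained.
(3) If $z/\sigma<1$, then using the Lerch transcendent function we obtain
$$f_Z(z;\sigma,\alpha,\beta)=\frac{\beta\sigma^{-\bar\alpha}z^{-\alpha}}{B(\bar\alpha,\alpha)}\,\Phi\left(-\frac{z}{\sigma},1,\bar\alpha+\beta\right),\quad z>0. \tag{2.3}$$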
Figure 1 displays the density plots of the GHG distribution for σ=1, α=0.5 and two values of β, while Table 1 presents the values of P(Z>z) for various values of z in the distributions mentioned. It can be seen that the right tail of the GHG distribution is heavier as the value of parameter β decreases.
Distribution | P(Z>4) | P(Z>5) | P(Z>6) |
GHG(1, 0.5, 1) | 0.437 | 0.406 | 0.381 |
GHG(1, 0.5, 3) | 0.341 | 0.311 | 0.288 |
GHG(1, 0.5, 10) | 0.308 | 0.280 | 0.258 |
In this subsection, we show some properties of the GHG distribution.
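Proposition 2.1. If $Z\mid U=u\sim G(\sigma u^{-1},\alpha)$ and U∼Beta(β,1), then Z∼GHG(σ,α,β).
Proof. Proceeding straight to the integral calculation, we obtain:
$$f_Z(z;\sigma,\alpha,\beta)=\int_0^1 f_{Z|U=u}(z)\,f_U(u)\,du=\int_0^1\frac{(\sigma u^{-1})^{\alpha}z^{-\alpha}}{B(\bar\alpha,\alpha)(\sigma u^{-1}+z)}\,\beta u^{\beta-1}\,du=\frac{\beta z^{-\alpha}}{B(\bar\alpha,\alpha)\sigma^{\bar\alpha}}\int_0^1\frac{u^{\beta-\alpha}}{1+\frac{z}{\sigma}u}\,du.$$
The result follows from applying (1.7) with a=1, b=ᾱ+β, c=ᾱ+β+1 and x=−z/σ, noting that β−α=ᾱ+β−1.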
Proposition 2.2. Let X∼G(1,α) and Y∼Beta(β,1) be independent random variables. Then, Z=σXY∼GHG(σ,α,β).
Proof. Using the stochastic representation Z=σXY and applying the Jacobian transformation, we obtain the desired distribution.
Proposition 2.3. Let Z∼GHG(σ,α,β). If β→∞ then Z converges in law to a random variable Z∼G(σ,α).
Proof. Let Z∼GHG(σ,α,β). Then, by the representation given in Proposition 2.2, we have that Z=σXY. First, we study the convergence in probability of Y.
When Y∼Beta(β,1), we have that $E[(Y-1)^2]=\frac{2}{(\beta+1)(\beta+2)}$, which tends to 0 as β→∞; this implies that $Y\xrightarrow{P}1$. Applying Slutsky's Lemma (see the book by Lehmann [25]), we obtain the result.
Remark 2. The result of Proposition 2.1 shows that this distribution arises from a scale mixture of the G and Beta distributions, while Proposition 2.3 indicates that when β→∞ in the GHG distribution, the G distribution is obtained. Figure 2 shows the relationship between the three distributions; we observe that the GHG distribution contains the SMG and G distributions as particular cases.
Proposition 2.4. Let Z∼GHG(σ,α,β). Then, the cdf of Z is given by
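$$F_Z(z;\sigma,\alpha,\beta)=I_y(\bar\alpha,\alpha)-\frac{z}{\beta}\,f_Z(z;\sigma,\alpha,\beta),\quad z>0, \tag{2.4}$$
where σ>0, 0<α<1, β>0, $y=\frac{z}{\sigma+z}$ and $I_y(\cdot,\cdot)$ is the regularised incomplete beta function.
Proof. Applying the definition of the cdf directly, we obtain
$$\begin{aligned}F_Z(z;\sigma,\alpha,\beta)&=\frac{\beta B(\bar\alpha+\beta,1)}{\sigma^{\bar\alpha}B(\bar\alpha,\alpha)}\int_0^z t^{-\alpha}\,{}_2F_1\left(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-\frac{t}{\sigma}\right)dt\\&=\frac{\beta}{\sigma^{\bar\alpha}B(\bar\alpha,\alpha)}\int_0^z t^{-\alpha}\int_0^1\frac{v^{\bar\alpha+\beta-1}}{1+\frac{tv}{\sigma}}\,dv\,dt\\&=\frac{\beta}{B(\bar\alpha,\alpha)}\int_0^1 v^{\beta-1}\left(\int_0^{vz/\sigma}\frac{w^{-\alpha}}{1+w}\,dw\right)dv.\end{aligned}$$
Hence, integrating by parts with $u=\int_0^{vz/\sigma}\frac{w^{-\alpha}}{1+w}\,dw$, we obtain the result.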
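Corollary 2.1. Let Z∼GHG(σ,α,β). The survival function, defined by $\bar F_Z(t)=1-F_Z(t)$, is given by
$$\bar F_Z(t;\sigma,\alpha,\beta)=1-I_y(\bar\alpha,\alpha)+\frac{t}{\beta}\,f_Z(t;\sigma,\alpha,\beta),\quad t>0.$$
The hazard function, defined by $h(t)=f_Z(t)/\bar F_Z(t)$, is given by
$$h(t;\sigma,\alpha,\beta)=\frac{f_Z(t;\sigma,\alpha,\beta)}{1-I_y(\bar\alpha,\alpha)+\frac{t}{\beta}\,f_Z(t;\sigma,\alpha,\beta)},\quad t>0,$$
where $y=\frac{t}{\sigma+t}$.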
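The survival and hazard functions of Corollary 2.1 can be evaluated with elementary quadrature. The sketch below is illustrative only: the helper names are hypothetical, the midpoint rule and the substitutions used to remove endpoint singularities are choices made here, and the integral form of ${}_2F_1$ comes from (1.7) with a=1. Its output can be compared with the GHG(1, 0.5, 1) row of Table 1.

```python
import math


def midpoint(f, lo, hi, n_steps=20_000):
    """Composite midpoint rule for int_lo^hi f(s) ds."""
    h = (hi - lo) / n_steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n_steps))


def beta_fn(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def reg_inc_beta(y, a, b):
    """I_y(a, b) for 0 < y < 1; s = t**a removes the t**(a - 1) singularity."""
    inc = midpoint(lambda s: (1.0 - s ** (1.0 / a)) ** (b - 1.0), 0.0, y ** a) / a
    return inc / beta_fn(a, b)


def ghg_pdf(z, sigma, alpha, beta):
    """Density (2.1); 2F1(1, k, k + 1; -x) = int_0^1 ds / (1 + x * s**(1 / k))."""
    abar = 1.0 - alpha
    kappa = abar + beta
    hyp = midpoint(lambda s: 1.0 / (1.0 + (z / sigma) * s ** (1.0 / kappa)), 0.0, 1.0)
    const = beta / (kappa * sigma ** abar * beta_fn(abar, alpha))  # B(k, 1) = 1 / k
    return const * z ** (-alpha) * hyp


def ghg_survival(t, sigma, alpha, beta):
    """Survival function of Corollary 2.1."""
    y = t / (sigma + t)
    tail = 1.0 - reg_inc_beta(y, 1.0 - alpha, alpha)
    return tail + t / beta * ghg_pdf(t, sigma, alpha, beta)


def ghg_hazard(t, sigma, alpha, beta):
    """Hazard function h(t) = f(t) / survival(t)."""
    return ghg_pdf(t, sigma, alpha, beta) / ghg_survival(t, sigma, alpha, beta)


for t in (4.0, 5.0, 6.0):
    print(t, round(ghg_survival(t, 1.0, 0.5, 1.0), 3), round(ghg_hazard(t, 1.0, 0.5, 1.0), 4))
```

The survival values should be close to the entries 0.437, 0.406 and 0.381 that Table 1 reports for GHG(1, 0.5, 1), and the hazard values should decrease in t.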
Figure 3 displays the hazard rate function's behavior for varying β parameters, with σ fixed at 1 and α at 0.5.
In the simulation study and in parameter estimation by the ML method, we consider the GHG distribution with only two parameters, that is, we modify Proposition 2.1 by taking σ=1: $Z\mid U=u\sim G(u^{-1},\alpha)$ and U∼Beta(β,1). The resulting pdf of the random variable Z is
$$f_Z(z;\beta,\alpha)=\frac{\beta B(\bar\alpha+\beta,1)}{B(\bar\alpha,\alpha)}\,z^{-\alpha}\,{}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z),\quad z>0, \tag{2.5}$$
where β>0 and 0<α<1 are shape parameters. This reduction in the parametric space aims to obtain a two-parameter distribution that serves as an alternative to the Pareto distribution and other two-parameter heavy right-tailed distributions. The random variable U considers parameter β, which influences the scale in the G distribution since it is where the mixing takes place. When including parameter σ, we observe that it can lead to an overestimation of β for small sample sizes.
For generating random numbers from the GHG model, two approaches are available: Algorithm 1 uses Proposition 2.1 and the composition method (see [26]), whereas Algorithm 2 employs the representation from Proposition 2.2.
Algorithm 1. Simulates Z∼GHG(β,α) as follows:
Step 1: Generate U∼Beta(β,1).
Step 2: Generate $Z\mid U=u\sim G(u^{-1},\alpha)$.
Algorithm 2. Simulates Z∼GHG(β,α) as follows:
Step 1: Generate V∼Beta(1−α,α).
Step 2: Compute X=V/(1−V).
Step 3: Generate Y∼Beta(β,1).
Step 4: Compute Z=XY.
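As an illustration, Algorithm 1 can be sketched with the Python standard library. This is a hedged example, not the authors' code: the function name is hypothetical, and Step 2 is carried out through property b) of the G distribution, that is, a G(1/u, α) draw is obtained as (1/u)·V/(1−V) with V∼Beta(1−α,α).

```python
import math
import random


def sample_ghg_composition(beta, alpha, rng):
    """One draw from GHG(beta, alpha) by composition (Algorithm 1)."""
    u = rng.betavariate(beta, 1.0)           # Step 1: U ~ Beta(beta, 1)
    v = rng.betavariate(1.0 - alpha, alpha)  # V ~ Beta(1 - alpha, alpha)
    if u <= 0.0 or v >= 1.0:                 # guard against rounded extreme draws
        return math.inf
    # Step 2: Z | U = u ~ G(1/u, alpha), i.e. scale 1/u times V / (1 - V)
    return (1.0 / u) * v / (1.0 - v)


rng = random.Random(7)  # fixed seed so the run is reproducible
draws = [sample_ghg_composition(1.0, 0.5, rng) for _ in range(100_000)]
tail_prob = sum(d > 4.0 for d in draws) / len(draws)
print(f"Monte Carlo estimate of P(Z > 4): {tail_prob:.3f}")
```

With β=1 and α=0.5 the estimate can be compared with the value P(Z>4)=0.437 that Table 1 reports for GHG(1, 0.5, 1); agreement is expected only up to Monte Carlo error.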
Proposition 2.5. Let T∼GHG(β,α). Then, the hazard function of T is decreasing for all t>0.
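Proof. Applying item b) of Glaser's Theorem [27], we define the function:
$$\eta(t)=-\frac{f'(t)}{f(t)}=\frac{\alpha}{t}+\left(\frac{\bar\alpha+\beta}{\bar\alpha+\beta+1}\right){}_2F_1(2,\bar\alpha+\beta+1,\bar\alpha+\beta+2;-t),$$
where f(t) is the pdf given in (2.5). Its first derivative satisfies:
$$\eta'(t)=-\left(\frac{\alpha}{t^{2}}+2\left(\frac{\bar\alpha+\beta}{\bar\alpha+\beta+2}\right){}_2F_1(3,\bar\alpha+\beta+2,\bar\alpha+\beta+3;-t)\right)<0,\quad\forall\,t>0.$$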
This completes the proof by Glaser's Theorem.
The following Proposition shows that no moments exist for the GHG distribution; this is quite common in heavy-tailed distributions.
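Proposition 2.6. For the random variable Z∼GHG(β,α), the r-th moment does not exist when r≥α.
Proof. We begin with the change of variable t=zw and then integrate by parts with $u=\int_0^z\frac{t^{\beta-\alpha}}{1+t}\,dt$. This yields:
$$\begin{aligned}E(Z^r)&=\frac{\beta}{B(\bar\alpha,\alpha)}\int_0^\infty z^{r-\alpha}\int_0^1\frac{w^{\beta-\alpha}}{1+zw}\,dw\,dz=\frac{\beta}{B(\bar\alpha,\alpha)}\int_0^\infty z^{r-\beta-1}\int_0^z\frac{t^{\beta-\alpha}}{1+t}\,dt\,dz\\&=\frac{\beta}{B(\bar\alpha,\alpha)}\left[\left.\frac{z^{r-\beta}}{r-\beta}\int_0^z\frac{t^{\beta-\alpha}}{1+t}\,dt\,\right|_0^\infty-\frac{1}{r-\beta}\int_0^\infty\frac{z^{r-\alpha}}{1+z}\,dz\right].\end{aligned}$$
The first term in brackets vanishes for r<β and does not exist when r≥β. As demonstrated in Proposition 1(e) of Olmos et al. [23], the integral in the second term diverges. Consequently, the expression diverges, proving that the r-th moments of Z∼GHG(β,α) do not exist.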
Remark 3. By Proposition 2.2, the existence of the r-th moment of Z reduces to that of X, which does not exist (see Olmos et al. [23]).
Roughly speaking, heavy-tailed distributions are those whose tails decay more slowly than exponentially; the exponential distribution is regarded as the boundary between heavy and light tails. In the fields where they are applied, for example finance and actuarial sciences, the data are positive and have a heavy right tail. The Pareto distribution has been widely used to model financial datasets, but in many cases it does not provide a good fit. New distributions with a heavy right tail are therefore of interest. Such models are crucial for representing losses in insurance, reinsurance, and catastrophic risk. Formally, a probability distribution with cdf F(x) on the real line is said to have a heavy right tail (see [28]) if $\limsup_{x\to\infty}\left(-\log\bar F_X(x)/x\right)=0$.
Regular variation theory plays a fundamental role in extreme value analysis (see, for example, [29,30,31,32]). This concept is formalized as follows:
Definition 2.2. A distribution function is called regularly varying at infinity with index −β (where β≥0) if:
$$\lim_{x\to\infty}\frac{\bar F_X(tx)}{\bar F_X(x)}=t^{-\beta}.$$
Here, β is termed the tail index.
The GHG distribution exhibits this important property:
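Proposition 2.7. The survival function of Z∼GHG(β,α) has regularly varying tails.
Proof. Applying the above definition and L'Hospital's Rule, we have that
$$\lim_{z\to\infty}\frac{\bar F_Z(tz)}{\bar F_Z(z)}=\lim_{z\to\infty}\frac{1-F(tz)}{1-F(z)}=t^{-\alpha+1}\lim_{z\to\infty}\frac{\int_0^1\frac{w^{\beta-\alpha}}{1+tzw}\,dw}{\int_0^1\frac{w^{\beta-\alpha}}{1+zw}\,dw}=t^{-\beta}\lim_{z\to\infty}\frac{\int_0^{tz}\frac{u^{\beta-\alpha}}{1+u}\,du}{\int_0^{z}\frac{u^{\beta-\alpha}}{1+u}\,du}=t^{-\beta},$$
and taking into account that β>0 the result follows.
As the GHG distribution has regularly varying tails, it is therefore heavy-tailed (see [29]).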
In this section, we describe ML estimation for the GHG model and include a simulation study assessing the ML estimators' behavior.
Let z1,…,zn be a random sample from the GHG(β,α) distribution. The log-likelihood is given by:
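$$\ell(\beta,\alpha)=c(\beta,\alpha)-\alpha\sum_{i=1}^{n}\log(z_i)+\sum_{i=1}^{n}\log\left({}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z_i)\right), \tag{3.1}$$
where $c(\beta,\alpha)=n\log(\beta)-n\log(\bar\alpha+\beta)-n\log(\Gamma(\bar\alpha))-n\log(\Gamma(\alpha))$.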
By equating the first partial derivatives (w.r.t. each parameter) to zero, we obtain the following system of equations:
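$$(\bar\alpha+\beta)\sum_{i=1}^{n}\frac{\Phi(-z_i,2,\bar\alpha+\beta)}{{}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z_i)}=\frac{n}{\beta}, \tag{3.2}$$
$$(\bar\alpha+\beta)\sum_{i=1}^{n}\frac{\Phi(-z_i,2,\bar\alpha+\beta)}{{}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z_i)}-\sum_{i=1}^{n}\log(z_i)+n\psi(\bar\alpha)=n\psi(\alpha), \tag{3.3}$$
where ψ(⋅) is the digamma function. From (3.2) and (3.3) we derive:
$$\hat\beta(\hat\alpha)=\frac{n}{\sum_{i=1}^{n}\log(z_i)+n\psi(\hat\alpha)-n\psi(1-\hat\alpha)}. \tag{3.4}$$
The ML estimator for α, denoted $\hat\alpha$, is obtained by numerically solving:
$$(1-\hat\alpha+\hat\beta(\hat\alpha))\sum_{i=1}^{n}\frac{\Phi(-z_i,2,1-\hat\alpha+\hat\beta(\hat\alpha))}{{}_2F_1(1,1-\hat\alpha+\hat\beta(\hat\alpha),2-\hat\alpha+\hat\beta(\hat\alpha);-z_i)}=\frac{n}{\hat\beta(\hat\alpha)}. \tag{3.5}$$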
The estimator $\hat\alpha$ is the solution of Eq (3.5); substituting it into (3.4) yields $\hat\beta$. Equation (3.5) can be solved using numerical methods such as the Newton-Raphson algorithm. Alternatively, both estimates can be obtained by directly maximizing the log-likelihood surface defined in (3.1), for example with the optim function in R [33].
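To make the direct-maximization route concrete, the following sketch evaluates the log-likelihood (3.1) and maximizes it over a coarse parameter grid. It is an illustration under stated assumptions, not the procedure used in the paper: the grid search stands in for Newton-Raphson or optim, all function names are hypothetical, ${}_2F_1$ is computed from (1.7) by the midpoint rule, and the sample is just the first row of Table 3, chosen to keep the run short.

```python
import math


def hyp2f1_1_k(kappa, x, n_steps=1000):
    """2F1(1, kappa, kappa + 1; -x) via (1.7): int_0^1 ds / (1 + x * s**(1/kappa))."""
    h = 1.0 / n_steps
    inv_k = 1.0 / kappa
    return h * sum(1.0 / (1.0 + x * ((i + 0.5) * h) ** inv_k) for i in range(n_steps))


def ghg_loglik(data, beta, alpha):
    """Log-likelihood (3.1) of the two-parameter GHG(beta, alpha) model."""
    abar = 1.0 - alpha
    kappa = abar + beta
    n = len(data)
    c = n * (math.log(beta) - math.log(kappa) - math.lgamma(abar) - math.lgamma(alpha))
    log_terms = sum(math.log(hyp2f1_1_k(kappa, z)) for z in data)
    return c - alpha * sum(math.log(z) for z in data) + log_terms


def grid_search(data, betas, alphas):
    """Return (loglik, beta, alpha) at the best grid point; a crude optimizer."""
    return max((ghg_loglik(data, b, a), b, a) for b in betas for a in alphas)


# First row of Table 3 only (illustrative subset, not the full data set).
sample = [1.7, 2.2, 14.4, 1.1, 0.4, 20.6, 5.3, 0.7, 1.9, 13.0, 12.0, 9.3]
betas = [0.25 * k for k in range(1, 13)]   # 0.25, 0.50, ..., 3.00
alphas = [0.1 * k for k in range(1, 10)]   # 0.1, 0.2, ..., 0.9
best_ll, best_beta, best_alpha = grid_search(sample, betas, alphas)
print(best_ll, best_beta, best_alpha)
```

In practice the best grid point would only serve as a starting value for a proper optimizer, and the full sample would be used.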
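For a random variable Z∼GHG(β,α), the log-likelihood function corresponding to a single observation z of Z and parameter vector θ=(β,α) is expressed as:
$$\ell(\theta)=\log(\beta)-\log(\bar\alpha+\beta)-\log(B(\bar\alpha,\alpha))-\alpha\log(z)+\log\left({}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z)\right).$$
The Appendix contains derivations of both first and second partial derivatives of this log-likelihood function. The Fisher information matrix, denoted $I_F(\cdot)$, for the GHG distribution takes the form:
$$I_F(\theta)=\begin{pmatrix}\dfrac{1}{\beta^{2}}-2\eta_{31}+\eta_{22} & 2\eta_{31}-\eta_{22}\\[2mm] 2\eta_{31}-\eta_{22} & \psi'(\alpha)+\psi'(\bar\alpha)-2\eta_{31}+\eta_{22}\end{pmatrix},$$
where ψ′(⋅) is the trigamma function and the quantities $\eta_{ij}=E\left[\left(\frac{\kappa_0\,\Phi(-Z,i,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-Z)}\right)^{j}\right]$, with $\kappa_i=\bar\alpha+\beta+i$, are calculated numerically.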
For large samples, the ML estimator $\hat\theta$ is asymptotically bivariate normal:
$$\sqrt{n}\,(\hat\theta-\theta)\xrightarrow{\;L\;}N_2\left(0,I_F(\theta)^{-1}\right).$$
Consequently, the asymptotic variance of $\hat\theta$ equals the inverse of $I_F(\theta)$. In practice, since the parameters are typically unknown, we employ the observed information matrix with ML-estimated parameters.
We conducted a simulation study to assess the performance of ML estimation for the β and α parameters of the GHG distribution. Using Algorithm 2 given in Subsection 2.2 to generate random numbers, we drew 1000 samples of sizes n=20, 100 and 300 from the GHG distribution. Table 2 shows the empirical bias (Bias), median bias (MBias), mean of the standard errors (SE), and root of the empirical mean squared error (RMSE). As shown in Table 2, the estimation performance improves as n increases.
True α | True β | Estimator | Bias (n=20) | MBias (n=20) | SE (n=20) | RMSE (n=20) | Bias (n=100) | MBias (n=100) | SE (n=100) | RMSE (n=100) | Bias (n=300) | MBias (n=300) | SE (n=300) | RMSE (n=300) |
0.25 | 0.5 | $\hat\alpha$ | −0.0124 | −0.0213 | 0.0606 | 0.0542 | −0.0277 | −0.0291 | 0.0233 | 0.0355 | −0.0292 | −0.0296 | 0.0124 | 0.0320 |
0.25 | 0.5 | $\hat\beta$ | 0.2411 | 0.1496 | 0.6230 | 0.3956 | 0.2930 | 0.2213 | 0.2248 | 0.3750 | 0.2615 | 0.2322 | 0.0747 | 0.2963 |
0.25 | 1.0 | $\hat\alpha$ | 0.0230 | 0.0173 | 0.0723 | 0.0680 | 0.0002 | −0.0013 | 0.0298 | 0.0289 | −0.0031 | −0.0039 | 0.0162 | 0.0183 |
0.25 | 1.0 | $\hat\beta$ | 0.0484 | −0.2142 | 15.138 | 0.6845 | 0.2650 | −0.0001 | 10.147 | 0.7274 | 0.2239 | 0.0406 | 0.4547 | 0.5718 |
0.5 | 0.5 | $\hat\alpha$ | −0.0346 | −0.0318 | 0.1004 | 0.0986 | −0.0447 | −0.0432 | 0.0406 | 0.0728 | −0.0473 | −0.0428 | 0.0188 | 0.0618 |
0.5 | 0.5 | $\hat\beta$ | 0.2436 | 0.1512 | 0.6225 | 0.4308 | 0.2234 | 0.1677 | 0.2024 | 0.3440 | 0.1973 | 0.1543 | 0.0663 | 0.2879 |
0.5 | 1.0 | $\hat\alpha$ | 0.0103 | 0.0032 | 0.1067 | 0.0893 | −0.0051 | −0.0067 | 0.0517 | 0.0502 | −0.0101 | −0.0097 | 0.0282 | 0.0328 |
0.5 | 1.0 | $\hat\beta$ | 0.2108 | −0.0227 | 18.389 | 0.7911 | 0.2919 | 0.0633 | 10.861 | 0.7391 | 0.2592 | 0.0624 | 0.5843 | 0.6181 |
0.8 | 0.5 | $\hat\alpha$ | −0.0204 | −0.0151 | 0.0554 | 0.0546 | −0.0092 | −0.0083 | 0.0230 | 0.0253 | −0.0045 | −0.0041 | 0.0129 | 0.0133 |
0.8 | 0.5 | $\hat\beta$ | 0.1773 | 0.0777 | 0.4606 | 0.3622 | 0.0733 | 0.0474 | 0.1234 | 0.1473 | 0.0395 | 0.0334 | 0.0622 | 0.0693 |
0.8 | 1.0 | $\hat\alpha$ | −0.0060 | 0.0028 | 0.0517 | 0.0484 | −0.0023 | −0.0001 | 0.0227 | 0.0241 | −0.0012 | −0.0010 | 0.0126 | 0.0136 |
0.8 | 1.0 | $\hat\beta$ | 0.2104 | −0.0373 | 16.475 | 0.7766 | 0.2051 | 0.0264 | 0.6920 | 0.6181 | 0.1027 | 0.0168 | 0.2561 | 0.3444 |
In this section, we analyze two real-data applications, comparing the fits of several distributions against the GHG distribution. Model selection is performed using four information criteria: AIC (Akaike Information Criterion; [34]), BIC (Bayesian Information Criterion; [35]), CAIC (Consistent AIC; [36]) and HQIC (Hannan-Quinn Information Criterion; [37]). These criteria are defined as: AIC=−2ˆℓ(⋅)+2p, BIC=−2ˆℓ(⋅)+log(n)p, CAIC=−2ˆℓ(⋅)+(log(n)+1)p=BIC+p, and HQIC=−2ˆℓ(⋅)+2log(log(n))p, where p denotes the number of parameters in the model.
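The four criteria are simple functions of the maximized log-likelihood, the sample size n and the number of parameters p. As a small sketch (the function name is hypothetical), the code below applies the definitions above to the log-likelihood that Table 7 reports for the three-parameter GHG model, with n=500 and p=3.

```python
import math


def information_criteria(loglik, n, p):
    """AIC, BIC, CAIC and HQIC as defined in the text."""
    minus_two_ll = -2.0 * loglik
    return {
        "AIC": minus_two_ll + 2.0 * p,
        "BIC": minus_two_ll + math.log(n) * p,
        "CAIC": minus_two_ll + (math.log(n) + 1.0) * p,
        "HQIC": minus_two_ll + 2.0 * math.log(math.log(n)) * p,
    }


# Log-likelihood of the GHG(sigma, alpha, beta) fit as reported in Table 7.
ic = information_criteria(-6466.192, n=500, p=3)
for name, value in ic.items():
    print(f"{name}: {value:.2f}")
```

The printed values should match the GHG row of Table 8 up to rounding.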
The data correspond to the exceedances of flood peaks (in m³/s) of the Wheaton River near Carcross in Yukon Territory, Canada. The data consist of 72 exceedances for the years 1958-1984, rounded to one decimal place (see [6,11,38]), and are listed in Table 3. Table 4 presents descriptive statistics, including the sample skewness coefficient (CS) and kurtosis coefficient (CK). We compare the fits of the EP and Pareto distributions with that of the two-parameter GHG distribution given in (2.5).
1.7 | 2.2 | 14.4 | 1.1 | 0.4 | 20.6 | 5.3 | 0.7 | 1.9 | 13.0 | 12.0 | 9.3 |
1.4 | 18.7 | 8.5 | 25.5 | 11.6 | 14.1 | 22.1 | 1.1 | 2.5 | 14.4 | 1.7 | 37.6 |
0.6 | 2.2 | 39.0 | 0.3 | 15.0 | 11.0 | 7.3 | 22.9 | 1.7 | 0.1 | 1.1 | 0.6 |
9.0 | 1.7 | 7.0 | 20.1 | 0.4 | 2.8 | 14.1 | 9.9 | 10.4 | 10.7 | 30.0 | 3.6 |
5.6 | 30.8 | 13.3 | 4.2 | 25.5 | 3.4 | 11.9 | 21.5 | 27.6 | 36.4 | 2.7 | 64.0 |
1.5 | 2.5 | 27.4 | 1.0 | 27.1 | 20.2 | 16.8 | 5.3 | 9.7 | 27.5 | 2.5 | 27.0 |
n | Median | Mean | SD | CS | CK |
72 | 9.5 | 12.204 | 12.297 | 1.473 | 5.890 |
The box plot in Figure 4 shows one very extreme observation, which makes the right tail heavier: while most exceedances cluster around 10 m³/s, there is one notable outlier at 64.0 m³/s.
Table 5 presents the ML estimates for the GHG, EP, and Pareto model parameters, along with their corresponding AIC, BIC, CAIC, and HQIC values.
Model | ML estimates | AIC | BIC | CAIC | HQIC |
GHG(β,α) | ˆβ=1.236(0.465), ˆα=0.403(0.045) | 569.8 | 574.3 | 576.3 | 571.6 |
EP(α,β,γ) | ˆα=0.1, ˆβ=0.424(0.046), ˆγ=2.880(0.491) | 578.6 | 583.2 | 586.2 | 581.3 |
Pareto(α,β) | ˆα=0.1, ˆβ=0.244(0.029) | 608.1 | 610.4 | 611.4 | 609.0 |
The GHG model demonstrates superior fit to the data, as evidenced by its consistently lower AIC, BIC, CAIC, and HQIC values compared to both EP and Pareto models.
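The Pareto row of Table 5 is the easiest one to reproduce, because the Pareto model (1.1) has closed-form ML estimators: the sample minimum for α and n divided by the sum of log(x_i/α̂) for β. The sketch below is an illustration with hypothetical function names; the data are the 72 values of Table 3, and taking p=1 for the information criteria is an assumption suggested by CAIC−BIC=1 in Table 5.

```python
import math

# Flood peak exceedances of the Wheaton River (Table 3), in m^3/s.
flood_peaks = [
    1.7, 2.2, 14.4, 1.1, 0.4, 20.6, 5.3, 0.7, 1.9, 13.0, 12.0, 9.3,
    1.4, 18.7, 8.5, 25.5, 11.6, 14.1, 22.1, 1.1, 2.5, 14.4, 1.7, 37.6,
    0.6, 2.2, 39.0, 0.3, 15.0, 11.0, 7.3, 22.9, 1.7, 0.1, 1.1, 0.6,
    9.0, 1.7, 7.0, 20.1, 0.4, 2.8, 14.1, 9.9, 10.4, 10.7, 30.0, 3.6,
    5.6, 30.8, 13.3, 4.2, 25.5, 3.4, 11.9, 21.5, 27.6, 36.4, 2.7, 64.0,
    1.5, 2.5, 27.4, 1.0, 27.1, 20.2, 16.8, 5.3, 9.7, 27.5, 2.5, 27.0,
]


def pareto_mle(data):
    """Closed-form ML estimates (alpha_hat, beta_hat) for the Pareto pdf (1.1)."""
    alpha_hat = min(data)
    beta_hat = len(data) / sum(math.log(x / alpha_hat) for x in data)
    return alpha_hat, beta_hat


def pareto_loglik(data, alpha, beta):
    """Log-likelihood of (1.1): sum of log(beta) + beta*log(alpha) - (beta+1)*log(x)."""
    n = len(data)
    return n * math.log(beta) + n * beta * math.log(alpha) - (beta + 1.0) * sum(
        math.log(x) for x in data
    )


alpha_hat, beta_hat = pareto_mle(flood_peaks)
loglik = pareto_loglik(flood_peaks, alpha_hat, beta_hat)
aic = -2.0 * loglik + 2.0 * 1                      # p = 1 (assumption, see lead-in)
bic = -2.0 * loglik + math.log(len(flood_peaks)) * 1
print(alpha_hat, round(beta_hat, 3), round(aic, 1), round(bic, 1))
```

The estimates and criteria should be close to the Pareto entries of Table 5 (α̂=0.1, β̂=0.244, AIC=608.1, BIC=610.4).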
In Figure 5, the left side displays the histogram of the data along with the curves of the respective fitted distributions. It is evident that the GHG distribution provides a better fit for the exceedances of flood peak data compared to the EP and Pareto distributions. On the right side, there is a zoomed-in view of the right tail of the histogram, which more conclusively demonstrates that the GHG distribution concentrates more probability in the right tail than the other two distributions for this data set.
The dataset used in this application comes from the Survey of Consumer Finances (SCF), a nationally representative sample containing extensive information on assets, liabilities, income, and demographic characteristics of respondents (potential U.S. customers). We analyzed a random sample of 500 income-positive U.S. households interviewed in the 2004 survey. The variable of interest is each household's annual income in US dollars. The data can be accessed from the Federal Reserve's webpage: https://www.federalreserve.gov/econres/scfindex.htm. Descriptive statistics for these data are presented in Table 6. We compared the fits of the SMG and Pareto distributions with that of the three-parameter GHG distribution given in (2.1).
n | Median | Mean | SD | CS | CK |
500 | 54000 | 321021.9 | 3410936 | 21.127 | 461.575 |
Figure 6 displays a boxplot revealing several extreme values, which create a heavy right tail in the distribution. Notably, while most observations cluster around 54,000 (the median household income in Table 6), there is an extreme value of 75 million in family income.
Table 7 presents the ML estimates for the parameters and log-likelihood values of the GHG, SMG, and Pareto models. Table 8 displays the corresponding values for the AIC, BIC, CAIC, and HQIC criteria for each model.
Model | ML estimates | Log-likelihood |
GHG(σ,α,β) | ˆσ=56417.5(2174.647), ˆα=0.5(0.015), ˆβ=756.154(62.525) | -6466.192 |
SMG(σ,α) | ˆσ=22584.770(4027.346), ˆα=0.581(0.018) | -6541.776 |
Pareto(α,β) | ˆα=260, ˆβ=0.186(0.008) | -6802.113 |
Model | AIC | BIC | CAIC | HQIC |
GHG(σ,α,β) | 12938.38 | 12951.03 | 12954.03 | 12943.35 |
SMG(σ,α) | 13087.55 | 13095.98 | 13097.98 | 13090.86 |
Pareto(α,β) | 13606.23 | 13610.44 | 13611.44 | 13606.05 |
We observed that the GHG model yields the lowest values for the AIC, BIC, CAIC, and HQIC criteria, indicating that it provides a better fit to the data compared to both the SMG and Pareto models.
Figure 7 shows the empirical cdf together with the estimated GHG, SMG, and Pareto cdfs; it also indicates good agreement between the GHG model and the income data.
In this work, we introduce a new heavy-tailed distribution through a scale mixture extension of the G distribution, combining it with the Beta distribution to create the GHG model. We study some properties, perform parameter estimation by the ML method, and show two applications to real data. The GHG distribution contains the G and SMG distributions as special cases. This is of interest, since it can be used in actuarial sciences and related fields. The GHG model is a viable alternative for fitting data with extreme observations. Some other properties of the GHG model are:
● The GHG distribution exhibits a heavy right tail.
● The GHG distribution has two representations, as presented in Propositions 2.1 and 2.2.
● The pdf, cdf, and hazard function have explicit forms expressed through known special functions, such as the Gauss hypergeometric function.
● The hazard function is monotonically decreasing, which provides information about the distribution's tail behavior (specifically, its heavy right tail; see [28]).
● In the first application, we demonstrate that the GHG distribution effectively models flood peak exceedance data, outperforming two well-known distributions: the EP and Pareto distributions.
● In the second application, we show that the three-parameter GHG distribution provides a good fit for income data, surpassing the SMG and Pareto distributions.
● A more comprehensive estimation study of the GHG distribution's three parameters will be addressed in future work, where we will analyze the effects and limitations of the σ and β parameters in the ML estimation method.
Conceptualization: N.M. Olmos and E. Gómez-Déniz; methodology: N.M. Olmos and E. Gómez-Déniz; software: N.M. Olmos and E. Gómez-Déniz; validation: N.M. Olmos, E. Gómez-Déniz, and O. Venegas; formal analysis: N.M. Olmos and O. Venegas; investigation: E. Gómez-Déniz and N.M. Olmos; writing (original draft preparation): N.M. Olmos and E. Gómez-Déniz; writing (review and editing): O. Venegas and E. Gómez-Déniz. All authors have read and approved the final version of the manuscript for publication.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
EGD was partially funded by grant PID2021-127989OB-I00 (Ministerio de Economía y Competitividad, Spain) and by grant TUR-RETOS2022-075 (Ministerio de Industria, Comercio y Turismo).
Prof. Emilio Gómez-Déniz is the Guest Editor of special issue "Advances in Probability Distributions and Social Science Statistics" for AIMS Mathematics. Prof. Emilio Gómez-Déniz was not involved in the editorial review and the decision to publish this article.
The authors declare no conflicts of interest.
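The first-order partial derivatives of ℓ(θ) with respect to θ are given by:
$$\frac{\partial\ell(\theta)}{\partial\beta}=\frac{1}{\beta}-\frac{\kappa_0\,\Phi(-z,2,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-z)},\qquad\frac{\partial\ell(\theta)}{\partial\alpha}=\psi(\bar\alpha)-\psi(\alpha)-\log(z)+\frac{\kappa_0\,\Phi(-z,2,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-z)}.$$
The second-order partial derivatives of ℓ(θ) with respect to θ are given by:
$$\begin{aligned}\frac{\partial^{2}\ell(\theta)}{\partial\beta^{2}}&=-\frac{1}{\beta^{2}}+\frac{2\kappa_0\,\Phi(-z,3,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-z)}-\left(\frac{\kappa_0\,\Phi(-z,2,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-z)}\right)^{2},\\\frac{\partial^{2}\ell(\theta)}{\partial\beta\,\partial\alpha}&=\left(\frac{\kappa_0\,\Phi(-z,2,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-z)}\right)^{2}-\frac{2\kappa_0\,\Phi(-z,3,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-z)},\\\frac{\partial^{2}\ell(\theta)}{\partial\alpha^{2}}&=-\psi'(\bar\alpha)-\psi'(\alpha)+\frac{2\kappa_0\,\Phi(-z,3,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-z)}-\left(\frac{\kappa_0\,\Phi(-z,2,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-z)}\right)^{2},\end{aligned}$$
where $\kappa_i=\bar\alpha+\beta+i$, and ψ(⋅) and ψ′(⋅) are the digamma and trigamma functions, respectively.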
[1] | R. Ibragimov, A. Prokhorov, Heavy tails and copulas: topics in dependence modelling in economics and finance, Singapore: World Scientific, 2017. https://doi.org/10.1142/9644 |
[2] |
I. B. Aban, M. M. Meerschaert, A. K. Panorska, Parameter estimation for the truncated Pareto distribution, J. Am. Stat. Assoc., 101 (2006), 270–277. https://doi.org/10.1198/016214505000000411 doi: 10.1198/016214505000000411
![]() |
[3] | B. C. Arnold, Pareto distributions, 2 Eds., Boca Raton: Chapman & Hall, 2015. https://doi.org/10.1201/b18141 |
[4] | N. L. Johnson, S. Kotz, N. Balakrishnan, Continuous univariate distributions, 2 Eds., Vol. 1, New York: Wiley, 1995. |
[5] |
J. Pickands, Statistical inference using extreme order statistics, Ann. Statist., 3 (1975), 119–131. https://doi.org/10.1214/aos/1176343003 doi: 10.1214/aos/1176343003
![]() |
[6] | V. Choulakian, M. A. Stephens, Goodness-of-fit for the generalized Pareto distribution, Technometrics, 43 (2001), 478–484. |
[7] | A. C. Davison, R. L. Smith, Models for exceedances over high thresholds, J. R. Stat. Soc., Ser. B, 52 (1990), 393–442. |
[8] |
R. C. Gupta, P. L. Gupta, R. D. Gupta, Modeling failure time data by Lehman alternatives, Commun. Stat. Theory Methods, 27 (1998), 887–904. https://doi.org/10.1080/03610929808832134 doi: 10.1080/03610929808832134
![]() |
[9] | G. Stoppa, Proprieta campionarie di un nuovo modello Pareto generalizzato, Proceedings of the Atti XXXV Riunione Scientifica della Societa Italiana di Statistica, Cedam: Padova, Italy, 1990,137–144. |
[10] | C. Kleiber, S. Kotz, Statistical size distributions in economics and actuarial sciences, Haboken: John Wiley & Sons, 2003. https://doi.org/10.1002/0471457175 |
[11] | A. Akinsete, F. Famoye, C. Lee, The Beta-Pareto distribution, Statistics, 42 (2008), 547–563. |
[12] |
B. Boumaraf, N. Seddik-Ameur, V. S. Barbu, Estimation of Beta-Pareto distribution based on several optimization methods, Mathematics, 8 (2020), 1055. https://doi.org/10.3390/math8071055 doi: 10.3390/math8071055
![]() |
[13] |
D. F. Andrews, C. L. Mallows, Scale mixtures of normal distributions, J. Roy. Stat Soc.: Ser. B, 36 (1974), 99–102. https://doi.org/10.1111/j.2517-6161.1974.tb00989.x doi: 10.1111/j.2517-6161.1974.tb00989.x
![]() |
[14] |
W. H. Rogers, J. W. Tukey, Understanding some long-tailed symmetrical distributions, Stat. Neerl., 26 (1972), 211–226. https://doi.org/10.1111/j.1467-9574.1972.tb00191.x doi: 10.1111/j.1467-9574.1972.tb00191.x
![]() |
[15] | F. Mosteller, J. W. Tukey, Data analysis and regression, Reading: Addison-Wesley, 1977. |
[16] |
K. Kafadar, A biweight approach to the one-sample problem, J. Am. Stat. Assoc., 77 (1982), 416–424. https://doi.org/10.1080/01621459.1982.10477827 doi: 10.1080/01621459.1982.10477827
![]() |
[17] |
H. W. Gómez, F. A. Quintana, F. J. Torres, A new family of slash-distributions with elliptical contours, Statist. Probab. Lett., 77 (2007), 717–725. https://doi.org/10.1016/j.spl.2006.11.006 doi: 10.1016/j.spl.2006.11.006
![]() |
[18] |
H. W. Gómez, O. Venegas, Erratum to: A new family of slash-distributions with elliptical contours [Statist. Probab. Lett. 77 (2007) 717–725], Statist. Probab. Lett., 78 (2008), 2273–2274. https://doi.org/10.1016/j.spl.2008.02.015 doi: 10.1016/j.spl.2008.02.015
![]() |
[19] |
H. W. Gómez, J. F. Olivares-Pacheco, H. Bolfarine, An extension of the generalized Birnbaum-Saunders distribution, Statist. Probab. Lett., 79 (2009), 331–338. https://doi.org/10.1016/j.spl.2008.08.014 doi: 10.1016/j.spl.2008.08.014
![]() |
[20] |
N. M. Olmos, H. Varela, H. Bolfarine, H. W. Gómez, An extension of the generalized half-normal distribution, Stat. Papers, 55 (2014), 967–981. https://doi.org/10.1007/s00362-013-0546-6 doi: 10.1007/s00362-013-0546-6
![]() |
[21] | M. Abramowitz, I. A. Stegun, Handbook of mathematical functions with formulas, graphs, and mathematical tables, 9 Eds., Dover Publications, 1965. |
[22] |
L. J. Gleser, The Gamma distribution as a mixture of exponential distributions, Am. Stat., 43 (1989), 115–117. https://doi.org/10.2307/2684515 doi: 10.2307/2684515
![]() |
[23] |
N. M. Olmos, E. Gómez-Déniz, O. Venegas, The heavy-tailed Gleser model: properties, estimation, and applications, Mathematics, 10 (2022), 4577. https://doi.org/10.3390/math10234577 doi: 10.3390/math10234577
![]() |
[24] |
N. M. Olmos, E. Gómez-Déniz, O. Venegas, Scale mixture of Gleser distribution with an application to insurance data, Mathematics, 12 (2024), 1397. https://doi.org/10.3390/math12091397 doi: 10.3390/math12091397
![]() |
[25] | E. L. Lehman, Elements of large-sample theory, New York: Springer, 1999. https://doi.org/10.1007/b98855 |
[26] | M. A. Tanner, Tools for statistical inference, 3 Eds., New York: Springer, 1996. https://doi.org/10.1007/978-1-4612-4024-2 |
[27] |
R. E. Glaser, Bathtub and related failure rate characterizations, J. Am. Stat. Assoc., 75 (1980), 667–672. https://doi.org/10.1080/01621459.1980.10477530 doi: 10.1080/01621459.1980.10477530
![]() |
[28] | T. Rolski, H. Schmidli, V. Schmidt, J. Teugel, Stochastic processes for insurance and finance, Hoboken: John Wiley & Sons, 1999. https://doi.org/10.1002/9780470317044 |
[29] | W. Feller, An introduction to probability theory and its applications, 2 Eds., Vol. 2, New York: John Wiley & Sons, 1968. |
[30] | N. H. Bingham, C. M. Goldie, J. L. Teugels Regular variation, Cambridge: Cambridge University Press, 1987. https://doi.org/10.1017/CBO9780511721434 |
[31] |
J. Beirlant, G. Matthys, G. Dierckx, Heavy-tailed distributions and rating, ASTIN Bull., 31 (2001), 37–58. https://doi.org/10.2143/AST.31.1.993 doi: 10.2143/AST.31.1.993
![]() |
[32] | D. G. Konstantinides, Risk theory: a heavy tail approach, World Scientific Publishing, 2017. |
[33] | R Core Team, Gradient estimates for solutions of nonlinear elliptic and parabolic equations, R Foundation for Statistical Computing: Vienna, Austria, 2021. Available from: https://www.R-project.org/. |
[34] |
H. Akaike, A new look at the statistical model identification, IEEE Trans. Automatic Control, 19 (1974), 716–723. https://doi.org/10.1109/TAC.1974.1100705 doi: 10.1109/TAC.1974.1100705
![]() |
[35] | G. Schwarz, Estimating the dimension of a model, Ann. Stat., 6 (1978), 461–464. |
[36] |
H. Bozdogan, Model selection and Akaike's information criterion (AIC): the general theory and its analytical extensions, Psychometrika, 52 (1987), 345–370. https://doi.org/10.1007/BF02294361 doi: 10.1007/BF02294361
![]() |
[37] | E. J. Hannan, B. G. Quinn, The determination of the order of an autoregression., J. R. Stat. Soc.: Ser. B, 41 (1979), 190–195. |
[38] |
E. Mahmoudi, The beta generalized Pareto distribution with application to lifetime data, Math. Comput. Simulat., 81 (2011), 2414–2430. https://doi.org/10.1016/j.matcom.2011.03.006 doi: 10.1016/j.matcom.2011.03.006
![]() |
Distribution | P(Z>4) | P(Z>5) | P(Z>6) |
GHG(1, 0.5, 1) | 0.437 | 0.406 | 0.381 |
GHG(1, 0.5, 3) | 0.341 | 0.311 | 0.288 |
GHG(1, 0.5, 10) | 0.308 | 0.280 | 0.258 |

α | β | Estimator | Bias (n=20) | MBias (n=20) | SE (n=20) | RMSE (n=20) | Bias (n=100) | MBias (n=100) | SE (n=100) | RMSE (n=100) | Bias (n=300) | MBias (n=300) | SE (n=300) | RMSE (n=300) |
0.25 | 0.5 | α̂ | −0.0124 | −0.0213 | 0.0606 | 0.0542 | −0.0277 | −0.0291 | 0.0233 | 0.0355 | −0.0292 | −0.0296 | 0.0124 | 0.0320 |
0.25 | 0.5 | β̂ | 0.2411 | 0.1496 | 0.6230 | 0.3956 | 0.2930 | 0.2213 | 0.2248 | 0.3750 | 0.2615 | 0.2322 | 0.0747 | 0.2963 |
0.25 | 1.0 | α̂ | 0.0230 | 0.0173 | 0.0723 | 0.0680 | 0.0002 | −0.0013 | 0.0298 | 0.0289 | −0.0031 | −0.0039 | 0.0162 | 0.0183 |
0.25 | 1.0 | β̂ | 0.0484 | −0.2142 | 15.138 | 0.6845 | 0.2650 | −0.0001 | 10.147 | 0.7274 | 0.2239 | 0.0406 | 0.4547 | 0.5718 |
0.5 | 0.5 | α̂ | −0.0346 | −0.0318 | 0.1004 | 0.0986 | −0.0447 | −0.0432 | 0.0406 | 0.0728 | −0.0473 | −0.0428 | 0.0188 | 0.0618 |
0.5 | 0.5 | β̂ | 0.2436 | 0.1512 | 0.6225 | 0.4308 | 0.2234 | 0.1677 | 0.2024 | 0.3440 | 0.1973 | 0.1543 | 0.0663 | 0.2879 |
0.5 | 1.0 | α̂ | 0.0103 | 0.0032 | 0.1067 | 0.0893 | −0.0051 | −0.0067 | 0.0517 | 0.0502 | −0.0101 | −0.0097 | 0.0282 | 0.0328 |
0.5 | 1.0 | β̂ | 0.2108 | −0.0227 | 18.389 | 0.7911 | 0.2919 | 0.0633 | 10.861 | 0.7391 | 0.2592 | 0.0624 | 0.5843 | 0.6181 |
0.8 | 0.5 | α̂ | −0.0204 | −0.0151 | 0.0554 | 0.0546 | −0.0092 | −0.0083 | 0.0230 | 0.0253 | −0.0045 | −0.0041 | 0.0129 | 0.0133 |
0.8 | 0.5 | β̂ | 0.1773 | 0.0777 | 0.4606 | 0.3622 | 0.0733 | 0.0474 | 0.1234 | 0.1473 | 0.0395 | 0.0334 | 0.0622 | 0.0693 |
0.8 | 1.0 | α̂ | −0.0060 | 0.0028 | 0.0517 | 0.0484 | −0.0023 | −0.0001 | 0.0227 | 0.0241 | −0.0012 | −0.0010 | 0.0126 | 0.0136 |
0.8 | 1.0 | β̂ | 0.2104 | −0.0373 | 16.475 | 0.7766 | 0.2051 | 0.0264 | 0.6920 | 0.6181 | 0.1027 | 0.0168 | 0.2561 | 0.3444 |

1.7 | 2.2 | 14.4 | 1.1 | 0.4 | 20.6 | 5.3 | 0.7 | 1.9 | 13.0 | 12.0 | 9.3 |
1.4 | 18.7 | 8.5 | 25.5 | 11.6 | 14.1 | 22.1 | 1.1 | 2.5 | 14.4 | 1.7 | 37.6 |
0.6 | 2.2 | 39.0 | 0.3 | 15.0 | 11.0 | 7.3 | 22.9 | 1.7 | 0.1 | 1.1 | 0.6 |
9.0 | 1.7 | 7.0 | 20.1 | 0.4 | 2.8 | 14.1 | 9.9 | 10.4 | 10.7 | 30.0 | 3.6 |
5.6 | 30.8 | 13.3 | 4.2 | 25.5 | 3.4 | 11.9 | 21.5 | 27.6 | 36.4 | 2.7 | 64.0 |
1.5 | 2.5 | 27.4 | 1.0 | 27.1 | 20.2 | 16.8 | 5.3 | 9.7 | 27.5 | 2.5 | 27.0 |

n | Median | Mean | SD | CS (skewness) | CK (kurtosis) |
72 | 9.5 | 12.204 | 12.297 | 1.473 | 5.890 |

Model | ML estimates | AIC | BIC | CAIC | HQIC |
GHG(β,α) | β̂ = 1.236 (0.465), α̂ = 0.403 (0.045) | 569.8 | 574.3 | 576.3 | 571.6 |
EP(α,β,γ) | α̂ = 0.1, β̂ = 0.424 (0.046), γ̂ = 2.880 (0.491) | 578.6 | 583.2 | 586.2 | 581.3 |
Pareto(α,β) | α̂ = 0.1, β̂ = 0.244 (0.029) | 608.1 | 610.4 | 611.4 | 609.0 |

n | Median | Mean | SD | CS (skewness) | CK (kurtosis) |
500 | 54000 | 321021.9 | 3410936 | 21.127 | 461.575 |

Model | ML estimates | Log-likelihood |
GHG(σ,α,β) | σ̂ = 56417.5 (2174.647), α̂ = 0.5 (0.015), β̂ = 756.154 (62.525) | −6466.192 |
SMG(σ,α) | σ̂ = 22584.770 (4027.346), α̂ = 0.581 (0.018) | −6541.776 |
Pareto(α,β) | α̂ = 260, β̂ = 0.186 (0.008) | −6802.113 |

Model | AIC | BIC | CAIC | HQIC |
GHG(σ,α,β) | 12938.38 | 12951.03 | 12954.03 | 12943.35 |
SMG(σ,α) | 13087.55 | 13095.98 | 13097.98 | 13090.86 |
Pareto(α,β) | 13606.23 | 13610.44 | 13611.44 | 13606.05 |
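
The information criteria in the last table can be recomputed from the reported log-likelihoods. The sketch below assumes the usual definitions (AIC = −2ℓ + 2k, BIC = −2ℓ + k log n, CAIC = −2ℓ + k(log n + 1), HQIC = −2ℓ + 2k log log n), with k the number of estimated parameters and n the sample size. Whether the authors used exactly these forms is an assumption, and the function name `information_criteria` is hypothetical.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, BIC, CAIC and HQIC for a fit with k parameters on n observations."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * k,
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1),
        "HQIC": deviance + 2 * k * math.log(math.log(n)),
    }

# GHG(sigma, alpha, beta) fit to the income data: log-likelihood -6466.192, k = 3, n = 500.
ghg = information_criteria(-6466.192, k=3, n=500)
for name, value in ghg.items():
    print(f"{name}: {value:.2f}")
```

Under these definitions the GHG and SMG rows of the income data table are reproduced to two decimals.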