Research article

The status of geo-environmental health in Mississippi: Application of spatiotemporal statistics to improve health and air quality

  • Data-enabled research with a spatial perspective may help to combat human diseases in an informed and cost-effective manner. Understanding the changing patterns of environmental degradation is essential for determining community health outcomes such as asthma. In this research, Mississippi asthma-related prevalence data for 2003–2011 were analyzed using spatial statistical techniques in Geographic Information Systems. Geocoding by ZIP code, choropleth mapping, and hotspot analysis techniques were applied to map the spatial data. Disease rates were calculated for every ZIP code region from 2009 to 2011. The highest rates (4–5.5%) were found in Prairie in Monroe County for three consecutive years. Statistically significant hotspots were observed in the urban regions of Jackson and Gulfport, with a steady increase near urban Jackson and in the area between Jackson and the Meridian metropolitan area. For 2009–2011, spatial signatures of urban risk factors were found in densely populated areas, which was confirmed by regression analysis of asthma patients against population data (a linear increase with R² = 0.648 up to a population size of 35,000 per ZIP code; the relationship decreased to 59% as the population size increased above 35,000 to a maximum of 47,000 per ZIP code). The observed correlation coefficient (r) between monthly mean O3 and asthma prevalence was moderately positive during 2009–2011 (r = 0.57). The regression model also indicated that 2011 annual PM2.5 had a statistically significant influence on the aggravation of asthma cases (adjusted R-squared 0.93) and that 2011 PM2.5 also depended on asthma per capita and the poverty rate. The present study indicates that the Jackson urban area and coastal Mississippi should be monitored for disease prevalence in the future. The current results and GIS disease maps may be used by federal and state health authorities to identify at-risk populations and to issue health advisories.

    Citation: Swatantra R. Kethireddy, Grace A. Adegoye, Paul B. Tchounwou, Francis Tuluri, H. Anwar Ahmad, John H. Young, Lei Zhang. The status of geo-environmental health in Mississippi: Application of spatiotemporal statistics to improve health and air quality[J]. AIMS Environmental Science, 2018, 5(4): 273-293. doi: 10.3934/environsci.2018.4.273



    Distributions with a heavy right tail are commonly used to model data in areas of knowledge where extreme observations occur, including environmental studies, geosciences, hydrology, actuarial sciences, economics, finance, product reliability assessment, and income distribution analysis. Such datasets tend to be positive, positively skewed, and heavy-tailed on the right (see [1]), so they are usually modeled with positive-support distributions that share these characteristics. Heavy-tailed distributions are also applied in engineering and physics; for other fields, we refer the reader to [2] and its references.

    A frequently used distribution in statistical modeling is the Pareto model (see book by Arnold [3]), named after economist Vilfredo Pareto. A random variable X follows a Pareto distribution when its probability density function (pdf) is given by:

    $$f_X(x;\alpha,\beta)=\frac{\beta\,\alpha^{\beta}}{x^{\beta+1}},\qquad x\ge\alpha, \tag{1.1}$$

    where $\alpha>0$ is the scale parameter and $\beta>0$ the shape parameter. We denote this by $X\sim\text{Pareto}(\alpha,\beta)$. The corresponding cumulative distribution function (cdf) is:

    $$F_X(x;\alpha,\beta)=1-\left(\frac{\alpha}{x}\right)^{\beta},\qquad x\ge\alpha. \tag{1.2}$$
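    The pdf (1.1) and cdf (1.2) translate directly into code, and inverting (1.2) gives a simple sampler, since solving $p=F_X(x)$ yields $x=\alpha(1-p)^{-1/\beta}$. The following Python sketch illustrates this; the function names are our own, and the demonstration parameters simply reuse the Pareto estimates reported in Table 5.

```python
import random


def pareto_pdf(x: float, alpha: float, beta: float) -> float:
    """Pareto Type I density (1.1): beta * alpha**beta / x**(beta + 1) for x >= alpha."""
    if x < alpha:
        return 0.0
    return beta * alpha**beta / x ** (beta + 1)


def pareto_cdf(x: float, alpha: float, beta: float) -> float:
    """Pareto Type I cdf (1.2): 1 - (alpha / x)**beta for x >= alpha."""
    if x < alpha:
        return 0.0
    return 1.0 - (alpha / x) ** beta


def pareto_sample(n: int, alpha: float, beta: float, rng: random.Random) -> list[float]:
    """Inverse-cdf sampling: p = F(x) gives x = alpha * (1 - p)**(-1/beta)."""
    # rng.random() lies in [0, 1), so 1 - p lies in (0, 1] and is never zero
    return [alpha * (1.0 - rng.random()) ** (-1.0 / beta) for _ in range(n)]


if __name__ == "__main__":
    draws = pareto_sample(100_000, alpha=0.1, beta=0.244, rng=random.Random(1))
    empirical = sum(d <= 1.0 for d in draws) / len(draws)
    print(f"empirical P(X<=1) = {empirical:.3f}, cdf (1.2) = {pareto_cdf(1.0, 0.1, 0.244):.3f}")
```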

    Johnson et al. [4] classify several types of Pareto distributions, referring to the above model as Pareto Type Ⅰ. Beyond this standard form, various extensions have been proposed in the literature. For instance, Pickands [5] examines the generalized Pareto distribution for making statistical inferences about the upper tail of a distribution. Choulakian and Stephens [6] develop goodness-of-fit tests for this generalized model, while Davison and Smith [7] apply it to hydrological data, analyzing river-flow exceedances over 35 years.

    As a further extension, Gupta et al. [8] introduce a family of distributions obtained by raising any cdf with positive support to a positive power. Applying this construction to the Pareto cdf (1.2) yields the exponentiated Pareto (EP) distribution. A random variable X follows an EP distribution if its pdf is defined as:

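    $$f_X(x;\alpha,\beta,\gamma)=\frac{\beta\gamma\,\alpha^{\beta}}{x^{\beta+1}}\left(1-\left(\frac{\alpha}{x}\right)^{\beta}\right)^{\gamma-1},\qquad x\ge\alpha, \tag{1.3}$$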

    where $\alpha>0$ is the scale parameter and $\beta,\gamma>0$ are the shape parameters. This model, denoted $X\sim\text{EP}(\alpha,\beta,\gamma)$, was first studied by Stoppa [9] (see also [10]) and is a particular case of the family studied by Gupta et al. [8]. Another generalization is the beta-Pareto distribution introduced by Akinsete et al. [11]; Boumaraf et al. [12] apply several optimization methods to the beta-Pareto distribution. For a comprehensive review of Pareto models, see Arnold's book [3].

    Andrews and Mallows [13] studied scale mixtures of normal distributions, which give rise to a class of heavy-tailed distributions that includes notable examples such as Student's t-distribution. These families are extensively applied in robust statistical analysis of symmetric datasets.

    A scale mixture distribution is formed by combining a base probability density with a scaling distribution. The mathematical representation of this mixture takes the form:

    $$f_Z(z)=\int f_{Z\mid U=u}\bigl(z;\kappa(u)\bigr)\,f_U(u)\,du,$$

    where $f_{Z\mid U=u}$ is the conditional density of the random variable Z given $U=u$, and $\kappa(u)$ is a positive function of the random variable U, with density $f_U$, that governs the scale of Z. A particularly noteworthy case is the Slash distribution (see [4]), where $Z\mid U=u\sim N(0,\kappa(u))$, $U\sim\text{Uniform}(0,1)$ and $\kappa(u)=u^{-1/q}$. The theoretical properties, inferential methods, and extensions of this family have been studied extensively by Rogers and Tukey [14] and Mosteller and Tukey [15], while maximum likelihood (ML) estimation of its location and scale parameters is addressed in [16,17,18]. The same idea has been applied when the conditional distribution has positive support, for example, to extend the Birnbaum-Saunders distribution (see [19]) and the generalized half-normal distribution (see [20]).
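    As a concrete instance of this construction, the following Python sketch simulates the Slash case by first drawing $U\sim\text{Uniform}(0,1)$ and then drawing Z from a normal distribution with scale $\kappa(u)=u^{-1/q}$. Reading $\kappa(u)$ as the standard deviation (rather than the variance) is our assumption; under it the scheme is equivalent to $Z=X/U^{1/q}$ with $X\sim N(0,1)$. The helper names are hypothetical.

```python
import random


def slash_sample(n: int, q: float, rng: random.Random) -> list[float]:
    """Slash draws from the scale mixture Z | U=u ~ N(0, kappa(u)), U ~ Uniform(0, 1).

    Assumption: kappa(u) = u**(-1/q) is taken as the standard deviation, which is
    the same as Z = X / U**(1/q) with X ~ N(0, 1) independent of U.
    """
    draws = []
    for _ in range(n):
        u = 1.0 - rng.random()  # Uniform(0, 1]; excludes u = 0
        draws.append(rng.gauss(0.0, u ** (-1.0 / q)))
    return draws


def tail_fraction(values: list[float], c: float) -> float:
    """Share of values with |v| > c."""
    return sum(abs(v) > c for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(2024)
    slash = slash_sample(20_000, q=1.0, rng=rng)
    normal = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
    # The mixture puts far more mass beyond |z| = 3 than the N(0, 1) base density
    print(tail_fraction(slash, 3.0), tail_fraction(normal, 3.0))
```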

    The beta function (B) plays a fundamental role in this paper. It is defined as:

    $$B(a,b)=\int_0^1 t^{a-1}(1-t)^{b-1}\,dt=\int_0^\infty \frac{t^{a-1}}{(1+t)^{a+b}}\,dt=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}, \tag{1.4}$$

    where $a>0$, $b>0$, and $\Gamma(\cdot)$ denotes the gamma function. A random variable Y follows a beta distribution with parameters a and b if its pdf is given by:

    $$f_Y(y;a,b)=\frac{1}{B(a,b)}\,y^{a-1}(1-y)^{b-1},\qquad 0<y<1, \tag{1.5}$$

    where a>0 and b>0.

    The incomplete beta function, denoted by B(y;a,b), is defined as:

    $$B(y;a,b)=\int_0^y t^{a-1}(1-t)^{b-1}\,dt,\qquad 0<y<1, \tag{1.6}$$

    where $a>0$ and $b>0$. A related function is the regularized incomplete beta function, $I_y(a,b)=B(y;a,b)/B(a,b)$.

    The Gauss hypergeometric function (see [21]), denoted by ${}_2F_1$, is also used in this paper. It admits the integral representation

    $${}_2F_1(a,b,c;x)=\frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\int_0^1 v^{b-1}(1-v)^{c-b-1}(1-xv)^{-a}\,dv, \tag{1.7}$$

    where $a,b,c>0$, $c>b$ and $x<1$.

    The Lerch transcendent function $\Phi$ is also used in this paper; it can be expressed as:

    $$\Phi(z,s,a)=\frac{1}{\Gamma(s)}\int_0^1 \frac{x^{a-1}\log^{s-1}(1/x)}{1+zx}\,dx, \tag{1.8}$$

    where $a>0$, $s>0$ and $z>-1$.
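    Note the sign convention in (1.8): with $1+zx$ in the denominator, expanding $1/(1+zx)$ geometrically and using $\int_0^1 x^{n+a-1}\log^{s-1}(1/x)\,dx=\Gamma(s)/(n+a)^s$ gives $\Phi(z,s,a)=\sum_{n\ge0}(-z)^n/(n+a)^s$ for $|z|<1$, which is the more common Lerch transcendent (defined with $1-zx$) evaluated at $-z$. The Python sketch below, with function names and numerical settings of our own choosing, checks the series against a direct quadrature of (1.8) after the substitution $x=e^{-t}$; only the integral form is usable for $z\ge1$.

```python
import math


def lerch_series(z: float, s: float, a: float, terms: int = 200) -> float:
    """Phi(z, s, a) of (1.8) as the series sum_{n>=0} (-z)**n / (n + a)**s, valid for |z| < 1."""
    return sum((-z) ** n / (n + a) ** s for n in range(terms))


def lerch_integral(z: float, s: float, a: float, m: int = 4000) -> float:
    """Phi(z, s, a) of (1.8) by quadrature, for z > -1 and s >= 1.

    Substituting x = exp(-t) gives (1/Gamma(s)) * int_0^inf t**(s-1) exp(-a t) / (1 + z exp(-t)) dt;
    the range is truncated at t = 40/a, where exp(-a t) is negligible, and Simpson's rule is used.
    """
    upper = 40.0 / a
    h = upper / m

    def g(t: float) -> float:
        return t ** (s - 1) * math.exp(-a * t) / (1.0 + z * math.exp(-t))

    total = g(0.0) + g(upper)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0 / math.gamma(s)


if __name__ == "__main__":
    print(lerch_series(0.5, 2.0, 1.5), lerch_integral(0.5, 2.0, 1.5))
    # Outside |z| < 1 only the integral form applies; here Phi(z, 1, 1) = log(1 + z) / z
    print(lerch_integral(3.0, 1.0, 1.0), math.log(4.0) / 3.0)
```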

    A random variable X follows a G distribution with parameters σ and α when its density function is expressed as:

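    $$f_X(x;\sigma,\alpha)=\frac{\sigma^{\alpha}x^{-\alpha}}{B(\bar\alpha,\alpha)\,(\sigma+x)},\qquad x>0, \tag{1.9}$$

    with scale parameter $\sigma>0$, shape parameter $0<\alpha<1$, and $\bar\alpha=1-\alpha$. The notation $X\sim G(\sigma,\alpha)$ indicates this distribution, which has the following key characteristics: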

    a) The G distribution is unimodal, with its mode at zero.

    b) If $Y\sim\text{Beta}(1-\alpha,\alpha)$, then the transformed variable $\sigma Y/(1-Y)$ follows a $G(\sigma,\alpha)$ distribution.

    c) The cdf of X is expressed as:

    $$F_X(x;\sigma,\alpha)=I_y(\bar\alpha,\alpha),\qquad x>0, \tag{1.10}$$

    where $y=x/(\sigma+x)$ and $I_y(\cdot,\cdot)$ is the regularized incomplete beta function.
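    Property b) gives a direct way to simulate from the G distribution, and (1.10) can be evaluated without a special-function library. Writing $I_y(\bar\alpha,\alpha)=B(\bar\alpha,\alpha)^{-1}\int_0^{x/\sigma}w^{-\alpha}(1+w)^{-1}\,dw$ and substituting $w=s^{1/\bar\alpha}$ removes the singularity at zero, while $B(\bar\alpha,\alpha)=\Gamma(1-\alpha)\Gamma(\alpha)=\pi/\sin(\pi\alpha)$ by Euler's reflection formula. The Python sketch below (hypothetical helper names, with Simpson's rule as an arbitrary quadrature choice) compares the empirical cdf of simulated values with this numerical cdf.

```python
import math
import random


def g_cdf(x: float, sigma: float, alpha: float, m: int = 2000) -> float:
    """cdf (1.10) of G(sigma, alpha), i.e. I_y(1 - alpha, alpha) with y = x / (sigma + x).

    I_y = (1/B) * int_0^{x/sigma} w**(-alpha) / (1 + w) dw with B = pi / sin(pi * alpha);
    substituting w = s**(1/(1-alpha)) gives the smooth integrand 1 / (1 + s**(1/(1-alpha)))
    (divided by 1 - alpha), integrated here with Simpson's rule (m even).
    """
    if x <= 0.0:
        return 0.0
    abar = 1.0 - alpha
    upper = (x / sigma) ** abar
    h = upper / m
    total = 0.0
    for i in range(m + 1):
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight / (1.0 + (i * h) ** (1.0 / abar))
    integral = total * h / 3.0 / abar
    return integral * math.sin(math.pi * alpha) / math.pi


def g_sample(n: int, sigma: float, alpha: float, rng: random.Random) -> list[float]:
    """Property b): sigma * Y / (1 - Y) ~ G(sigma, alpha) when Y ~ Beta(1 - alpha, alpha)."""
    draws = []
    while len(draws) < n:
        y = rng.betavariate(1.0 - alpha, alpha)
        if y < 1.0:  # guard against a draw rounded to exactly 1
            draws.append(sigma * y / (1.0 - y))
    return draws


if __name__ == "__main__":
    draws = g_sample(20_000, sigma=2.0, alpha=0.3, rng=random.Random(5))
    empirical = sum(d <= 3.0 for d in draws) / len(draws)
    print(f"empirical {empirical:.3f} vs. I_y from (1.10) {g_cdf(3.0, 2.0, 0.3):.3f}")
```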

    This distribution was first introduced by Gleser [22] and later studied by Olmos et al. [23]. More recently, Olmos et al. [24] proposed a scale mixture of the G and Beta distributions, which they named the scale mixture of Gleser (SMG) distribution. They show that the SMG distribution exhibits a heavy right tail, making it a viable alternative to other heavy-tailed distributions such as the Pareto distribution.

    A random variable X follows a SMG distribution with parameters σ and α when its density function is:

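    $$f_X(x;\sigma,\alpha)=\frac{\alpha\,\sigma^{\alpha}}{B(\bar\alpha,\alpha)}\,x^{-(\alpha+1)}\log\left(1+\frac{x}{\sigma}\right),\qquad x>0, \tag{1.11}$$

    where $\sigma>0$ is the scale parameter and $0<\alpha<1$ the shape parameter. We denote this by $X\sim\text{SMG}(\sigma,\alpha)$.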

    The main motivation of this work is to extend the study of the G distribution by pursuing the following objectives. First, we aim to introduce the GHG distribution, generated from a scale mixture of the G and Beta distributions, similar to how the SMG distribution was derived. Notably, both the SMG and G distributions emerge as special cases of the GHG distribution. Second, we conduct a comprehensive study of the GHG distribution, demonstrating its applicability as an alternative model for actuarial data (e.g., insurance claims, income, and expenses) and hydrological data (e.g., flood peak exceedances), among other fields.

    The article is organized as follows. In Section 2, we give a representation and the density of the extension of the G distribution, together with its basic properties. In Section 3, we carry out inference by the ML method, including a simulation study and the Fisher information matrix. Section 4 contains two applications to real data. In Section 5, we offer some conclusions.

    In this section, we present the mathematical representation, pdf, key properties, and graphical illustrations of the GHG distribution.

    The pdf of the GHG distribution is formally defined below, including several special cases.

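    Definition 2.1. Let $Z\sim\text{GHG}(\sigma,\alpha,\beta)$. Then the pdf of Z is given by

    $$f_Z(z;\sigma,\alpha,\beta)=\frac{\beta\,B(\bar\alpha+\beta,1)}{\sigma^{\bar\alpha}B(\bar\alpha,\alpha)}\,z^{-\alpha}\,{}_2F_1\!\left(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-\frac{z}{\sigma}\right),\qquad z>0, \tag{2.1}$$

    where $\sigma>0$ is the scale parameter, $0<\alpha<1$ and $\beta>0$ are shape parameters, and ${}_2F_1$ is the Gauss hypergeometric function.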
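    For numerical work, the hypergeometric factor in (2.1) can be evaluated from (1.7): with $a=1$, $b=\kappa=\bar\alpha+\beta$ and $c=\kappa+1$, ${}_2F_1(1,\kappa,\kappa+1;-x)=\kappa\int_0^1u^{\kappa-1}(1+xu)^{-1}\,du$, and the substitution $u=s^{p/\kappa}$ with the integer $p=\max(1,\lceil\kappa\rceil)$ leaves the well-behaved integrand $p\,s^{p-1}/(1+x\,s^{p/\kappa})$. The Python sketch below uses Simpson's rule for this step (our choice; a library routine such as SciPy's `scipy.special.hyp2f1` would be an alternative), evaluates (2.1), and compares $P(1<Z<2)$ obtained by integrating the density with the frequency observed in draws generated through the representation $Z=\sigma X/Y$ of Proposition 2.2. All function names are hypothetical.

```python
import math
import random


def simpson(f, a: float, b: float, m: int = 400) -> float:
    """Composite Simpson rule for f on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def hyp2f1_neg(kappa: float, x: float, m: int = 400) -> float:
    """2F1(1, kappa, kappa + 1; -x), x >= 0, from (1.7) after u = s**(p/kappa), p = max(1, ceil(kappa))."""
    p = max(1, math.ceil(kappa))
    return simpson(lambda s: p * s ** (p - 1) / (1.0 + x * s ** (p / kappa)), 0.0, 1.0, m)


def ghg_pdf(z: float, sigma: float, alpha: float, beta: float) -> float:
    """Density (2.1); uses B(abar + beta, 1) = 1/(abar + beta) and B(abar, alpha) = pi/sin(pi*alpha)."""
    abar = 1.0 - alpha
    kappa = abar + beta
    b_g = math.pi / math.sin(math.pi * alpha)
    return beta / (kappa * sigma**abar * b_g) * z ** (-alpha) * hyp2f1_neg(kappa, z / sigma)


def ghg_sample(n: int, sigma: float, alpha: float, beta: float, rng: random.Random) -> list[float]:
    """Z = sigma * X / Y with X = V/(1-V), V ~ Beta(1-alpha, alpha), and Y ~ Beta(beta, 1)."""
    draws = []
    while len(draws) < n:
        v = rng.betavariate(1.0 - alpha, alpha)
        y = rng.betavariate(beta, 1.0)
        if v < 1.0 and y > 0.0:
            draws.append(sigma * v / (1.0 - v) / y)
    return draws


if __name__ == "__main__":
    sigma, alpha, beta = 1.0, 0.5, 3.0
    prob = simpson(lambda z: ghg_pdf(z, sigma, alpha, beta), 1.0, 2.0, 100)
    draws = ghg_sample(50_000, sigma, alpha, beta, random.Random(7))
    freq = sum(1.0 < d < 2.0 for d in draws) / len(draws)
    print(f"P(1 < Z < 2): integrated pdf {prob:.4f}, simulated {freq:.4f}")
```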

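    Remark 1. The following special forms hold:

    (1) If $\beta<\alpha$, then the GHG density can be written as

    $$f_Z(z;\sigma,\alpha,\beta)=\frac{\beta\,\sigma^{\beta}z^{-(\beta+1)}}{B(\bar\alpha,\alpha)}\,B\!\left(\frac{z}{\sigma+z};\bar\alpha+\beta,\alpha-\beta\right),\qquad z>0. \tag{2.2}$$

    (2) If $\beta=\alpha$, the $\text{SMG}(\sigma,\alpha)$ distribution introduced by Olmos et al. [24] is obtained.

    (3) If $z/\sigma<1$, then using the Lerch transcendent function we obtain

    $$f_Z(z;\sigma,\alpha,\beta)=\frac{\beta}{\sigma^{\bar\alpha}z^{\alpha}B(\bar\alpha,\alpha)}\,\Phi\!\left(\frac{z}{\sigma},1,\bar\alpha+\beta\right),\qquad 0<z<\sigma. \tag{2.3}$$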
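    The special forms in Remark 1 can be cross-checked numerically. The self-contained Python sketch below (hypothetical helper names; Simpson's rule throughout) compares (2.1) with (2.2) for a parameter set with $\beta<\alpha$, and with the SMG density (1.11) when $\beta=\alpha$. The hypergeometric factor is computed from (1.7) after the substitution $u=s^{p/\kappa}$, $p=\max(1,\lceil\kappa\rceil)$, and the incomplete beta function in (2.2) after $t=s^{1/a}$, which turns $\int_0^y t^{a-1}(1-t)^{b-1}\,dt$ into $a^{-1}\int_0^{y^a}(1-s^{1/a})^{b-1}\,ds$.

```python
import math


def simpson(f, a: float, b: float, m: int = 2000) -> float:
    """Composite Simpson rule on [a, b] (m even)."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def b_abar_alpha(alpha: float) -> float:
    """B(1 - alpha, alpha) = pi / sin(pi * alpha)."""
    return math.pi / math.sin(math.pi * alpha)


def pdf_21(z: float, sigma: float, alpha: float, beta: float) -> float:
    """Form (2.1), with 2F1(1, k, k+1; -x) = p * int_0^1 s**(p-1) / (1 + x s**(p/k)) ds from (1.7)."""
    kappa = 1.0 - alpha + beta
    p = max(1, math.ceil(kappa))
    x = z / sigma
    f21 = simpson(lambda s: p * s ** (p - 1) / (1.0 + x * s ** (p / kappa)), 0.0, 1.0)
    return beta / (kappa * sigma ** (1.0 - alpha) * b_abar_alpha(alpha)) * z ** (-alpha) * f21


def inc_beta(y: float, a: float, b: float) -> float:
    """B(y; a, b) of (1.6) for 0 < y < 1, via (1/a) * int_0^{y**a} (1 - s**(1/a))**(b-1) ds."""
    return simpson(lambda s: (1.0 - s ** (1.0 / a)) ** (b - 1.0), 0.0, y**a) / a


def pdf_22(z: float, sigma: float, alpha: float, beta: float) -> float:
    """Form (2.2); requires beta < alpha."""
    y = z / (sigma + z)
    b_inc = inc_beta(y, 1.0 - alpha + beta, alpha - beta)
    return beta * sigma**beta * z ** (-(beta + 1.0)) / b_abar_alpha(alpha) * b_inc


def smg_pdf(x: float, sigma: float, alpha: float) -> float:
    """SMG density (1.11)."""
    return alpha * sigma**alpha / b_abar_alpha(alpha) * x ** (-(alpha + 1.0)) * math.log1p(x / sigma)


if __name__ == "__main__":
    for z in (0.5, 1.5, 7.0):
        print(z, pdf_21(z, 2.0, 0.6, 0.3), pdf_22(z, 2.0, 0.6, 0.3),  # beta < alpha
              pdf_21(z, 2.0, 0.6, 0.6), smg_pdf(z, 2.0, 0.6))         # beta = alpha
```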

    Figure 1 displays density plots of the GHG distribution for $\sigma=1$, $\alpha=0.5$ and two values of $\beta$, while Table 1 presents $P(Z>z)$ for several values of z under three GHG distributions. The right tail of the GHG distribution becomes heavier as $\beta$ decreases.

    Figure 1.  Examples of the GHG(1,0.5,β).
    Table 1.  Tails comparison for different GHG distributions.
    Distribution P(Z>4) P(Z>5) P(Z>6)
    GHG(1, 0.5, 1) 0.437 0.406 0.381
    GHG(1, 0.5, 3) 0.341 0.311 0.288
    GHG(1, 0.5, 10) 0.308 0.280 0.258


    In this subsection, we show some properties of the GHG distribution.

    Proposition 2.1. Let $Z\mid U=u\sim G(\sigma u^{-1},\alpha)$ and $U\sim\text{Beta}(\beta,1)$. Then $Z\sim\text{GHG}(\sigma,\alpha,\beta)$.

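    Proof. Computing the integral directly, we obtain:

    $$f_Z(z;\sigma,\alpha,\beta)=\int_0^1 f_{Z\mid U}(z)\,f_U(u)\,du=\int_0^1\frac{(\sigma u^{-1})^{\alpha}z^{-\alpha}}{B(\bar\alpha,\alpha)(\sigma u^{-1}+z)}\,\beta u^{\beta-1}\,du=\frac{\beta z^{-\alpha}}{B(\bar\alpha,\alpha)\,\sigma^{\bar\alpha}}\int_0^1\frac{u^{\beta-\alpha}}{1+\frac{z}{\sigma}u}\,du.$$

    The result follows from applying (1.7) with $a=1$, $b=\bar\alpha+\beta$, $c=\bar\alpha+\beta+1$ and $x=-z/\sigma$, since then $\Gamma(b)\Gamma(c-b)/\Gamma(c)=B(\bar\alpha+\beta,1)$.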

    Proposition 2.2. Let $X\sim G(1,\alpha)$ and $Y\sim\text{Beta}(\beta,1)$ be independent random variables. Then $Z=\sigma X/Y\sim\text{GHG}(\sigma,\alpha,\beta)$.

    Proof. Using the stochastic representation $Z=\sigma X/Y$ and applying the Jacobian transformation, we obtain the desired distribution.

    Proposition 2.3. Let $Z\sim\text{GHG}(\sigma,\alpha,\beta)$. If $\beta\to\infty$, then Z converges in law to a random variable with the $G(\sigma,\alpha)$ distribution.

    Proof. Let $Z\sim\text{GHG}(\sigma,\alpha,\beta)$. By the representation given in Proposition 2.2, $Z=\sigma X/Y$. First, we study the convergence in probability of Y.

    When $Y\sim\text{Beta}(\beta,1)$, we have $E[(Y-1)^2]=\frac{2}{(\beta+1)(\beta+2)}$; if $\beta\to\infty$, then $E[(Y-1)^2]\to0$, which implies that $Y\xrightarrow{P}1$. Applying Slutsky's Lemma (see the book by Lehmann [25]) to $Z=\sigma X/Y$, we obtain the result.
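    The moment identity above follows from $E[Y]=\beta/(\beta+1)$ and $E[Y^2]=\beta/(\beta+2)$ for $Y\sim\text{Beta}(\beta,1)$. The short Python sketch below checks it in exact rational arithmetic for a few values of $\beta$; the function name is ours.

```python
from fractions import Fraction


def second_moment_about_one(beta: Fraction) -> Fraction:
    """E[(Y - 1)**2] for Y ~ Beta(beta, 1), from E[Y**k] = beta / (beta + k)."""
    ey = beta / (beta + 1)
    ey2 = beta / (beta + 2)
    return ey2 - 2 * ey + 1


if __name__ == "__main__":
    for b in (Fraction(1, 2), Fraction(1), Fraction(3), Fraction(10)):
        closed_form = Fraction(2) / ((b + 1) * (b + 2))
        assert second_moment_about_one(b) == closed_form
        print(b, closed_form)  # shrinks to 0 as beta grows
```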

    Remark 2. Proposition 2.1 shows that this distribution arises from a scale mixture of the G and Beta distributions, while Proposition 2.3 shows that the G distribution is obtained as $\beta\to\infty$. Figure 2 shows the relationship between the three distributions: the GHG distribution contains the SMG and G distributions as particular (or limiting) cases.

    Figure 2.  Particular cases for the GHG distribution.

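    Proposition 2.4. Let $Z\sim\text{GHG}(\sigma,\alpha,\beta)$. Then the cdf of Z is given by

    $$F_Z(z;\sigma,\alpha,\beta)=I_y(\bar\alpha,\alpha)-\frac{z}{\beta}\,f_Z(z;\sigma,\alpha,\beta),\qquad z>0, \tag{2.4}$$

    where $\sigma>0$, $0<\alpha<1$, $\beta>0$, $y=z/(\sigma+z)$ and $I_y(\cdot,\cdot)$ is the regularized incomplete beta function.

    Proof. Applying the definition of the cdf directly, using (1.7) and substituting $w=tv/\sigma$ in the last step, we obtain

    $$\begin{aligned}F_Z(z;\sigma,\alpha,\beta)&=\frac{\beta\,B(\bar\alpha+\beta,1)}{\sigma^{\bar\alpha}B(\bar\alpha,\alpha)}\int_0^z t^{-\alpha}\,{}_2F_1\!\left(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-\frac{t}{\sigma}\right)dt\\&=\frac{\beta}{\sigma^{\bar\alpha}B(\bar\alpha,\alpha)}\int_0^z t^{-\alpha}\int_0^1\frac{v^{\bar\alpha+\beta-1}}{1+\frac{tv}{\sigma}}\,dv\,dt\\&=\frac{\beta}{B(\bar\alpha,\alpha)}\int_0^1 v^{\beta-1}\left(\int_0^{vz/\sigma}\frac{w^{-\alpha}}{1+w}\,dw\right)dv.\end{aligned}$$

    Hence, integrating by parts with $u=\int_0^{vz/\sigma}\frac{w^{-\alpha}}{1+w}\,dw$, we obtain the result.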

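    Corollary 2.1. Let $Z\sim\text{GHG}(\sigma,\alpha,\beta)$. The survival function $\bar F_Z(t)=1-F_Z(t)$ is given by

    $$\bar F_Z(t;\sigma,\alpha,\beta)=1-I_y(\bar\alpha,\alpha)+\frac{t}{\beta}\,f_Z(t;\sigma,\alpha,\beta).$$

    The hazard function, defined by $h(t)=f_Z(t)/\bar F_Z(t)$, of a GHG random variable is given by

    $$h(t;\sigma,\alpha,\beta)=\frac{f_Z(t;\sigma,\alpha,\beta)}{1-I_y(\bar\alpha,\alpha)+\frac{t}{\beta}\,f_Z(t;\sigma,\alpha,\beta)},\qquad t>0,$$

    where $y=t/(\sigma+t)$.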
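    As a numerical sanity check of (2.4) and Corollary 2.1, the Python sketch below evaluates the survival and hazard functions with stdlib-only Simpson quadrature (our choice), computing $I_y(\bar\alpha,\alpha)$ through the substitution $w=s^{1/\bar\alpha}$ and the hypergeometric factor of (2.1) through (1.7) with $u=s^{p/\kappa}$. Its output can be compared with the tail probabilities in Table 1, and the hazard values on a grid should decrease, as Proposition 2.5 below asserts. Helper names are hypothetical.

```python
import math


def simpson(f, a: float, b: float, m: int = 400) -> float:
    """Composite Simpson rule on [a, b] (m even)."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def reg_inc_beta(t: float, sigma: float, alpha: float) -> float:
    """I_y(1 - alpha, alpha) with y = t / (sigma + t), via w = s**(1/(1-alpha)) (cf. (1.10))."""
    abar = 1.0 - alpha
    integral = simpson(lambda s: 1.0 / (1.0 + s ** (1.0 / abar)), 0.0, (t / sigma) ** abar) / abar
    return integral * math.sin(math.pi * alpha) / math.pi


def ghg_pdf(t: float, sigma: float, alpha: float, beta: float) -> float:
    """Density (2.1) with the 2F1 factor from (1.7), substituting u = s**(p/kappa)."""
    kappa = 1.0 - alpha + beta
    p = max(1, math.ceil(kappa))
    f21 = simpson(lambda s: p * s ** (p - 1) / (1.0 + (t / sigma) * s ** (p / kappa)), 0.0, 1.0)
    b_g = math.pi / math.sin(math.pi * alpha)  # B(1 - alpha, alpha)
    return beta / (kappa * sigma ** (1.0 - alpha) * b_g) * t ** (-alpha) * f21


def ghg_survival(t: float, sigma: float, alpha: float, beta: float) -> float:
    """Corollary 2.1: 1 - I_y(abar, alpha) + (t / beta) * f_Z(t)."""
    return 1.0 - reg_inc_beta(t, sigma, alpha) + t / beta * ghg_pdf(t, sigma, alpha, beta)


def ghg_hazard(t: float, sigma: float, alpha: float, beta: float) -> float:
    """h(t) = f_Z(t) / survival(t)."""
    return ghg_pdf(t, sigma, alpha, beta) / ghg_survival(t, sigma, alpha, beta)


if __name__ == "__main__":
    for beta in (1.0, 3.0, 10.0):
        tails = [ghg_survival(z, 1.0, 0.5, beta) for z in (4.0, 5.0, 6.0)]
        print(f"GHG(1, 0.5, {beta:g}):", [f"{v:.3f}" for v in tails])
    print([round(ghg_hazard(t, 1.0, 0.5, 3.0), 4) for t in (0.5, 1.0, 2.0, 5.0, 10.0)])
```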

    Figure 3 displays the hazard rate function's behavior for varying β parameters, with σ fixed at 1 and α at 0.5.

    Figure 3.  Plots of the hazard function h(t) for β=1,3,10.

    In the simulation study and in parameter estimation by the ML method, we consider the GHG distribution with only two parameters, i.e., we modify Proposition 2.1 as follows: $Z\mid U=u\sim G(u^{-1},\alpha)$ and $U\sim\text{Beta}(\beta,1)$. The resulting pdf of Z is

    $$f_Z(z;\beta,\alpha)=\frac{\beta\,B(\bar\alpha+\beta,1)}{B(\bar\alpha,\alpha)}\,z^{-\alpha}\,{}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z),\qquad z>0, \tag{2.5}$$

    where $\beta>0$ and $0<\alpha<1$ are shape parameters. This reduction of the parameter space yields a two-parameter distribution that serves as an alternative to the Pareto distribution and other two-parameter heavy right-tailed distributions. The parameter $\beta$ enters through the mixing variable U, so it acts on the scale of the G distribution, which is where the mixing takes place. When the parameter $\sigma$ is also included, we observe that it can lead to an overestimation of $\beta$ for small sample sizes.

    For generating random numbers from the GHG model, two approaches are available: Algorithm 1 uses Proposition 2.1 and the composition method (see [26]), whereas Algorithm 2 employs the representation from Proposition 2.2.

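    Algorithm 1 Simulates $Z\sim\text{GHG}(\beta,\alpha)$ as follows:
      Step 1: Generate $U\sim\text{Beta}(\beta,1)$.
      Step 2: Given $U=u$, generate $Z\sim G(u^{-1},\alpha)$.

    Algorithm 2 Simulates $Z\sim\text{GHG}(\beta,\alpha)$ as follows:
      Step 1: Generate $V\sim\text{Beta}(1-\alpha,\alpha)$.
      Step 2: Compute $X=V/(1-V)$.
      Step 3: Generate $Y\sim\text{Beta}(\beta,1)$.
      Step 4: Compute $Z=X/Y$.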
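    Both algorithms translate directly into code. In the Python sketch below (function names are ours), Step 2 of Algorithm 1 draws $G(u^{-1},\alpha)$ by scaling a $G(1,\alpha)$ variate, obtained through property b) of the G distribution, by $1/u$; with that choice the two samplers are algebraically the same construction and differ only in the order in which the Beta variates are drawn. Comparing their empirical quantiles is a quick consistency check.

```python
import random


def g_unit(alpha: float, rng: random.Random) -> float:
    """G(1, alpha) variate via property b): V / (1 - V) with V ~ Beta(1 - alpha, alpha)."""
    v = rng.betavariate(1.0 - alpha, alpha)
    while v >= 1.0:  # guard against rounding to exactly 1
        v = rng.betavariate(1.0 - alpha, alpha)
    return v / (1.0 - v)


def ghg_algorithm1(n: int, beta: float, alpha: float, rng: random.Random) -> list[float]:
    """Algorithm 1: U ~ Beta(beta, 1), then Z | U=u ~ G(1/u, alpha), i.e. a G(1, alpha) draw divided by u."""
    draws = []
    for _ in range(n):
        u = rng.betavariate(beta, 1.0)
        draws.append(g_unit(alpha, rng) / u)
    return draws


def ghg_algorithm2(n: int, beta: float, alpha: float, rng: random.Random) -> list[float]:
    """Algorithm 2: V ~ Beta(1-alpha, alpha), X = V/(1-V), Y ~ Beta(beta, 1), Z = X/Y."""
    draws = []
    for _ in range(n):
        x = g_unit(alpha, rng)
        y = rng.betavariate(beta, 1.0)
        draws.append(x / y)
    return draws


def quantile(data: list[float], q: float) -> float:
    """Simple empirical quantile (lower order statistic)."""
    ordered = sorted(data)
    return ordered[int(q * (len(ordered) - 1))]


if __name__ == "__main__":
    z1 = ghg_algorithm1(20_000, beta=1.0, alpha=0.5, rng=random.Random(11))
    z2 = ghg_algorithm2(20_000, beta=1.0, alpha=0.5, rng=random.Random(12))
    for q in (0.25, 0.5, 0.75):
        print(q, round(quantile(z1, q), 3), round(quantile(z2, q), 3))
```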

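    Proposition 2.5. Let $T\sim\text{GHG}(\beta,\alpha)$. Then the hazard function of T is decreasing for all $t>0$.

    Proof. Applying item b) of Glaser's Theorem [27], we define, with $\kappa=\bar\alpha+\beta$,

    $$\eta(t)=-\frac{f'(t)}{f(t)}=\frac{\alpha}{t}+\frac{\kappa}{\kappa+1}\,\frac{{}_2F_1(2,\kappa+1,\kappa+2;-t)}{{}_2F_1(1,\kappa,\kappa+1;-t)},$$

    where $f(t)$ is the pdf given in (2.5). The second term equals $-\frac{d}{dt}\log{}_2F_1(1,\kappa,\kappa+1;-t)$. By (1.7), ${}_2F_1(1,\kappa,\kappa+1;-t)=\kappa\int_0^1 u^{\kappa-1}(1+tu)^{-1}\,du$ is an integral of functions that are log-convex in t, and it is therefore log-convex, because sums (and hence integrals) of log-convex functions are log-convex. Consequently, the second term of $\eta(t)$ is nonincreasing in t, and

    $$\eta'(t)\le-\frac{\alpha}{t^{2}}<0,\qquad t>0.$$

    This completes the proof by Glaser's Theorem.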

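    The following proposition shows that the GHG distribution has no moments of order $r\ge\alpha$; in particular, since $0<\alpha<1$, its mean does not exist. This is quite common among heavy-tailed distributions.

    Proposition 2.6. For the random variable $Z\sim\text{GHG}(\beta,\alpha)$, the r-th moment does not exist when $r\ge\alpha$.

    Proof. We apply the change of variable $t=zw$ and integrate by parts with $u=\int_0^z\frac{t^{\beta-\alpha}}{1+t}\,dt$. This yields:

    $$\begin{aligned}E(Z^r)&=\frac{\beta}{B(\bar\alpha,\alpha)}\int_0^\infty z^{r-\alpha}\int_0^1\frac{w^{\beta-\alpha}}{1+zw}\,dw\,dz=\frac{\beta}{B(\bar\alpha,\alpha)}\int_0^\infty z^{r-\beta-1}\int_0^z\frac{t^{\beta-\alpha}}{1+t}\,dt\,dz\\&=\frac{\beta}{B(\bar\alpha,\alpha)}\left[\frac{z^{r-\beta}}{r-\beta}\int_0^z\frac{t^{\beta-\alpha}}{1+t}\,dt\,\Bigg|_0^\infty-\frac{1}{r-\beta}\int_0^\infty\frac{z^{r-\alpha}}{1+z}\,dz\right].\end{aligned}$$

    The boundary term vanishes for $r<\beta$ and does not exist when $r\ge\beta$. As demonstrated in Proposition 1(e) of Olmos et al. [23], the remaining integral diverges for $r\ge\alpha$. Consequently, the expression diverges, proving that the r-th moments of $Z\sim\text{GHG}(\beta,\alpha)$ do not exist.

    Remark 3. By Proposition 2.2 (with $\sigma=1$), $Z=X/Y$ with X and Y independent, so $E(Z^r)=E(X^r)\,E(Y^{-r})\ge E(X^r)$ because $0<Y<1$. Hence the r-th moment of Z cannot exist when that of X does not, and the latter fails for $r\ge\alpha$ (see Olmos et al. [23]).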

    Roughly speaking, heavy-tailed distributions are those whose tails decay more slowly than exponentially, with the exponential distribution regarded as the boundary between heavy and light tails. In fields where they are applied, such as finance and actuarial sciences, the data are positive and have a heavy right tail. The Pareto distribution has been widely used to model financial datasets, but in many cases it does not provide a good fit, so new distributions with a heavy right tail are of interest. Such models are crucial for representing losses in insurance, reinsurance, and catastrophic risk. Formally, a probability distribution on the real line with cdf $F_X$ is said to have a heavy right tail (see [28]) if $\limsup_{x\to\infty}\bigl(-\log\bar F_X(x)\bigr)/x=0$.

    Regular variation theory plays a fundamental role in extreme value analysis (see, for example, [29,30,31,32]). This concept is formalized as follows:

    Definition 2.2. A distribution function is called regularly varying at infinity with index $-\beta$ (where $\beta\ge0$) if, for every $t>0$,

    $$\lim_{x\to\infty}\frac{\bar F_X(tx)}{\bar F_X(x)}=t^{-\beta}.$$

    Here, $\beta$ is termed the tail index.

    The GHG distribution exhibits this important property:

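    Proposition 2.7. The survival function of $Z\sim\text{GHG}(\beta,\alpha)$ is regularly varying.

    Proof. Applying the above definition, L'Hospital's Rule and the substitution $u=zw$, we have that

    $$\lim_{z\to\infty}\frac{\bar F_Z(tz)}{\bar F_Z(z)}=\lim_{z\to\infty}\frac{1-F_Z(tz)}{1-F_Z(z)}=t^{1-\alpha}\lim_{z\to\infty}\frac{\int_0^1\frac{w^{\beta-\alpha}}{1+tzw}\,dw}{\int_0^1\frac{w^{\beta-\alpha}}{1+zw}\,dw}=t^{-\beta}\lim_{z\to\infty}\frac{\int_0^{tz}\frac{u^{\beta-\alpha}}{1+u}\,du}{\int_0^{z}\frac{u^{\beta-\alpha}}{1+u}\,du}.$$

    If $\beta<\alpha$, both integrals converge to the same finite constant and the last limit equals 1; if $\beta=\alpha$, both grow like $\log z$ and the limit is again 1; if $\beta>\alpha$, both grow like $z^{\beta-\alpha}/(\beta-\alpha)$ and the limit equals $t^{\beta-\alpha}$. Hence the ratio converges to $t^{-\min(\alpha,\beta)}$, and since $\min(\alpha,\beta)>0$ the result follows, with tail index $\min(\alpha,\beta)$.

    As the GHG distribution has regularly varying tails, it is heavy-tailed (see [29]).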

    In this section, we describe ML estimation for the GHG model and include a simulation study assessing the ML estimators' behavior.

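    Let $z_1,\ldots,z_n$ be a random sample from the $\text{GHG}(\beta,\alpha)$ distribution. The log-likelihood is given by:

    $$\ell(\beta,\alpha)=c(\beta,\alpha)-\alpha\sum_{i=1}^n\log(z_i)+\sum_{i=1}^n\log\bigl({}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z_i)\bigr), \tag{3.1}$$

    where $c(\beta,\alpha)=n\log(\beta)-n\log(\bar\alpha+\beta)-n\log(\Gamma(\bar\alpha))-n\log(\Gamma(\alpha))$.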

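    By equating the first partial derivatives (w.r.t. each parameter) to zero, we obtain the following system of equations:

    $$(\bar\alpha+\beta)\sum_{i=1}^n\frac{\Phi(z_i,2,\bar\alpha+\beta)}{{}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z_i)}=\frac{n}{\beta}, \tag{3.2}$$
    $$(\bar\alpha+\beta)\sum_{i=1}^n\frac{\Phi(z_i,2,\bar\alpha+\beta)}{{}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z_i)}-\sum_{i=1}^n\log(z_i)+n\psi(\bar\alpha)=n\psi(\alpha), \tag{3.3}$$

    where $\psi(\cdot)$ is the digamma function. Substituting (3.2) into (3.3), we derive:

    $$\hat\beta(\hat\alpha)=\frac{n}{\sum_{i=1}^n\log(z_i)+n\psi(\hat\alpha)-n\psi(1-\hat\alpha)}. \tag{3.4}$$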

    The ML estimator of $\alpha$, $\hat\alpha$, is obtained by numerically solving:

    $$\bigl(1-\hat\alpha+\hat\beta(\hat\alpha)\bigr)\sum_{i=1}^n\frac{\Phi\bigl(z_i,2,1-\hat\alpha+\hat\beta(\hat\alpha)\bigr)}{{}_2F_1\bigl(1,1-\hat\alpha+\hat\beta(\hat\alpha),2-\hat\alpha+\hat\beta(\hat\alpha);-z_i\bigr)}=\frac{n}{\hat\beta(\hat\alpha)}. \tag{3.5}$$

    The estimator $\hat\alpha$ is the solution of Eq (3.5); substituting it into (3.4) yields $\hat\beta$. Equation (3.5) can be solved using numerical methods such as the Newton-Raphson algorithm. Alternatively, both estimates can be obtained by directly maximizing the log-likelihood (3.1) with the optim function in the R software [33].
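    As an alternative to solving (3.5), the Python sketch below maximizes the log-likelihood (3.1) directly, in the spirit of the optim-based route but using only the standard library. Several choices are ours and are assumptions rather than the authors' implementation: the hypergeometric factor is computed from (1.7) by Simpson's rule after the substitution $u=r^{q/\kappa}$ with $q=3\max(1,\lceil\kappa\rceil)$, which concentrates nodes where the integrand varies fastest for large observations; the search runs over $(\log\beta,\operatorname{logit}\alpha)$ so that it is unconstrained; and the optimizer is a basic Nelder-Mead simplex. The data are simulated with Algorithm 2.

```python
import math
import random


def hyp2f1_neg(kappa, x, m=64):
    """2F1(1, kappa, kappa+1; -x) = kappa * int_0^1 u**(kappa-1) / (1 + x u) du, from (1.7).

    With u = r**(q/kappa), q = 3 * max(1, ceil(kappa)), this becomes
    q * int_0^1 r**(q-1) / (1 + x r**(q/kappa)) dr; Simpson's rule with m (even) subintervals.
    """
    q = 3 * max(1, math.ceil(kappa))
    h = 1.0 / m
    total = 0.0
    for i in range(m + 1):
        r = i * h
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * q * r ** (q - 1) / (1.0 + x * r ** (q / kappa))
    return total * h / 3.0


def loglik(beta, alpha, data):
    """Log-likelihood (3.1) of the two-parameter GHG(beta, alpha) model (2.5)."""
    abar = 1.0 - alpha
    kappa = abar + beta
    c = len(data) * (math.log(beta) - math.log(kappa) - math.lgamma(abar) - math.lgamma(alpha))
    return c + sum(-alpha * math.log(z) + math.log(hyp2f1_neg(kappa, z)) for z in data)


def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=300):
    """Minimize f over R^k with a basic Nelder-Mead simplex search; returns (x, f(x))."""
    k = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(k)] for i in range(k)]
    values = [f(x) for x in simplex]
    for _ in range(max_iter):
        order = sorted(range(k + 1), key=values.__getitem__)
        simplex, values = [simplex[i] for i in order], [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(x[j] for x in simplex[:-1]) / k for j in range(k)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:  # try expanding further in the same direction
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:  # contract towards the centroid, or shrink towards the best vertex
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, x)] for x in simplex[1:]]
                values = [values[0]] + [f(x) for x in simplex[1:]]
    i_best = min(range(k + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]


def fit_ghg(data, start=(0.0, 0.0)):
    """ML estimates (beta_hat, alpha_hat, loglik), searching over (log beta, logit alpha)."""

    def unpack(t):
        return math.exp(t[0]), 1.0 / (1.0 + math.exp(-t[1]))

    def objective(t):
        try:
            value = -loglik(*unpack(t), data)
        except (ValueError, OverflowError, ZeroDivisionError):
            return math.inf  # outside the numerically usable region
        return value if math.isfinite(value) else math.inf

    t_hat, neg_ll = nelder_mead(objective, list(start))
    return (*unpack(t_hat), -neg_ll)


def simulate(n, beta, alpha, rng):
    """Algorithm 2: Z = X / Y, X = V / (1 - V), V ~ Beta(1 - alpha, alpha), Y ~ Beta(beta, 1)."""
    draws = []
    while len(draws) < n:
        v, y = rng.betavariate(1.0 - alpha, alpha), rng.betavariate(beta, 1.0)
        if 0.0 < v < 1.0 and y > 0.0:
            draws.append(v / (1.0 - v) / y)
    return draws


if __name__ == "__main__":
    data = simulate(150, beta=1.0, alpha=0.5, rng=random.Random(3))
    beta_hat, alpha_hat, ll_hat = fit_ghg(data, start=(math.log(2.0), -0.5))
    print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.3f}, log-likelihood = {ll_hat:.2f}")
```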

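    For a random variable $Z\sim\text{GHG}(\beta,\alpha)$, the log-likelihood function corresponding to a single observation z of Z and parameter vector $\boldsymbol\theta=(\beta,\alpha)$ is:

    $$\ell(\boldsymbol\theta)=\log(\beta)-\log(\bar\alpha+\beta)-\log\bigl(B(\bar\alpha,\alpha)\bigr)-\alpha\log(z)+\log\bigl({}_2F_1(1,\bar\alpha+\beta,\bar\alpha+\beta+1;-z)\bigr).$$

    The Appendix contains the first and second partial derivatives of this log-likelihood function. The Fisher information matrix, denoted $I_F(\cdot)$, for the GHG distribution takes the form:

    $$I_F(\boldsymbol\theta)=\begin{pmatrix}\frac{1}{\beta^{2}}-2\eta_{31}+\eta_{22} & 2\eta_{31}-\eta_{22}\\ 2\eta_{31}-\eta_{22} & \psi'(\alpha)+\psi'(\bar\alpha)-2\eta_{31}+\eta_{22}\end{pmatrix},$$

    where $\psi'(\cdot)$ is the trigamma function and $\eta_{ij}=E\left[\left(\frac{\kappa_0\,\Phi(Z,i,\kappa_0)}{{}_2F_1(1,\kappa_0,\kappa_1;-Z)}\right)^{j}\right]$, with $\kappa_0=\bar\alpha+\beta$ and $\kappa_1=\bar\alpha+\beta+1$, are calculated numerically.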

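    For large samples, the ML estimator $\hat{\boldsymbol\theta}$ is asymptotically bivariate normal:

    $$\sqrt{n}\,(\hat{\boldsymbol\theta}-\boldsymbol\theta)\xrightarrow{\;L\;}N_2\bigl(\mathbf 0,I_F(\boldsymbol\theta)^{-1}\bigr).$$

    Consequently, the asymptotic covariance matrix of $\hat{\boldsymbol\theta}$ is approximately $I_F(\boldsymbol\theta)^{-1}/n$. In practice, since the parameters are unknown, we use the observed information matrix evaluated at the ML estimates.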

    To examine the behavior of ML estimation, we conducted a simulation study assessing the performance of the estimators of the $\beta$ and $\alpha$ parameters of the GHG distribution. Using Algorithm 2 of Subsection 2.2 to generate random numbers from the GHG distribution, we generated 1000 samples of sizes $n=20$, 100 and 300. Table 2 reports the empirical bias (Bias), median bias (MBias), mean of the standard errors (SE), and root of the empirical mean squared error (RMSE). As Table 2 shows, the estimation performance generally improves as n increases.

    Table 2.  Bias, MBias, SE, and RMSE for the estimators of α and β.
    True values Estimator n=20 n=100 n=300
    α β Bias MBias SE RMSE Bias MBias SE RMSE Bias MBias SE RMSE
    0.25 0.5 ˆα −0.0124 −0.0213 0.0606 0.0542 −0.0277 −0.0291 0.0233 0.0355 −0.0292 −0.0296 0.0124 0.0320
    ˆβ 0.2411 0.1496 0.6230 0.3956 0.2930 0.2213 0.2248 0.3750 0.2615 0.2322 0.0747 0.2963
    0.25 1.0 ˆα 0.0230 0.0173 0.0723 0.0680 0.0002 −0.0013 0.0298 0.0289 −0.0031 −0.0039 0.0162 0.0183
    ˆβ 0.0484 −0.2142 15.138 0.6845 0.2650 −0.0001 10.147 0.7274 0.2239 0.0406 0.4547 0.5718
    0.5 0.5 ˆα −0.0346 −0.0318 0.1004 0.0986 −0.0447 −0.0432 0.0406 0.0728 −0.0473 −0.0428 0.0188 0.0618
    ˆβ 0.2436 0.1512 0.6225 0.4308 0.2234 0.1677 0.2024 0.3440 0.1973 0.1543 0.0663 0.2879
    0.5 1.0 ˆα 0.0103 0.0032 0.1067 0.0893 −0.0051 −0.0067 0.0517 0.0502 −0.0101 −0.0097 0.0282 0.0328
    ˆβ 0.2108 −0.0227 18.389 0.7911 0.2919 0.0633 10.861 0.7391 0.2592 0.0624 0.5843 0.6181
    0.8 0.5 ˆα −0.0204 −0.0151 0.0554 0.0546 −0.0092 −0.0083 0.0230 0.0253 −0.0045 −0.0041 0.0129 0.0133
    ˆβ 0.1773 0.0777 0.4606 0.3622 0.0733 0.0474 0.1234 0.1473 0.0395 0.0334 0.0622 0.0693
    0.8 1.0 ˆα −0.0060 0.0028 0.0517 0.0484 −0.0023 −0.0001 0.0227 0.0241 −0.0012 −0.0010 0.0126 0.0136
    ˆβ 0.2104 −0.0373 16.475 0.7766 0.2051 0.0264 0.6920 0.6181 0.1027 0.0168 0.2561 0.3444


    In this section, we analyze two real-data applications, comparing the fits of several distributions against the GHG distribution. Model selection is performed using four information criteria: AIC (Akaike Information Criterion; [34]), BIC (Bayesian Information Criterion; [35]), CAIC (Consistent AIC; [36]) and HQIC (Hannan-Quinn Information Criterion; [37]). These criteria are defined as $\text{AIC}=-2\hat\ell(\cdot)+2p$, $\text{BIC}=-2\hat\ell(\cdot)+\log(n)\,p$, $\text{CAIC}=-2\hat\ell(\cdot)+(\log(n)+1)\,p=\text{BIC}+p$, and $\text{HQIC}=-2\hat\ell(\cdot)+2\log(\log(n))\,p$, where $\hat\ell(\cdot)$ is the maximized log-likelihood and p denotes the number of parameters in the model.
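    The four criteria differ only in the penalty attached to the number of parameters p. As a check of the formulas, the Python sketch below recomputes the GHG and SMG rows of Table 8 from the log-likelihoods in Table 7, with $n=500$ and, as our assumption about the parameter counts, $p=3$ for $\text{GHG}(\sigma,\alpha,\beta)$ and $p=2$ for $\text{SMG}(\sigma,\alpha)$; log is the natural logarithm. The function name is ours.

```python
import math


def information_criteria(loglik: float, p: int, n: int) -> dict[str, float]:
    """AIC, BIC, CAIC and HQIC as defined above (natural logarithms)."""
    return {
        "AIC": -2.0 * loglik + 2.0 * p,
        "BIC": -2.0 * loglik + math.log(n) * p,
        "CAIC": -2.0 * loglik + (math.log(n) + 1.0) * p,
        "HQIC": -2.0 * loglik + 2.0 * math.log(math.log(n)) * p,
    }


if __name__ == "__main__":
    for name, ll, p in (("GHG", -6466.192, 3), ("SMG", -6541.776, 2)):
        crit = information_criteria(ll, p, n=500)
        print(name, {k: round(v, 2) for k, v in crit.items()})
```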

    The data correspond to the exceedances of flood peaks (in m³/s) of the Wheaton River near Carcross in Yukon Territory, Canada. The data consist of 72 exceedances for the years 1958-1984, rounded to one decimal place (see [6,11,38]), and are listed in Table 3. Table 4 presents descriptive statistics, including the sample skewness coefficient (CS) and kurtosis coefficient (CK). We compare the fits of the EP and Pareto distributions with that of the two-parameter GHG distribution given in (2.5).

    Table 3.  Data on exceedances of Wheaton River floods.
    1.7 2.2 14.4 1.1 0.4 20.6 5.3 0.7 1.9 13.0 12.0 9.3
    1.4 18.7 8.5 25.5 11.6 14.1 22.1 1.1 2.5 14.4 1.7 37.6
    0.6 2.2 39.0 0.3 15.0 11.0 7.3 22.9 1.7 0.1 1.1 0.6
    9.0 1.7 7.0 20.1 0.4 2.8 14.1 9.9 10.4 10.7 30.0 3.6
    5.6 30.8 13.3 4.2 25.5 3.4 11.9 21.5 27.6 36.4 2.7 64.0
    1.5 2.5 27.4 1.0 27.1 20.2 16.8 5.3 9.7 27.5 2.5 27.0

    Table 4.  Descriptive statistics for exceedances of flood peaks data.
    n Median Mean SD CS CK
    72 9.5 12.204 12.297 1.473 5.890
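    The location and spread summaries in Table 4 can be recomputed from the 72 values in Table 3 with Python's standard `statistics` module. The sketch below assumes that SD denotes the sample standard deviation with divisor $n-1$, and it omits the skewness and kurtosis coefficients.

```python
import statistics

# Exceedances of flood peaks (m^3/s) of the Wheaton River, transcribed row by row from Table 3
WHEATON = [
    1.7, 2.2, 14.4, 1.1, 0.4, 20.6, 5.3, 0.7, 1.9, 13.0, 12.0, 9.3,
    1.4, 18.7, 8.5, 25.5, 11.6, 14.1, 22.1, 1.1, 2.5, 14.4, 1.7, 37.6,
    0.6, 2.2, 39.0, 0.3, 15.0, 11.0, 7.3, 22.9, 1.7, 0.1, 1.1, 0.6,
    9.0, 1.7, 7.0, 20.1, 0.4, 2.8, 14.1, 9.9, 10.4, 10.7, 30.0, 3.6,
    5.6, 30.8, 13.3, 4.2, 25.5, 3.4, 11.9, 21.5, 27.6, 36.4, 2.7, 64.0,
    1.5, 2.5, 27.4, 1.0, 27.1, 20.2, 16.8, 5.3, 9.7, 27.5, 2.5, 27.0,
]

if __name__ == "__main__":
    print("n =", len(WHEATON))
    print("median =", statistics.median(WHEATON))
    print("mean =", round(statistics.mean(WHEATON), 3))
    print("SD (n - 1 divisor) =", round(statistics.stdev(WHEATON), 3))
```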


    The box plot in Figure 4 shows one very extreme value, which makes the right tail heavier: while most exceedances cluster around 10 m³/s, one notable outlier reaches 64 m³/s.

    Figure 4.  Boxplot for exceedances of flood peaks data.

    Table 5 presents the ML estimates for the GHG, EP, and Pareto model parameters, along with their corresponding AIC, BIC, CAIC, and HQIC values.

    Table 5.  Exceedances of flood peaks data: model, ML estimates, AIC, BIC, CAIC, and HQIC values.
    Model ML estimates AIC BIC CAIC HQIC
    GHG(β,α) ˆβ=1.236(0.465), ˆα=0.403(0.045) 569.8 574.3 576.3 571.6
    EP(α,β,γ) ˆα=0.1, ˆβ=0.424(0.046), ˆγ=2.880(0.491) 578.6 583.2 586.2 581.3
    Pareto(α,β) ˆα=0.1, ˆβ=0.244(0.029) 608.1 610.4 611.4 609.0


    The GHG model demonstrates superior fit to the data, as evidenced by its consistently lower AIC, BIC, CAIC, and HQIC values compared to both EP and Pareto models.

    In Figure 5, the left panel displays the histogram of the data together with the fitted density curves. The GHG distribution visibly provides a better fit to the flood peak exceedance data than the EP and Pareto distributions. The right panel zooms in on the right tail of the histogram and shows that, for this data set, the GHG distribution places more probability in the right tail than the other two distributions.

    Figure 5.  Plots of the GHG (black), EP (red) and Pareto (green) models.

    The dataset used in this application comes from the Survey of Consumer Finances (SCF), a nationally representative sample containing extensive information on assets, liabilities, income, and demographic characteristics of respondents (potential U.S. customers). We analyzed a random sample of 500 income-positive U.S. households interviewed in the 2004 survey. The variable of interest is each household's annual income in US dollars. The data can be accessed from the Federal Reserve's webpage: https://www.federalreserve.gov/econres/scfindex.htm. Descriptive statistics for these data are presented in Table 6. We compared the fits of the SMG and Pareto distributions with that of the three-parameter GHG distribution given in (2.1).

    Table 6.  Descriptive statistics for income data.
    n Median Mean SD CS CK
    500 54000 321021.9 3410936 21.127 461.575


    Figure 6 displays a boxplot revealing several extreme values, which create a heavy right tail in the distribution. While most observations cluster around the median household income of 54,000 US dollars, one household reports an extreme annual income of 75 million US dollars.

    Figure 6.  Boxplot for income data.

    Table 7 presents the ML estimates for the parameters and log-likelihood values of the GHG, SMG, and Pareto models. Table 8 displays the corresponding values for the AIC, BIC, CAIC, and HQIC criteria for each model.

    Table 7.  Model, ML estimates, and Log-likelihood.
    Model ML estimates Log-likelihood
    GHG(σ,α,β) ˆσ=56417.5(2174.647), ˆα=0.5(0.015), ˆβ=756.154(62.525) -6466.192
    SMG(σ,α) ˆσ=22584.770(4027.346), ˆα=0.581(0.018) -6541.776
    Pareto(α,β) ˆα=260, ˆβ=0.186(0.008) -6802.113

    Table 8.  Model, AIC, BIC, CAIC, and HQIC values.
    Model AIC BIC CAIC HQIC
    GHG(σ,α,β) 12938.38 12951.03 12954.03 12943.35
    SMG(σ,α) 13087.55 13095.98 13097.98 13090.86
    Pareto(α,β) 13606.23 13610.44 13611.44 13606.05


    We observed that the GHG model yields the lowest values for the AIC, BIC, CAIC, and HQIC criteria, indicating that it provides a better fit to the data compared to both the SMG and Pareto models.
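The four criteria in Table 8 penalise −2ℓ̂, where ℓ̂ is the maximised log-likelihood from Table 7, by different functions of the number of free parameters k and the sample size n. The sketch below assumes the standard forms AIC = −2ℓ̂ + 2k, BIC = −2ℓ̂ + k log n, CAIC = −2ℓ̂ + k(log n + 1) and HQIC = −2ℓ̂ + 2k log log n; with n = 500 these reproduce the GHG (k = 3) and SMG (k = 2) rows of Table 8 to two decimals. The function name is hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC, CAIC and HQIC for a model with k free parameters and n observations.

    Lower values indicate a better trade-off between fit and complexity.
    """
    m2ll = -2.0 * loglik
    return {
        "AIC": m2ll + 2 * k,
        "BIC": m2ll + k * math.log(n),
        "CAIC": m2ll + k * (math.log(n) + 1),
        "HQIC": m2ll + 2 * k * math.log(math.log(n)),
    }


# GHG log-likelihood from Table 7 (three parameters, 500 households).
ghg = information_criteria(-6466.192, k=3, n=500)
print({name: round(value, 2) for name, value in ghg.items()})
# {'AIC': 12938.38, 'BIC': 12951.03, 'CAIC': 12954.03, 'HQIC': 12943.35}
```

Because BIC, CAIC and HQIC penalise each extra parameter more heavily than AIC when n = 500, the GHG model's lead in all four columns indicates that its third parameter improves the fit by more than the added penalty.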

    Figure 7 shows the empirical cdf together with the estimated GHG, SMG, and Pareto cdfs, and again indicates close agreement between the GHG model and the income data.

    Figure 7.  Empirical cdf of the income data with the estimated GHG, SMG, and Pareto cdfs.
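Figure 7 compares the fitted cdfs with the empirical cdf visually. One common numerical summary of such a comparison, not used in the source, is the Kolmogorov–Smirnov distance: the largest vertical gap between the empirical cdf F_n and a fitted cdf. The sketch below computes it. The Pareto cdf 1 − (α/x)^β for x ≥ α is an assumed parameterisation, under which the Table 7 estimates would be plugged in as α = 260 and β = 0.186; all function names are hypothetical.

```python
import bisect


def ecdf(sample):
    """Return the empirical cdf F_n(x) = #{x_i <= x} / n as a function."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def ks_distance(sample, model_cdf):
    """Largest vertical gap between the empirical cdf and a continuous model cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # The step function jumps at x: compare F with the levels just before and after.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


def pareto_cdf(alpha, beta):
    """Pareto cdf 1 - (alpha / x)**beta for x >= alpha (assumed parameterisation)."""
    return lambda x: 0.0 if x < alpha else 1.0 - (alpha / x) ** beta
```

For the income data, `ks_distance(incomes, pareto_cdf(260, 0.186))` would give one number per model, so the visual comparison in Figure 7 can be complemented by a ranking of the fitted cdfs.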

    In this work, we introduce a new heavy-tailed distribution, the GHG model, obtained through a scale mixture extension of the G distribution combined with the Beta distribution. We study some of its properties, estimate its parameters by maximum likelihood (ML), and present two applications to real data. The GHG distribution contains the G and SMG distributions as special cases. This makes it of interest in actuarial science and related fields, where it is a viable alternative for fitting data with extreme observations. The main properties and findings are summarized below:

    ● The GHG distribution exhibits a heavy right tail.

    ● The GHG distribution has two representations, as presented in Propositions 2.1 and 2.2.

    ● The pdf, cdf, and hazard function have explicit forms expressed through known special functions, such as the Gauss hypergeometric function.

    ● The hazard function is monotonically decreasing, which reflects the distribution's heavy right tail (see [28]).

    ● In the first application, we demonstrate that the GHG distribution effectively models flood peak exceedance data, outperforming two well-known distributions: the EP and Pareto distributions.

    ● In the second application, we show that the three-parameter GHG distribution provides a good fit for income data, surpassing the SMG and Pareto distributions.

    ● A more comprehensive estimation study of the GHG distribution's three parameters will be addressed in future work, where we will analyze the effects and limitations of the σ and β parameters in the ML estimation method.
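The hazard function is h(x) = f(x)/(1 − F(x)). The GHG pdf and cdf are not reproduced in this section, so the sketch below uses a Pareto distribution as a stand-in heavy-tailed model: with f(x) = βα^β/x^(β+1) and F(x) = 1 − (α/x)^β for x ≥ α, the hazard simplifies to h(x) = β/x, which decreases in x. The code evaluates the hazard numerically and checks monotonicity on a grid, the kind of check one could apply to any model with computable pdf and cdf; the function names and the values α = 1, β = 2 are illustrative assumptions.

```python
def hazard(pdf, cdf, x):
    """Hazard rate h(x) = f(x) / (1 - F(x))."""
    return pdf(x) / (1.0 - cdf(x))


def is_decreasing(h, grid):
    """True if h is strictly decreasing across the sorted grid points."""
    values = [h(x) for x in grid]
    return all(a > b for a, b in zip(values, values[1:]))


# Illustrative Pareto stand-in: ALPHA is the threshold, BETA the shape.
ALPHA, BETA = 1.0, 2.0


def stand_in_pdf(x):
    return BETA * ALPHA ** BETA / x ** (BETA + 1)


def stand_in_cdf(x):
    return 1.0 - (ALPHA / x) ** BETA
```

On a grid far into the tail, 1 − F(x) becomes very small and loses precision in floating point; evaluating the survival function directly is then preferable.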

    Conceptualization: N.M. Olmos and E. Gómez-Déniz; methodology: N.M. Olmos and E. Gómez-Déniz; software: N.M. Olmos and E. Gómez-Déniz; validation: N.M. Olmos, E. Gómez-Déniz, and O. Venegas; formal analysis: N.M. Olmos and O. Venegas; investigation: E. Gómez-Déniz and N.M. Olmos; writing (original draft preparation): N.M. Olmos and E. Gómez-Déniz; writing (review and editing): O. Venegas and E. Gómez-Déniz. All authors have read and approved the final version of the manuscript for publication.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    EGD was partially funded by grant PID2021-127989OB-I00 (Ministerio de Economía y Competitividad, Spain) and by grant TUR-RETOS2022-075 (Ministerio de Industria, Comercio y Turismo).

    Prof. Emilio Gómez-Déniz is the Guest Editor of the special issue "Advances in Probability Distributions and Social Science Statistics" for AIMS Mathematics. Prof. Emilio Gómez-Déniz was not involved in the editorial review and the decision to publish this article.

    The authors declare no conflicts of interest.

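    The first-order partial derivatives of ℓ(θ) with respect to θ are given by:

$$
\frac{\partial \ell(\theta)}{\partial \beta} = \frac{1}{\beta} - \frac{\kappa_0\,\Phi(z,2,\kappa_0)}{{}_2F_1(1,\kappa_0;\kappa_1;z)}, \qquad
\frac{\partial \ell(\theta)}{\partial \alpha} = \psi(\bar{\alpha}) - \psi(\alpha) - \log(z) + \frac{\kappa_0\,\Phi(z,2,\kappa_0)}{{}_2F_1(1,\kappa_0;\kappa_1;z)}.
$$

    The second-order partial derivatives of ℓ(θ) with respect to θ are given by:

$$
\begin{aligned}
\frac{\partial^2 \ell(\theta)}{\partial \beta^2} &= -\frac{1}{\beta^2} + \frac{2\kappa_0\,\Phi(z,3,\kappa_0)}{{}_2F_1(1,\kappa_0;\kappa_1;z)} - \left(\frac{\kappa_0\,\Phi(z,2,\kappa_0)}{{}_2F_1(1,\kappa_0;\kappa_1;z)}\right)^2,\\
\frac{\partial^2 \ell(\theta)}{\partial \beta\,\partial \alpha} &= \left(\frac{\kappa_0\,\Phi(z,2,\kappa_0)}{{}_2F_1(1,\kappa_0;\kappa_1;z)}\right)^2 - \frac{2\kappa_0\,\Phi(z,3,\kappa_0)}{{}_2F_1(1,\kappa_0;\kappa_1;z)},\\
\frac{\partial^2 \ell(\theta)}{\partial \alpha^2} &= -\psi'(\bar{\alpha}) - \psi'(\alpha) + \frac{2\kappa_0\,\Phi(z,3,\kappa_0)}{{}_2F_1(1,\kappa_0;\kappa_1;z)} - \left(\frac{\kappa_0\,\Phi(z,2,\kappa_0)}{{}_2F_1(1,\kappa_0;\kappa_1;z)}\right)^2,
\end{aligned}
$$

    where κ_i = ᾱ + β + i, Φ(z, s, a) denotes the Lerch transcendent, ₂F₁ is the Gauss hypergeometric function, and ψ(·) and ψ′(·) are the digamma and trigamma functions, respectively.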
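The derivatives above combine the Lerch transcendent Φ(z, s, a) = Σ_{n≥0} zⁿ/(n + a)^s with the Gauss hypergeometric function. Since κ1 = κ0 + 1, the identity ₂F₁(1, a; a + 1; z) = a Φ(z, 1, a) holds, so the ratio κ0 Φ(z, 2, κ0)/₂F₁(1, κ0; κ1; z) equals Φ(z, 2, κ0)/Φ(z, 1, κ0) = −∂/∂κ0 log Φ(z, 1, κ0), and the term 2κ0 Φ(z, 3, κ0)/₂F₁ minus the square of that ratio equals ∂²/∂κ0² log Φ(z, 1, κ0). The sketch below checks these relations numerically with plain power series, assuming |z| < 1 and κ0 > 0 so that the series converge; the helper names are hypothetical, and a library such as mpmath (`hyp2f1`, `lerchphi`) would be the more robust choice in practice.

```python
import math


def lerch_phi(z, s, a, terms=2000):
    """Lerch transcendent Phi(z, s, a) = sum_{n>=0} z**n / (n + a)**s (series, |z| < 1, a > 0)."""
    return sum(z ** n / (n + a) ** s for n in range(terms))


def hyp2f1(a, b, c, z, terms=2000):
    """Gauss hypergeometric function 2F1(a, b; c; z) from its power series (|z| < 1)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # Ratio of consecutive series terms: (a+n)(b+n) / ((c+n)(n+1)) * z.
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total


def score_term(k0, z):
    """kappa0 * Phi(z, 2, kappa0) / 2F1(1, kappa0; kappa0 + 1; z)."""
    return k0 * lerch_phi(z, 2, k0) / hyp2f1(1.0, k0, k0 + 1.0, z)


def curvature_term(k0, z):
    """2 * kappa0 * Phi(z, 3, kappa0) / 2F1(1, kappa0; kappa0 + 1; z) minus score_term squared."""
    f = hyp2f1(1.0, k0, k0 + 1.0, z)
    return 2.0 * k0 * lerch_phi(z, 3, k0) / f - score_term(k0, z) ** 2


def log_phi1(k0, z):
    """log Phi(z, 1, kappa0); its first two kappa0-derivatives are -score_term and curvature_term."""
    return math.log(lerch_phi(z, 1, k0))
```

Finite differences of `log_phi1` in κ0 should match `-score_term` and `curvature_term` to within the discretisation error, which is a quick way to sanity-check an implementation of the score and Hessian.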

    [1] World Health Organization. WHO Air quality guidelines for particulate matter, ozone, nitrogen dioxide and sulfur dioxide: global update 2005: summary of risk assessment. Geneva: World Health Organization, 2006. Available from: http://whqlibdoc.who.int/hq/2006/WHO_SDE_PHE_OEH_06.02_eng.pdf?ua=1
    [2] Health Effects Institute (2017) State of Global Air 2017. Available from: https://www.stateofglobalair.org/sites/default/files/SOGA2017_report.pdf
    [3] Chan TC, Chen ML, Lin IF, et al. (2009) Spatiotemporal analysis of air pollution and asthma patient visits in Taipei, Taiwan. Int J Health Geogr 8: 26. doi: 10.1186/1476-072X-8-26
    [4] Jerrett M, Burnett RT, Ma R, et al. (2005) Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology 16: 727–736.
    [5] Lemke LD, Lamerato LE, Xu X, et al. (2014) Geospatial relationships of air pollution and acute asthma events across the Detroit–Windsor international border: Study design and preliminary results. J Expo Sci Environ Epidemiol 24: 346–357. doi: 10.1038/jes.2013.78
    [6] Koenig JQ (1999) Air pollution and asthma. J Allergy Clin Immunol 104: 717–722. doi: 10.1016/S0091-6749(99)70280-0
    [7] D'Amato G (2002) Environmental urban factors (air pollution and allergens) and the rising trends in allergic respiratory diseases. Allergy 57: 30–33. doi: 10.1034/j.1398-9995.57.s72.5.x
    [8] Khatri SB, Holguin FC, Ryan PB, et al. (2009) Association of ambient ozone exposure with airway inflammation and allergy in adults with asthma. J Asthma 46: 777–785. doi: 10.1080/02770900902779284
    [9] Lin S, Liu X, Le LH, et al. (2008) Chronic exposure to ambient ozone and asthma hospital admissions among children. Environ Health Perspect 116: 1725–1730. doi: 10.1289/ehp.11184
    [10] Lu H, Qiu F, Cheng Y (2003) Temporal and Spatial Relationship of Ozone and Asthma. In: Esri International User Conference. p. 14. Available from: http://proceedings.esri.com/library/userconf/proc03/p0911.pdf
    [11] Kumar N, Liang D, Comellas A, et al. (2013) Satellite-based PM concentrations and their application to COPD in Cleveland, OH. J Expo Sci Environ Epidemiol 23: 637–646.
    [12] Schwartz J (2004) Air pollution and children's health. Pediatrics 113: 1037–1043.
    [13] D'Amato G, Liccardi G, D'Amato M, et al. (2005) Environmental risk factors and allergic bronchial asthma. Clin Exp Allergy 35: 1113–1124. doi: 10.1111/j.1365-2222.2005.02328.x
    [14] D'Amato G, Cecchi L, D'Amato M, et al. (2010) Urban air pollution and climate change as environmental risk factors of respiratory allergy: an update. J Investig Allergol Clin Immunol 20: 95–102.
    [15] World Health Organization (2014) Health is the key in motivating to solve environmental problems. World Health Organization.
    [16] Davenhall B (2013) Geomedicine. Redlands, CA: Environmental Systems Research Institute, 32. Available from: http://www.esri.com/library/ebooks/geomedicine.pdf
    [17] Davenhall B (2010) The Missing Component. ArcUser, 10–11. Available from: http://www.esri.com/news/arcuser/0110/files/geomedicine.pdf
    [18] CDC (2010) Asthma's Impact on the Nation, Data from the CDC National Asthma Control Program, 1–4.
    [19] McLafferty S (2003) GIS and health care. Annu Rev Public Health 24: 25–42. doi: 10.1146/annurev.publhealth.24.012902.141012
    [20] MSDH (2011) 2011–2015 Mississippi State Asthma Plan.
    [21] Roy SR, McGinty EE, Hayes SC, et al. (2010) Regional and racial disparities in asthma hospitalizations in Mississippi. J Allergy Clin Immunol 125: 636–642.
    [22] Carter LM, Jones JW, Berry L, et al. (2014) Climate Change Impacts in the United States: The Third National Climate Assessment.
    [23] Portier CJ, Thigpen TK, Carter SR, et al. (2010) A Human Health Perspective on Climate Change: A Report Outlining the Research Needs on the Human Health Effects of Climate Change. Research Triangle Park, NC: Environmental Health Perspectives and the National Institute of Environmental Health Sciences.
    [24] NRDC (2012) Toxic Power: How Power Plants Contaminate Our Air and States. Natural Resources Defense Council Environmental News.
    [25] McMillin N. Listen to the lungs. The Daily Mississippian.
    [26] U.S. Census Bureau Geography (2014) Guide to State and Local Geography: Mississippi. Available from: https://www.census.gov/geo/reference/guidestloc/st28_ms.html
    [27] The National Science and Technology Council (2013) Air Quality Observation Systems in the United States. Available from: https://obamawhitehouse.archives.gov/sites/default/files/.../air_quality_obs_2013.pdf
    [28] U.S. Census Bureau (2008) A Compass for Understanding and Using American Community Survey Data: What General Data Users Need to Know. Washington DC: U.S. Government Printing Office.
    [29] U.S. Census Bureau (2008) How Poverty is Calculated in the ACS.
    [30] Kurland KS, Gorr WL (2012) GIS Tutorial for Health. 4th ed. Redlands: Esri Press.
    [31] MDEQ, Monitoring Network Plan-2014, 13.
    [32] ESRI (2003) The principles of geostatistical analysis. In: ArcGIS 9 Using ArcGIS Geostatistical Analyst. Redlands, CA: Esri Press, 49–78.
    [33] Environmental Systems Research Institute, Inc. (2016) How Kriging works. ArcGIS for Desktop Help. Available from: http://desktop.arcgis.com/en/arcmap/10.3/tools/3d-analyst-toolbox/how-kriging-works.htm#
    [34] Susanto F, de Souza P, He J (2016) Spatiotemporal Interpolation for Environmental Modelling. Sensors (Basel) 16. Available from: http://www.ncbi.nlm.nih.gov/pubmed/27509497
    [35] Mukaka MM (2012) Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71.
    [36] The Economist (2013) The Economist explains: Why are so many people leaving the Mississippi Delta? Available from: https://www.economist.com/blogs/economist-explains/2013/10/economist-explains-12
    [37] Malik HU, Kumar K, Frieri M (2012) Minimal difference in the prevalence of asthma in the urban and rural environment. Clin Med Insights Pediatr 6: 33–39.
    [38] USA Today (2011) Miss. coast shows modest population gains post-Katrina. Available from: http://usatoday30.usatoday.com/news/nation/census/2011-02-03-census-mississippi_N.htm
    [39] CDC (2013) Introduction to Hotspot Analysis.
    [40] Valet RS, Perry TT, Hartert TV (2009) Rural health disparities in asthma care and outcomes. J Allergy Clin Immunol 123: 1220–1225. doi: 10.1016/j.jaci.2008.12.1131
    [41] Luvall J, Quattrochi D, Rickman D (2015) Boundary Layer (Atmospheric) and Air Pollution. In: North GR, Pyle J, Zhang F, editors. Encyclopedia of Atmospheric Sciences. 2nd edition. Elsevier Ltd, 310–318.
    [42] Sears MR (2008) Epidemiology of asthma exacerbations. J Allergy Clin Immunol 122: 662–668. doi: 10.1016/j.jaci.2008.08.003
    [43] Sears MR, Johnston NW (2007) Understanding the September asthma epidemic. J Allergy Clin Immunol 120: 526–529. doi: 10.1016/j.jaci.2007.05.047
    [44] Minnesota Department of Health, Asthma Hospitalizations Peak in September, 2008. Available from: http://www.health.state.mn.us/asthma/documents/08asthmahosppeaksept.pdf
    [45] Federal Register (2013) National Ambient Air Quality Standards for Particulate Matter. 40 CFR Parts 50, 51, 52, 53 and 58 United States of America, 3086–3287.
    [46] Rona RJ (2000) Asthma and poverty. Thorax 55: 239–244. doi: 10.1136/thorax.55.3.239
    [47] Zheng J (2011) Shacks in the Mississippi Delta. Arkansas Rev AJ Delta Stud 42: 30–33.
    [48] Eudy RL (2009) Infant Mortality in the Lower Mississippi Delta: Geography, Poverty and Race. Matern Child Health J 13: 806–813. doi: 10.1007/s10995-008-0311-y
    [49] Abrokwa A, Chan A, Ha M (2010) Coverage Is Not Enough: Health Care Reform through the Lens of the Mississippi Delta. Kennedy Sch Rev. Available from: https://www.highbeam.com/doc/1G1-247740034.html
    [50] Seltenrich N (2014) Remote-Sensing Applications for Environmental Health Research. Environ Health Perspect 122: 268–275.
    [51] Gotway CA, Young LJ (2002) Combining incompatible spatial data. J Am Stat Assoc 97: 632–648. doi: 10.1198/016214502760047140
    [52] Pratt M, Moore H, Craig T (2014) Solving a Public Health Problem Using Location-Allocation. ArcUser, 56–59. Available from: http://www.esri.com/~/media/Files/Pdfs/news/arcuser/0614/solving-a-public-health-problem.pdf; http://www.esri.com/~/media/Files/Pdfs/news/arcuser/0614/summer-2014.pdf
  © 2018 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
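Corresponding author: Chen Bin (陈斌), bchen63@163.com

    College of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142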
