1. Introduction
The heavy-tailed distribution is an important model in extreme value statistics, with applications in finance, insurance, meteorology, and hydrology. Its primary parameter is the positive extreme value index (also called the extreme value index of a heavy-tailed distribution, abbreviated here as the heavy-tailed index), which characterizes the probability of extreme events such as catastrophic floods with a 100-year return period, enormous earthquakes, and large insurance claims (see [1,2]). Estimation of the extreme value index has therefore become one of the main research problems in extreme value statistics.
Semi-parametric estimation of the extreme value index began in the 1970s. The seminal and most famous estimator is the Hill estimator [3], favored by many scholars for its simple form; however, it is sensitive to the threshold k. In general, the Hill estimator has a large variance for small values of k, while large values of k usually induce a high bias, so an inadequate choice of k can result in large expected errors. Threshold selection is thus one of the most fundamental problems in extreme value statistics. Existing methods include graphical diagnostics and heuristic procedures based on the stability of sample paths as functions of k, as well as minimization of an estimator of the mean square error, also as a function of k. More comprehensive reviews of threshold selection can be found in [4,5].
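To make the threshold sensitivity concrete, the following is a minimal sketch of the Hill estimator in Python (an illustration, not part of the original paper): it averages the log-excesses of the k largest order statistics, and one can watch the estimate react to the choice of k on a simulated Pareto sample.

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill estimator of the heavy-tailed index gamma, based on the
    k largest order statistics of the sample."""
    x = np.sort(sample)          # ascending order statistics
    logs = np.log(x[-k - 1:])    # X_{n-k,n}, ..., X_{n,n}
    # mean of log-excesses over the (k+1)-th largest observation
    return np.mean(logs[1:] - logs[0])

# Pareto(gamma = 0.5) sample: P(X > x) = x^(-1/gamma)
rng = np.random.default_rng(0)
gamma = 0.5
sample = rng.uniform(size=5000) ** (-gamma)   # inverse-transform sampling

# small k -> high variance; large k -> high bias for non-Pareto tails
print([round(hill_estimator(sample, k), 3) for k in (50, 200, 1000)])
```

For a pure Pareto tail the estimator is nearly unbiased for all k, so the spread across k values here reflects only the variance effect; for distributions with a slowly varying factor, large k also adds bias.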
Afterward, Dekkers et al. [6] proposed a moment estimator built from a moment statistic, which improved on the Hill estimator. Using the moment statistic, many estimators for the extreme value index have been proposed, such as the moment ratio estimator [7], the estimator of Gomes and Martins [8], the estimator of Caeiro and Gomes [9], and Lehmer's mean-of-order-p (Lp) estimator [10]; more estimators can be found in [11,12]. As a further study, this paper constructs from the moment statistic a class of heavy-tailed index estimators with four parameters. Many estimators can be obtained from specific parameter values, including not only existing estimators in the literature but also newly derived ones. The consistency and asymptotic normality of the proposed estimators are investigated under the first-order and second-order regular variation conditions. The asymptotic unbiasedness of the specified estimators is discussed, and the main components of the asymptotic variance are compared among the asymptotically unbiased estimators. In addition, their finite sample performance is analyzed through a Monte-Carlo simulation in terms of the simulated mean value and mean square error. The results show that some of the new estimators perform better.
Let X be a non-negative random variable with distribution function F(x) satisfying, for sufficiently large x>0,

1 − F(x) = x^(−1/γ) L(x),  γ > 0, (1.1)

then we call F(x) a heavy-tailed distribution, and γ is the heavy-tailed index, one of the primary parameters of extreme events. Here L(x)>0 is a slowly varying function at infinity, that is, for all t>0,

lim_{x→∞} L(tx)/L(x) = 1.
If a positive measurable function g(x) has infinite right endpoint and satisfies, for all t>0,

lim_{x→∞} g(tx)/g(x) = t^α,

then g(x) is said to be a regularly varying function with index α, denoted g ∈ R_α.
Let us denote by U(t) := F^←(1 − 1/t), t > 1, the tail quantile function, where F^←(t) := inf{x : F(x) ≥ t}, 0 < t < 1, is the generalized inverse of F(x). From [13], the heavy-tailed distribution F(x) has the following equivalence relation:

1 − F ∈ R_{−1/γ}  ⟺  U ∈ R_γ,  i.e.,  lim_{t→∞} U(tx)/U(t) = x^γ for all x > 0. (1.2)

In general, the equivalence relation (1.2) is called the first-order regular variation condition.
The classical heavy-tailed index estimators are all built from the top k order statistics, and their consistency is obtained under the first-order regular variation condition (1.2) together with the following condition on the sequence k:

k = k(n) → ∞  and  k/n → 0,  as n → ∞. (1.3)

A sequence k satisfying (1.3) is called intermediate.
In addition, to obtain the asymptotic normality of a heavy-tailed index estimator, the second-order regular variation condition is often needed. F(x) is said to satisfy the second-order regular variation condition if there exists an eventually positive regularly varying function A with index ρ ≤ 0, that is, |A(t)| ∈ R_ρ, such that

lim_{t→∞} (ln U(tx) − ln U(t) − γ ln x)/A(t) = (x^ρ − 1)/ρ, (1.4)

for every x > 0, where ρ is the second-order parameter. In this paper, we only consider the case ρ < 0.
To construct new estimators, we shall consider the following moment statistic and recall some heavy-tailed index estimators built from it.
Let X1,X2,⋯,Xn be a sample of n independent observations from a common unknown distribution function F(x), and let X_{1,n} ⩽ X_{2,n} ⩽ ⋯ ⩽ X_{n,n} denote the ascending order statistics associated with the sample.
Let us consider the moment statistic

M_n^(a)(k) := (1/k) Σ_{i=1}^{k} (ln X_{n−i+1,n} − ln X_{n−k,n})^a,  a > 0. (1.5)
Now we introduce several heavy-tailed index estimators constructed from the moment statistic.
∙ The Hill estimator is expressed as follows:

H(k) := M_n^(1)(k). (1.6)
∙ The moment estimator is expressed as follows:
The moment estimator is asymptotically unbiased when γ and ρ satisfy γ − γρ + ρ = 0.
∙ The moment ratio estimator is expressed as follows:
∙ The estimators in [8] are expressed as follows:
and
where for any ρ<0 and α>2, there exists α₀ satisfying (1−ρ)^(α₀−1)[1+ρ(α₀−2)] = 1 such that GM1(α₀) is an asymptotically unbiased estimator. In addition, note that the estimator GM1(α) reduces to the moment ratio estimator when α=2.
∙ The estimator in [9] is expressed as follows:
where for any ρ<0, CG(α) is an asymptotically unbiased estimator when α = α₀ = −ln[1−ρ−√((1−ρ)²−1)]/ln(1−ρ).
∙ The Lp estimator in [10] is expressed as follows:
where the Lp estimator becomes the moment ratio estimator when p=2.
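The moment statistic (1.5) and several of the estimators above are straightforward to express in code. The sketch below (an illustration in the notation of the text, not the authors' code) implements M_n^(a)(k), the Hill estimator, the moment ratio estimator, and the Lp estimator:

```python
import numpy as np

def moment_stat(sample, k, a):
    """M_n^(a)(k): a-th empirical moment of the log-excesses over the
    (k+1)-th largest order statistic, as in (1.5)."""
    x = np.sort(sample)
    excess = np.log(x[-k:]) - np.log(x[-k - 1])
    return np.mean(excess ** a)

def hill(sample, k):                 # H(k) = M_n^(1)(k)
    return moment_stat(sample, k, 1.0)

def moment_ratio(sample, k):         # MR(k) = M^(2) / (2 M^(1))
    return moment_stat(sample, k, 2.0) / (2.0 * hill(sample, k))

def lehmer(sample, k, p):            # L_p(k) = M^(p) / (p M^(p-1))
    return moment_stat(sample, k, p) / (p * moment_stat(sample, k, p - 1.0))
```

As noted in the text, the Lp estimator with p = 2 coincides with the moment ratio estimator, which gives a quick sanity check on the implementation.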
2. New estimators
Based on the moment statistic (1.5), we construct a class of heavy-tailed index estimators with four parameters, whose expression is as follows:

γ̂_MG(k,a,b,α,β) := H(k) (M_n^(a)(k)/(Γ(a+1)H^a(k)))^α (Γ(b+1)H^b(k)/M_n^(b)(k))^β, (2.1)

where a, b ≥ 0, α, β ∈ R, and M_n^(a)(k) and H(k) are given in (1.5) and (1.6), respectively.
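For illustration, the four-parameter family can be coded directly from its definition. The sketch below (hypothetical helper names; it assumes the product form implied by the specializations listed later) evaluates γ̂_MG for any parameter choice:

```python
import numpy as np
from math import gamma as Gamma  # Euler's Gamma function

def M(sample, k, a):
    # moment statistic M_n^(a)(k) of the log-excesses, as in (1.5)
    x = np.sort(sample)
    return np.mean((np.log(x[-k:]) - np.log(x[-k - 1])) ** a)

def gamma_MG(sample, k, a, b, alpha, beta):
    # H(k) * (M^(a) / (Gamma(a+1) H^a))^alpha
    #      * (Gamma(b+1) H^b / M^(b))^beta
    H = M(sample, k, 1.0)
    t1 = (M(sample, k, a) / (Gamma(a + 1) * H ** a)) ** alpha
    t2 = (Gamma(b + 1) * H ** b / M(sample, k, b)) ** beta
    return H * t1 * t2
```

Setting (α,β) = (0,0) recovers the Hill estimator, and (a,α,β) = (2,1,0) recovers the moment ratio estimator, in line with the specializations discussed in the next section.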
Before discussing the properties of the proposed estimator, the following two lemmas are introduced.
Lemma 2.1. [8] Let X1,X2,⋯,Xn be a sample of n independent observations from a common unknown distribution function F(x), and let X_{1,n} ⩽ X_{2,n} ⩽ ⋯ ⩽ X_{n,n} denote the associated ascending order statistics. If F(x) satisfies the first-order regular variation condition (1.2) and k is intermediate, then the moment statistic satisfies, for a > 0,

M_n^(a)(k)/γ^a ⟶P Γ(a+1), as n → ∞.
Lemma 2.2. [8] Let X1,X2,⋯,Xn be a sample of n independent observations from a common unknown distribution function F(x), and let X_{1,n} ⩽ X_{2,n} ⩽ ⋯ ⩽ X_{n,n} denote the associated ascending order statistics. If F(x) satisfies the second-order regular variation condition (1.4) and k is intermediate, then the moment statistic has the following asymptotic distributional representation:

M_n^(α)(k) =d γ^α (Γ(α+1) + σ_M(α) Z_k^(α)/√k + b_M(α) (A(n/k)/γ)(1 + o_P(1))),

where b_M(α) = (Γ(α+1)/ρ)(1/(1−ρ)^α − 1), σ_M(α) = √(Γ(2α+1) − Γ²(α+1)),

Z_k^(α) := √k ((1/k) Σ_{i=1}^{k} E_i^α − Γ(α+1))/σ_M(α)

is an asymptotically standard normal random variable, {E_i}, i = 1,2,⋯,n, are independent identically distributed standard exponential random variables, and the covariance of Z_k^(α) and Z_k^(β) is

Cov(Z_k^(α), Z_k^(β)) = (Γ(α+β+1) − Γ(α+1)Γ(β+1))/(σ_M(α)σ_M(β)).
The consistency and asymptotic normality of the new estimators are discussed below.
Theorem 2.1. Let X1,X2,⋯,Xn be a sample of n independent observations from a common unknown distribution function F(x), and let X_{1,n} ⩽ X_{2,n} ⩽ ⋯ ⩽ X_{n,n} denote the associated ascending order statistics. If F(x) satisfies the first-order regular variation condition (1.2) and k is intermediate, then

γ̂_MG(k,a,b,α,β) ⟶P γ, as n → ∞.

Proof. According to Lemma 2.1 and the continuous mapping theorem, we have H(k) ⟶P γ, (M_n^(a)(k)/(Γ(a+1)H^a(k)))^α ⟶P 1, and (Γ(b+1)H^b(k)/M_n^(b)(k))^β ⟶P 1. Using the continuous mapping theorem again, we obtain γ̂_MG(k,a,b,α,β) ⟶P γ. □
Theorem 2.2. Let X1,X2,⋯,Xn be a sample of n independent observations from a common unknown distribution function F(x), and let X_{1,n} ⩽ X_{2,n} ⩽ ⋯ ⩽ X_{n,n} denote the associated ascending order statistics. If F(x) satisfies the second-order regular variation condition (1.4) and k is intermediate, then

γ̂_MG(k,a,b,α,β) =d γ + σ_MG(a,b,α,β) P_k^MG/√k + b_MG(a,b,α,β) A(n/k)(1 + o_P(1)),

where the random variable P_k^MG, given explicitly in the proof below, is an asymptotically standard normal random variable.
Proof. By Lemma 2.2, we obtain
and
Using (1+x)^a = 1 + ax + o(x), as x → 0, one gets
From 1/(1+x) = 1 − x + o(x), as x → 0, it follows that
Combining with (2.6) and (2.7), we obtain
Applying (1+x)^α = 1 + αx + o(x), as x → 0, again, we obtain
Similar to the proof of (2.8), we have
Thus, combining with (2.5), (2.8), and (2.9), it follows that
From the above equation, we can see that
Let A = αγ/Γ(a+1), B = βγ/Γ(b+1), and C = (βb − αa + 1)γ; then we have
From Lemma 2.2, one can deduce that {Y_i}, i = 1,2,⋯,k, are independent, identically distributed random variables. It is easy to obtain that E(Y_i) = AΓ(a+1) + BΓ(b+1) + C and Var(Y_i) = σ²_MG(a,b,α,β). Applying the Lindeberg–Lévy central limit theorem, we get that P_k^MG is an asymptotically standard normal random variable, and
□
Corollary 2.1. Under the conditions of Theorem 2.2, suppose that √k A(n/k) → λ, finite, as n → +∞; then

√k (γ̂_MG(k,a,b,α,β) − γ) ⟶d N(λ b_MG(a,b,α,β), σ²_MG(a,b,α,β)).
Remark 1. For every ρ<0, if there exist a, b, α, β such that b_MG(a,b,α,β) = 0, then the corresponding γ̂_MG(k,a,b,α,β) is an asymptotically unbiased estimator of γ, even when λ ≠ 0.
3. Specific expression of new estimators
The estimator γ̂_MG(k,a,b,α,β) has four parameters; for dealing with practical problems, one can specify the parameters to obtain concrete estimator expressions. We give ten specific estimators below.
(E1) MG(a,b,0,0) = H(k) := M_n^(1)(k);
(E2) MG(a,b,1,0) = GM1(a) := M_n^(a)(k)/(Γ(a+1)[M_n^(1)(k)]^(a−1)), a > 0;
(E3) MG(a,b,1/a,0) = GM2(a) := (M_n^(a)(k)/Γ(a+1))^(1/a), a > 0;
(E4) MG(2a,a−1,1/2,1) = CG(a) := (Γ(a)/M_n^(a−1)(k)) (M_n^(2a)(k)/Γ(2a+1))^(1/2), a ≥ 1;
(E5) MG(a,a−1,1,1) = L_a(k) := M_n^(a)(k)/(a M_n^(a−1)(k)), a ≥ 1;
(E6) MG(2a,a,1/2,1) = H(k) (M_n^(2a)(k)/Γ(2a+1))^(1/2) Γ(a+1)/M_n^(a)(k);
(E7) MG(2a,a,1/2,0) = H(k) (M_n^(2a)(k)/(Γ(2a+1)H^(2a)(k)))^(1/2);
(E8) MG(2,b,a,0) = H(k) (M_n^(2)(k)/(2H²(k)))^a;
(E9) MG(a,2,1,a/2) = 2^(a/2) H(k) M_n^(a)(k)/(Γ(a+1)(M_n^(2)(k))^(a/2));
(E10) MG(2,a,1,−1) = M_n^(2)(k) M_n^(a)(k)/(2Γ(a+1)H^(a+1)(k)).
Among the above-mentioned ten estimators, (E1)–(E5) are existing estimators and (E6)–(E10) are new estimators. As described in the literature [3,8,9,10], (E1), (E3), and (E5) are asymptotically biased estimators, while (E2) and (E4) are asymptotically unbiased when appropriate parameters are chosen. The asymptotic unbiasedness of the new estimators (E6)–(E10) is discussed below.
Let f(x) = (1 − (1−ρ)^x)/(ρ(1−ρ)^x), x > 0, ρ < 0; then f′(x) = −ln(1−ρ)/(ρ(1−ρ)^x) > 0. Therefore, f(x) is an increasing function of x for a given ρ.
Thus, the main component of the asymptotic bias of the new estimator, b_MG(a,b,α,β) in Theorem 2.2, can be rewritten as

b_MG(a,b,α,β) = α f(a) − β f(b) + (1 − αa + βb) f(1).

Since the main component of the asymptotic bias of each of the estimators (E6)–(E10) involves only one parameter, it is abbreviated as b_MGi(a), i = 6,7,8,9,10, for convenience.
For the estimator (E6), the main component of the asymptotic bias is b_MG6(a) = (1/2)f(2a) − f(a) + f(1). Since db_MG6(a)/da = f′(2a) − f′(a) = (−ln(1−ρ)/ρ)(1/(1−ρ)^(2a) − 1/(1−ρ)^a) < 0, b_MG6(a) is a decreasing function of a for a given ρ < 0. Also, b_MG6(0) = 1/(1−ρ) > 0, and lim_{a→+∞} b_MG6(a) = (1+ρ)/(2ρ(1−ρ)). It is easy to see that (1+ρ)/(2ρ(1−ρ)) < 0 when 1+ρ > 0. In that case there exists a₀ such that b_MG6(a₀) = 0, i.e., the estimator (E6) with the parameter a₀ is asymptotically unbiased.
For the estimator (E7), the main component of the asymptotic bias is b_MG7(a) = (1/2)f(2a) + (1−a)f(1). Since b_MG7(0) = 1/(1−ρ) > 0 and lim_{a→+∞} b_MG7(a) = −∞, there always exists a₀ such that b_MG7(a₀) = 0, i.e., the estimator (E7) with the parameter a₀ is asymptotically unbiased, where a₀ satisfies (1−ρ)^(2a₀−1)[(2a₀−3)ρ + 1] = 1.
For the estimator (E8), the main component of the asymptotic bias is b_MG8(a) = a f(2) + (1−2a)f(1). Since b_MG8(0) = 1/(1−ρ) > 0 and lim_{a→+∞} b_MG8(a) = −∞, there always exists a₀ such that b_MG8(a₀) = 0, i.e., the estimator (E8) with the parameter a₀ is asymptotically unbiased, where a₀ = 1 − 1/ρ.
For the estimator (E9), the main component of the asymptotic bias is b_MG9(a) = f(a) − (a/2)f(2) + f(1). Since b_MG9(0) = 1/(1−ρ) > 0 and lim_{a→+∞} b_MG9(a) = −∞, there always exists a₀ such that b_MG9(a₀) = 0, i.e., the estimator (E9) with the parameter a₀ is asymptotically unbiased, where a₀ satisfies (1−ρ)^(a₀−2)[(4−a₀)ρ² + (2a₀−6)ρ + 2] = 2.
For the estimator (E10), the main component of the asymptotic bias is b_MG10(a) = f(a) + f(2) − (1+a)f(1). Since b_MG10(0) = f(2) − f(1) > 0 and lim_{a→+∞} b_MG10(a) = −∞, there always exists a₀ such that b_MG10(a₀) = 0, i.e., the estimator (E10) with the parameter a₀ is asymptotically unbiased, where a₀ satisfies (1−ρ)^(a₀−2)[(1−a₀)ρ² + (a₀−3)ρ + 1] = 1.
In summary, the estimator (E6) can be asymptotically unbiased only if 1+ρ > 0 and a suitable a is chosen, whereas for any ρ < 0 one can always find a such that the estimators (E7)–(E10) are asymptotically unbiased, with the value of a depending only on ρ.
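The existence arguments above can be checked numerically. The sketch below (illustrative, standard library only) codes the function f, the bias component of (E8), finds its root by bisection, and compares the result with the closed form a₀ = 1 − 1/ρ stated in the text:

```python
def f(x, rho):
    # f(x) = (1 - (1-rho)^x) / (rho * (1-rho)^x), increasing in x for rho < 0
    return (1.0 - (1.0 - rho) ** x) / (rho * (1.0 - rho) ** x)

def bias_E8(a, rho):
    # main component of the asymptotic bias of (E8): a f(2) + (1 - 2a) f(1)
    return a * f(2.0, rho) + (1.0 - 2.0 * a) * f(1.0, rho)

def bisect(g, lo, hi, tol=1e-12):
    # simple bisection, assuming g(lo) > 0 > g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

rho = -1.0
a0_num = bisect(lambda a: bias_E8(a, rho), 0.0, 50.0)
a0_closed = 1.0 - 1.0 / rho   # closed form for (E8) from the text
print(a0_num, a0_closed)
```

The same bisection applies to (E7), (E9), and (E10), whose roots are characterized only implicitly; (E8) is singled out here because its closed form makes the check exact.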
Next, among the above asymptotically unbiased estimators, we compare the main components of their asymptotic variances σ²_MG(a,b,α,β). The asymptotically unbiased estimators (E2), (E4), (E7), (E8), (E9), and (E10) are considered here. For convenience, the main components of the asymptotic variances of these estimators are denoted σ²_i, i = 2,4,7,8,9,10. Given different values of ρ, we compute the ratios σ²_i/γ², i = 2,4,7,8,9,10 (each ratio depends only on ρ), together with the corresponding values of a₀. The results of the calculations are shown in Table 1, from which we can draw the following conclusions.
1) All ratios decrease as ρ decreases.
2) For a given ρ, σ²_2/γ², σ²_8/γ², and σ²_10/γ² are the smallest, followed by σ²_4/γ² and σ²_7/γ², and finally σ²_9/γ².
3) Among σ²_2/γ², σ²_8/γ², and σ²_10/γ²: when ρ > −1, σ²_8/γ² < σ²_10/γ² < σ²_2/γ²; when ρ < −1, σ²_10/γ² < σ²_8/γ² < σ²_2/γ².
Overall, among the asymptotically unbiased estimators compared, the estimators (E8) and (E10) perform better, with smaller values of the asymptotic variance.
4. Comparison of finite sample properties
In order to investigate the finite-sample performance of the estimators (E2), (E4), and (E7)–(E10) mentioned in the previous section, Monte-Carlo simulations are used to generate N = 100 samples of size n = 1000 from the following models.
1) Fréchet(γ) model: F(x) = exp(−x^(−1/γ)), x ≥ 0, ρ = −1;
2) Burr(γ,ρ) model: F(x) = 1 − (1 + x^(−ρ/γ))^(1/ρ), x ≥ 0.
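Both models can be sampled by inverse transform. A minimal sketch (illustrative helper names, not the authors' code):

```python
import numpy as np

def frechet_sample(n, gamma, rng):
    # F(x) = exp(-x^(-1/gamma))  =>  F^{-1}(u) = (-ln u)^(-gamma)
    return (-np.log(rng.uniform(size=n))) ** (-gamma)

def burr_sample(n, gamma, rho, rng):
    # 1 - F(x) = (1 + x^(-rho/gamma))^(1/rho); inverting the survival
    # function at u gives  x = (u^rho - 1)^(-gamma/rho),  rho < 0
    u = rng.uniform(size=n)
    return (u ** rho - 1.0) ** (-gamma / rho)

rng = np.random.default_rng(0)
x = burr_sample(1000, 1.0, -1.0, rng)   # Burr(1, -1) sample
```

For Burr(1,−1) the survival function is 1/(1+x), so the population median is 1; the empirical median of a generated sample should land near that value, which gives a quick check of the inversion.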
For convenience, the estimators involved are denoted γ̂_i, i = 2,4,7,8,9,10. For each estimator, we compute the simulated mean value (E) and mean square error (MSE):

E(γ̂(k)) = (1/N) Σ_{j=1}^{N} γ̂^(j)(k),  MSE(γ̂(k)) = (1/N) Σ_{j=1}^{N} (γ̂^(j)(k) − γ)²,

where γ̂^(j)(k) denotes the estimate computed from the j-th generated sample.
From the discussion in the previous section, the parameter values of the asymptotically unbiased estimators γ̂_i, i = 2,4,7,8,9,10, depend only on the parameter ρ. Therefore, the following simulations are divided into two cases: ρ known and ρ unknown. When ρ is unknown, we adopt the approach of [14,15] to obtain an estimator of ρ, which in turn yields an estimator of the parameter a₀ in the estimators compared. The details are as follows.
We estimate the parameter ρ by the following estimator proposed in [14].
where
To decide which value (0 or 1) of the tuning parameter τ to use in the aforementioned estimator, we apply the algorithm provided in [15]. For the estimator ρ̂, following the recommendation of [15], we use k = min{n−1, ⌊2n^0.995/ln ln n⌋}.
The simulation results are shown in Figures 1–4. Figure 1 shows the simulated mean values and MSEs of the estimators under study for samples of size n = 1000 from the Fréchet(1) model when ρ is unknown. The simulated mean values show that all estimators are almost asymptotically unbiased. Regarding the simulated MSEs, for almost all values of k, we can see that
and MSE(γ̂_10) is almost equal to MSE(γ̂_8). This is generally consistent with the theoretical analysis.
Figure 2 is the analogue of Figure 1 when ρ is known, and is similar to Figure 7 in [9]. The simulated mean values show that all estimators are asymptotically unbiased. Regarding the simulated MSEs, for every k, we can see that
In addition, MSE(γ̂_10) is almost equal to MSE(γ̂_8), and MSE(γ̂_4) is close to MSE(γ̂_7). This is consistent with the theoretical analysis.
Figures 3 and 4 show the simulated mean values and MSEs of the estimators under study for samples of size n = 1000 from the Burr(1, −1) model when ρ is unknown and known, respectively. The two cases perform essentially the same in terms of simulated mean values and MSEs. The simulated mean values show that the estimators appear asymptotically unbiased for some values of k.
In this case, for most values of k, it is clear from the simulated MSEs that
For other parameter values of the Burr model, the conclusions obtained are similar to the results of the above analysis.
In the following, the above-mentioned estimators are compared in terms of the simulated mean value and mean square error at the optimal level k₀,

k₀(∙) := arg min_k MSE(∙(k)),

where ∙ denotes any of γ̂_i, i = 2,4,7,8,9,10. The parameter a₀ is estimated by plugging the estimator of ρ in (4.1) into the corresponding equation for a₀, and the estimators at the optimal level are denoted γ̂_i(a_{i0}, k_{i0}), i = 2,4,7,8,9,10. We have implemented Monte-Carlo simulation experiments of size 100 for sample sizes n = 100, 200, 500, 1000, and 2000 from the Fréchet and Burr models. For each model, the mean values with the smallest squared bias and the smallest root mean square error (RMSE := √MSE) are presented in bold. The simulated results are shown in Tables 2–5.
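The optimal-level comparison can be sketched as follows: for each candidate k, estimate the MSE over the N replicated samples and take the minimizer. This grid-search illustration uses the Hill estimator as a stand-in; the paper applies the same recipe to γ̂_i, i = 2,4,7,8,9,10:

```python
import numpy as np

def hill(sample, k):
    x = np.sort(sample)
    return np.mean(np.log(x[-k:]) - np.log(x[-k - 1]))

def simulated_mse(estimator, samples, k, gamma_true):
    # MSE(k) = (1/N) sum_j (estimate_j(k) - gamma)^2 over replicated samples
    est = np.array([estimator(s, k) for s in samples])
    return np.mean((est - gamma_true) ** 2)

def optimal_level(estimator, samples, gamma_true, k_grid):
    # k0 = argmin_k MSE(k), approximated on a grid of candidate levels
    mses = [simulated_mse(estimator, samples, k, gamma_true) for k in k_grid]
    return k_grid[int(np.argmin(mses))]

rng = np.random.default_rng(0)
samples = [rng.uniform(size=500) ** (-1.0) for _ in range(50)]  # Pareto, gamma = 1
k0 = optimal_level(hill, samples, 1.0, range(10, 250, 10))
```

Note that this oracle choice of k₀ uses the true γ and is therefore only available in simulation studies, which is exactly how it is used in Tables 2–5.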
From Tables 2–5, we now provide a few comments.
1) For all simulated models, the simulated values of γ̂_8(a_{80}, k_{80}) and γ̂_10(a_{100}, k_{100}) are almost equal, especially in terms of RMSE.
2) For models with ρ ≥ −1, γ̂_4(a_{40}, k_{40}) has a smaller squared bias than the other estimators in terms of the simulated mean value.
3) For models with ρ < −1, γ̂_7(a_{70}, k_{70}) has a smaller squared bias than the other estimators in terms of the simulated mean value. However, γ̂_9(a_{90}, k_{90}) has a smaller RMSE than the other estimators when n ≥ 500.
Overall, the new estimators perform well within a certain range.
5. Conclusions
In extreme value statistics, estimation of the heavy-tailed index is one of the most important current research topics. By means of the moment statistic, this paper constructs a class of heavy-tailed index estimators with four parameters. The consistency and asymptotic normality of the proposed estimators are proved under the first-order and second-order regular variation conditions. Ten estimators are obtained by specifying the parameters, including both existing estimators from the literature and new ones. The asymptotic unbiasedness of the specified new estimators is discussed. Among the asymptotically unbiased estimators, some of the new estimators are compared with the existing ones in terms of asymptotic variance, and the new estimators perform better. In the finite sample case, the simulated mean and mean square error of the compared estimators were computed by Monte-Carlo simulation. The results show that the ordering of the simulated mean square errors is consistent with the theoretical analysis of the asymptotically unbiased estimators. In addition, we compare the simulated mean value and mean square error of the mentioned estimators at the optimal level, and conclude that the new estimators perform better within a certain range. Although we propose a class of parameterized heavy-tailed index estimators, the selection of the parameters remains an open problem and will require further study.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
The authors would like to express their gratitude to the editor and two anonymous reviewers for their valuable comments and suggestions, which have greatly improved the quality of the paper. This work was supported by the National Natural Science Foundation of China (12001395).
Conflict of interest
The authors declare there is no conflict of interest.