1. Introduction
Let $X$ be a $p$-dimensional random vector. Estimating its covariance matrix $\Sigma=(\sigma_{uv})_{p\times p}$ is a central problem in high-dimensional statistics (Mendelson and Zhivotovskiy [1], Dendramis et al. [2] and Zhang et al. [3]). A commonly adopted strategy for estimating the covariance matrix is to impose a sparse structure on the matrix itself (Belomestny [4], Kang and Deng [5], Bettache et al. [6] and Liang et al. [7]).
When $X$ is sub-Gaussian, Bickel and Levina [8], Cai and Liu [9] and Cai and Zhou [10] modeled sparsity by assuming that the rows or columns of the covariance matrix belong to an $l_q$-ball, a weighted $l_q$-ball or a weak $l_q$-ball. They proposed the corresponding thresholding estimators and established convergence rates in the sense of probability or expectation, respectively.
When each component of $X=(X_1,\cdots,X_p)^T$ follows a heavy-tailed distribution, i.e., the distribution of $X_u$ satisfies $\int_{\mathbb{R}}e^{tx}\,dF_u(x)=\infty$ for $t>0$, Avella-Medina et al. [11] introduced a pilot estimator $\tilde{\Sigma}=(\tilde{\sigma}_{uv})_{p\times p}$ satisfying
for a positive constant $C_0$ and $\log p=o(n)$, where $\varepsilon_{n,p}$ is a deterministic positive sequence satisfying $\lim_{n,p\to\infty}\varepsilon_{n,p}=0$. Avella-Medina et al. pointed out that the sample covariance matrix
must be a pilot estimator if $X_1,\cdots,X_n$ are i.i.d. sub-Gaussian samples. In addition, some other pilot estimators were provided under a bounded fourth moment assumption. The authors also derived the convergence rate of the thresholding pilot estimator in terms of probability when the rows or columns of the covariance matrix lie in a weighted $l_q$-ball.
However, missing data (also called incomplete data) frequently occur in high-dimensional sampling settings; see Hawkins et al. [12], Lounici [13] and Loh and Wainwright [14]. Instead of observing the complete i.i.d. samples $X_1,\cdots,X_n$, one can only collect parts of them. Let the vector $S_i\in\{0,1\}^p$ $(i=1,\cdots,n)$ be defined by
\[ S_{iu}=\begin{cases}1, & X_{iu}\text{ is observed},\\ 0, & X_{iu}\text{ is missing},\end{cases} \]
where $X_{iu}$ and $S_{iu}$ are the $u$-th coordinates of $X_i$ and $S_i$, respectively. This paper denotes the samples with missing values by $X^*_i=(X^*_{i1},\cdots,X^*_{ip})^T$, where $X^*_{iu}=X_{iu}S_{iu}$. The following missing mechanism, introduced by Cai and Zhang [15], is adopted.
Assumption 1.1 (Missing completely at random). $S=\{S_1,\cdots,S_n\}$ can be either deterministic or random and is independent of $X=\{X_1,\cdots,X_n\}$.
Define
\[ n^*_{uv}=\sum_{i=1}^{n}S_{iu}S_{iv},\qquad 1\le u,v\le p, \]
i.e., $n^*_{uv}$ is the number of indices $i$ for which the $u$-th and $v$-th entries of $X^*_i$ are both observed. For convenience, let
\[ n^*_u:=n^*_{uu}=\sum_{i=1}^{n}S_{iu},\qquad n^*_{\min}:=\min_{1\le u\le v\le p}n^*_{uv}. \]
Then, it is easy to see
\[ n^*_{\min}\le n^*_{uv}\le\min\{n^*_u,n^*_v\}\le n. \]
Meanwhile, the generalized sample mean $\bar{X}^*=(\bar{X}^*_u)_{1\le u\le p}$ is defined by
\[ \bar{X}^*_u=\frac{1}{n^*_u}\sum_{i=1}^{n}S_{iu}X_{iu}, \]
and the generalized sample covariance matrix $\hat{\Sigma}^*=(\hat{\sigma}^*_{uv})$ is given by
\[ \hat{\sigma}^*_{uv}=\frac{1}{n^*_{uv}}\sum_{i=1}^{n}S_{iu}S_{iv}(X_{iu}-\bar{X}^*_u)(X_{iv}-\bar{X}^*_v). \tag{1.1} \]
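For concreteness, these quantities take only a few lines of NumPy. The following is a minimal sketch (not the authors' code) that assumes every coordinate pair is observed at least once, so that each $n^*_{uv}>0$; the mask and data below are illustrative.

```python
import numpy as np

def generalized_sample_cov(X, S):
    """X: (n, p) data matrix; S: (n, p) 0/1 observation mask.

    Assumes every coordinate pair is observed at least once (n*_uv > 0)."""
    Xs = X * S                         # X*_{iu} = X_{iu} S_{iu}
    n_u = S.sum(axis=0)                # n*_u: observations per coordinate
    n_uv = S.T @ S                     # n*_{uv}: pairwise observation counts
    xbar = Xs.sum(axis=0) / n_u        # generalized sample mean
    # sigma*_{uv} = (1/n*_{uv}) sum_i S_iu S_iv (X_iu - xbar_u)(X_iv - xbar_v)
    R = (X - xbar) * S                 # centered and zeroed where unobserved
    return xbar, (R.T @ R) / n_uv, n_uv

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
S = (rng.random((100, 5)) < 0.8).astype(float)   # MUCR mask with rho = 0.8
xbar, Sigma_hat, n_uv = generalized_sample_cov(X, S)
```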
Our goal is to construct a thresholding estimator of the sparse covariance matrix $\Sigma$ based on incomplete heavy-tailed data, and to investigate its convergence rates in terms of probability and expectation, respectively.
The rest of the paper is organized as follows. Section 2 introduces the definition of the generalized pilot estimator based on missing data; then, under a bounded fourth moment assumption, two kinds of generalized pilot estimators are given. In Section 3, we construct the thresholding generalized pilot estimator and explore its convergence rates in the sense of probability under the spectral and Frobenius norms, respectively. In Section 4, the convergence rates are given in terms of expectation under an extra mild condition. Section 5 investigates the numerical performance of the thresholding generalized Huber pilot estimator and the thresholding generalized truncated mean pilot estimator, and compares these two estimators with the adaptive thresholding estimator proposed by Cai and Zhang [15].
2. Generalized pilot estimator
Definition 2.1. Any symmetric matrix $\tilde{\Sigma}^*=(\tilde{\sigma}^*_{uv})_{p\times p}$ based on the incomplete data $X^*_1,\cdots,X^*_n$ is said to be a generalized pilot estimator of $\Sigma$ if for any $L>0$ there exists a constant $C_0(L)$ such that
holds with $\log p=o(n^*_{\min})$.
Remark 2.1. If complete data are available, the generalized pilot estimator defined by (2.1) coincides with the pilot estimator proposed by Avella-Medina et al. [11], except that $\varepsilon_{n,p}$ is replaced by $O(p^{-L})$.
Remark 2.2. If $X$ is a sub-Gaussian random vector and the diagonal entries $\sigma_{uu}$ $(u=1,\cdots,p)$ of $\Sigma$ are uniformly bounded, the generalized sample covariance matrix $\hat{\Sigma}^*$ given by (1.1) must be a generalized pilot estimator of $\Sigma$.
In fact, Theorem 3.1 in [15] shows that for any $0<x\le 1$ there exist constants $C,c>0$ such that
By $n^*_{\min}\le n^*_{uv}$ and $\log p=o(n^*_{\min})$, one knows $\log p=o(n^*_{uv})$. Taking $x=\sqrt{(2+L)\log p/(c\,n^*_{uv})}$ with $L>0$, (2.2) reduces to
with $C_0(L)=\sqrt{(2+L)\sigma_{uu}\sigma_{vv}/c}$.
Furthermore,
Therefore, $\hat{\Sigma}^*$ given by (1.1) is a generalized pilot estimator of $\Sigma$.
We introduce the following theorem in order to provide two other kinds of generalized pilot estimators under the bounded fourth moment assumption.
Theorem 2.1. Suppose $\max_{1\le u\le p}E|X_u|^4\le k^4$, $\log p=o(n^*_{\min})$, $EX_u=\mu_u$, $E(X_uX_v)=\mu_{uv}$, and Assumption 1.1 holds. If $\tilde{\mu}^*_u$ and $\tilde{\mu}^*_{uv}$ satisfy
with absolute constants $L>0$ and $c\ge 1$, then $\tilde{\Sigma}^*=(\tilde{\sigma}^*_{uv})_{p\times p}:=(\tilde{\mu}^*_{uv}-\tilde{\mu}^*_u\tilde{\mu}^*_v)_{p\times p}$ must be a generalized pilot estimator of $\Sigma$.
Proof. Let $K:=ck\sqrt{2+L}$. Thus,
thanks to condition (2.3). Moreover,
Similarly, one derives
due to (2.4) and $c\ge 1$.
By $\max_{1\le u\le p}E|X_u|^4\le k^4$, one obtains $|\mu_u|\le(E|X_u|^4)^{1/4}\le k$ $(u=1,\cdots,p)$ and
where the last inequality follows from $K\ge ck$. Thus, one concludes
thanks to (2.5) and (2.6).
Since $n^*_{uv}\le\min\{n^*_u,n^*_v\}$, the above result reduces to
Furthermore, according to $\log p=o(n^*_{\min})$ and $n^*_{\min}\le n^*_{uv}$, one knows $\log p=o(n^*_{uv})$. Therefore, $(\log p)/n^*_{uv}\le((\log p)/n^*_{uv})^{1/2}$ and
hold.
Note that
Then,
follows from (2.7) and (2.8).
Hence, $\tilde{\Sigma}^*$ is a generalized pilot estimator of $\Sigma$ with $C_0(L)=2ck^2(1+c)(2+L)$. □
We shall give two generalized pilot estimators based on incomplete heavy-tailed samples.
Denote the Huber function by
\[ \psi_\alpha(x)=\alpha\,\psi\!\left(\frac{x}{\alpha}\right), \]
where $\alpha>0$ and
\[ \psi(x)=\begin{cases}x, & |x|\le 1,\\ \operatorname{sign}(x), & |x|>1.\end{cases} \]
For any constant $L>0$, let $(\tilde{\mu}^*_H)_u$ $(u=1,\cdots,p)$ satisfy
with $\alpha_u:=\sqrt{n^*_u\zeta^2/((2+L)\log p)}$ and $\zeta\ge\sqrt{D(X_u)}$, where $D(\cdot)$ denotes the variance. Similarly, $(\tilde{\mu}^*_H)_{uv}$ $(u,v=1,\cdots,p)$ satisfies
with $\alpha_{uv}:=\sqrt{n^*_{uv}\zeta_1^2/((2+L)\log p)}$ and $\zeta_1\ge\sqrt{D(X_uX_v)}$. Then, we have the following estimator.
Example 2.1 (Generalized Huber estimator). Suppose the conditions of Theorem 2.1 hold. Then $\tilde{\Sigma}^*_H:=((\tilde{\mu}^*_H)_{uv}-(\tilde{\mu}^*_H)_u(\tilde{\mu}^*_H)_v)_{p\times p}$ is a generalized pilot estimator of $\Sigma$, where $(\tilde{\mu}^*_H)_j$ $(j=u,v)$ and $(\tilde{\mu}^*_H)_{uv}$ are defined by (2.9) and (2.10).
Proof. With the definition of $X^*_{iu}$, (2.9) is equivalent to
where $A_u=\{i:S_{iu}\neq 0\}$. Obviously, $|A_u|=\sum_{i=1}^{n}S_{iu}$, and by the definition of $n^*_u$ we have $|A_u|=n^*_u$.
Similarly, (2.10) is equivalent to
with $A_{uv}=\{i:S_{iu}S_{iv}\neq 0\}$ and $|A_{uv}|=n^*_{uv}$.
By $\max_{1\le u\le p}E|X_u|^4\le k^4$, we get
On the other hand,
by the Cauchy–Schwarz inequality. Thus,
Obviously, it holds that
thanks to $n^*_{\min}\le n^*_{uv}\le n^*_u$ and $\log p=o(n^*_{\min})$.
According to (2.11)–(2.13) and Theorem 5 in [16], we know that if $(2+L)\log p/n^*_u\le 1/8$ and $(2+L)\log p/n^*_{uv}\le 1/8$, then
i.e., $(\tilde{\mu}^*_H)_u$ and $(\tilde{\mu}^*_H)_{uv}$ satisfy the required conditions (2.3) and (2.4). □
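To make Example 2.1 concrete, here is a hedged NumPy/SciPy sketch of the generalized Huber pilot estimator: each location estimate solves $\sum_{i\in A_u}\psi_{\alpha_u}(X_{iu}-\mu)=0$ by bisection, which is valid because the left-hand side is monotone in $\mu$. Replacing $\zeta$ and $\zeta_1$ by sample-variance plug-ins is our assumption, not the paper's prescription, and the sketch assumes non-degenerate observed data.

```python
import numpy as np
from scipy.optimize import brentq

def psi_alpha(x, alpha):
    """Huber score psi_alpha(x) = alpha * psi(x / alpha)."""
    return np.clip(x, -alpha, alpha)

def huber_mean(x, alpha):
    """Solve sum_i psi_alpha(x_i - mu) = 0 by bisection (f is monotone)."""
    f = lambda mu: psi_alpha(x - mu, alpha).sum()
    return brentq(f, x.min() - alpha, x.max() + alpha)

def huber_pilot(X, S, L=1.0):
    n, p = X.shape
    logp = np.log(p)
    mu, M = np.zeros(p), np.zeros((p, p))
    for u in range(p):
        xu = X[S[:, u] > 0, u]                    # entries observed in A_u
        alpha_u = np.sqrt(len(xu) * xu.var() / ((2 + L) * logp))
        mu[u] = huber_mean(xu, alpha_u)
    for u in range(p):
        for v in range(u, p):
            obs = (S[:, u] > 0) & (S[:, v] > 0)   # index set A_uv
            z = X[obs, u] * X[obs, v]
            alpha_uv = np.sqrt(len(z) * z.var() / ((2 + L) * logp))
            M[u, v] = M[v, u] = huber_mean(z, alpha_uv)
    return M - np.outer(mu, mu)                   # tilde Sigma*_H
```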
In order to give another generalized pilot estimator, let $(\tilde{\mu}^*_T)_u$ $(u=1,\cdots,p)$ and $(\tilde{\mu}^*_T)_{uv}$ $(u,v=1,\cdots,p)$ be defined by
respectively, where $L>0$, $\beta\ge\sqrt{E|X_u|^2}$ and $\beta_1\ge\sqrt{E|X_uX_v|^2}$. Then, we have the second estimator.
Example 2.2 (Generalized truncated mean estimator). Suppose the conditions of Theorem 2.1 hold. Then $\tilde{\Sigma}^*_T:=((\tilde{\mu}^*_T)_{uv}-(\tilde{\mu}^*_T)_u(\tilde{\mu}^*_T)_v)_{p\times p}$ is a generalized pilot estimator of $\Sigma$, where $(\tilde{\mu}^*_T)_j$ $(j=u,v)$ and $(\tilde{\mu}^*_T)_{uv}$ are defined by (2.14) and (2.15).
Proof. We first show that $(\tilde{\mu}^*_T)_u$ satisfies (2.3). According to $\max_{1\le u\le p}E|X_u|^4\le k^4$, we have
So, (2.14) is equivalent to
where $A_u=\{i:S_{iu}\neq 0\}$. Let $a:=k\sqrt{n^*_u/((2+L)\log p)}$. We derive
Therefore, upon combining $E|X_u|^2\le k^2$ and $|A_u|=n^*_u$,
According to $E(X_{iu}^2 1_{\{|X_{iu}|\le a\}})\le k^2$ and Bernstein's inequality in [17],
for any $t>0$.
By (2.16) and (2.17) and taking $t=(2+L)\log p$, we have
Substituting $a=k\sqrt{n^*_u/((2+L)\log p)}$ into the above inequality, we obtain
which is the required condition (2.3) of Theorem 2.1.
Similarly, we can derive
i.e., condition (2.4) of Theorem 2.1 holds. □
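Analogously, the following is a minimal sketch of the generalized truncated mean pilot estimator under our reading of (2.14) and (2.15): observed coordinates and pairwise products are averaged after truncation at level $\beta\sqrt{n^*_u/((2+L)\log p)}$. The second-moment plug-ins for $\beta$ and $\beta_1$ are assumptions, not the paper's prescription.

```python
import numpy as np

def trunc_mean(z, a):
    """(1/|A|) sum_{i in A} z_i 1{|z_i| <= a}: truncated sample mean."""
    return np.where(np.abs(z) <= a, z, 0.0).mean()

def truncated_pilot(X, S, L=1.0):
    n, p = X.shape
    logp = np.log(p)
    mu, M = np.zeros(p), np.zeros((p, p))
    for u in range(p):
        xu = X[S[:, u] > 0, u]                    # entries observed in A_u
        beta = np.sqrt(np.mean(xu ** 2))          # plug-in for beta (assumed)
        mu[u] = trunc_mean(xu, beta * np.sqrt(len(xu) / ((2 + L) * logp)))
    for u in range(p):
        for v in range(u, p):
            obs = (S[:, u] > 0) & (S[:, v] > 0)   # index set A_uv
            z = X[obs, u] * X[obs, v]
            beta1 = np.sqrt(np.mean(z ** 2))      # plug-in for beta_1 (assumed)
            a_uv = beta1 * np.sqrt(len(z) / ((2 + L) * logp))
            M[u, v] = M[v, u] = trunc_mean(z, a_uv)
    return M - np.outer(mu, mu)                   # tilde Sigma*_T
```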
3. Convergence rates in terms of probability
We introduce the thresholding function and the space of sparse covariance matrices.
Definition 3.1. For any constant $\lambda>0$, a real-valued function $\tau_\lambda(\cdot)$ is said to be a thresholding function if
(i) $\tau_\lambda(z)=0$ for $|z|\le\lambda$;
(ii) $|\tau_\lambda(z)-z|\le\lambda$;
(iii) $|\tau_\lambda(z)|\le c_0|y|$ for $|z-y|\le\lambda$ and a constant $c_0>0$.
In fact, many functions satisfy conditions (i)–(iii), for example, the soft thresholding function $\tau_\lambda(z)=\operatorname{sign}(z)(|z|-\lambda)_+$, the adaptive lasso thresholding function $\tau_\lambda(z)=z(1-|\lambda/z|^\eta)_+$ with $\eta\ge 1$, and the smoothly clipped absolute deviation thresholding rule proposed by Rothman et al. [18].
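For instance, the first two rules can be written as short NumPy functions; this is a sketch, with $\eta=2$ as an illustrative default rather than a choice made in the paper.

```python
import numpy as np

def soft_threshold(z, lam):
    """tau_lam(z) = sign(z) * (|z| - lam)_+ ."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def adaptive_lasso_threshold(z, lam, eta=2.0):
    """tau_lam(z) = z * (1 - |lam/z|^eta)_+ with eta >= 1."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    big = np.abs(z) > lam                     # only these entries survive
    out[big] = z[big] * (1.0 - np.abs(lam / z[big]) ** eta)
    return out
```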
This paper considers the following class of covariance matrices, introduced by [15]:
Next, we define the thresholding generalized pilot estimator $(\tilde{\Sigma}^*)^\tau=((\tilde{\sigma}^*_{uv})^\tau)_{p\times p}$ and consider its convergence rates in terms of probability under the spectral and Frobenius norms, respectively, over the parameter space $H(s_{n,p})$.
Let $\tilde{\Sigma}^*=(\tilde{\sigma}^*_{uv})$ be a generalized pilot estimator and define
\[ (\tilde{\sigma}^*_{uv})^\tau=\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv}), \tag{3.1} \]
where $\tau_{\lambda_{uv}}(\cdot)$ is the thresholding function with
\[ \lambda_{uv}=\delta\sqrt{\frac{\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\log p}{n^*_{uv}}}. \tag{3.2} \]
The constant $\delta$ will be specified in the proof of Lemma 3.1.
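Putting (3.1) and (3.2) together, the thresholding step admits a direct vectorized sketch. Soft thresholding is used here for definiteness, $\delta$ is left as a user-chosen constant, and keeping the diagonal unthresholded is our convention rather than something the paper specifies.

```python
import numpy as np

def threshold_pilot(Sigma_tilde, n_uv, delta=2.0):
    """Apply tau_{lambda_uv} entrywise with
    lambda_uv = delta * sqrt(sigma~_uu * sigma~_vv * log p / n*_uv)."""
    p = Sigma_tilde.shape[0]
    d = np.diag(Sigma_tilde)
    lam = delta * np.sqrt(np.outer(d, d) * np.log(p) / n_uv)
    T = np.sign(Sigma_tilde) * np.maximum(np.abs(Sigma_tilde) - lam, 0.0)
    np.fill_diagonal(T, d)     # keep the diagonal (a convention, see text)
    return T
```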
The following lemma is useful for proving Theorem 3.1 and Theorem 4.1.
Lemma 3.1. Suppose $\min_u\sigma_{uu}\ge\gamma>0$, $\log p=o(n^*_{\min})$ and Assumption 1.1 hold. Denote the events $Q_1,Q_2$ by
Then, for any $L>0$,
(i) there exists $C_1(L)>0$ such that
holds under the event $Q_1\cap Q_2$;
(ii) $P(Q_1\cap Q_2)\ge 1-O(p^{-L})$.
Proof. (i) Under the event $Q_1$, one knows
thanks to conditions (ii) and (iii) of Definition 3.1.
Define
where $C_0(L)$ is given in Definition 2.1. By (3.2), when the event $Q_2$ occurs as well, (3.6) reduces to
According to (3.5) and (3.8), one obtains
under the event $Q_1\cap Q_2$. Therefore,
holds due to $n^*_{\min}\le n^*_{uv}\le n$. This establishes conclusion (i) of Lemma 3.1.
(ii) In order to show $P(Q_1\cap Q_2)\ge 1-O(p^{-L})$, one first estimates $P(Q_2^c)$.
Clearly,
and
Define the event
Since $\tilde{\Sigma}^*=(\tilde{\sigma}^*_{uv})_{p\times p}$ is a generalized pilot estimator of $\Sigma=(\sigma_{uv})_{p\times p}$, one gets
By $\log p=o(n^*_{\min})$ and $n^*_{\min}\le n^*_{uv}$, one knows $\log p=o(n^*_{uv})$. Furthermore, it holds that
under the event $E$ because of $\min_u\sigma_{uu}\ge\gamma$. Hence,
Next, one estimates $P(Q_1^c)$. One observes $\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\ge\sigma_{uu}\sigma_{vv}-|\tilde{\sigma}^*_{uu}-\sigma_{uu}|\,|\tilde{\sigma}^*_{vv}|-|\tilde{\sigma}^*_{vv}-\sigma_{vv}|\,|\tilde{\sigma}^*_{uu}|-|\tilde{\sigma}^*_{uu}-\sigma_{uu}|\,|\tilde{\sigma}^*_{vv}-\sigma_{vv}|$, and it follows that
holds on the event $E$, due to $\min_u\sigma_{uu}\ge\gamma$ and $\log p=o(n^*_{uv})$. Hence,
Let $\lambda_{uv}=\delta\sqrt{\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\log p/n^*_{uv}}$ be given by (3.2). It can be shown that
follows from (3.10).
Note that $\tilde{\Sigma}^*=(\tilde{\sigma}^*_{uv})$ is a generalized pilot estimator of $\Sigma$. Then, one derives
thanks to $\delta=\sqrt{2}\,C_0(L)/\gamma$ defined in (3.7). Substituting the above result into (3.11) gives
Combining this with (3.9), one obtains the stated result
□
Finally, we give upper bounds for $\|(\tilde{\Sigma}^*)^\tau-\Sigma\|_2$ and $\|(\tilde{\Sigma}^*)^\tau-\Sigma\|_F$ in terms of probability, where $\|A\|_2$ and $\|A\|_F$ denote the spectral and Frobenius norms of a matrix $A$, respectively.
Theorem 3.1. Suppose $\min_u\sigma_{uu}\ge\gamma>0$, $\log p=o(n^*_{\min})$ and Assumption 1.1 hold. Then,
Proof. (i) Define the event $Q:=Q_1\cap Q_2$, where $Q_1,Q_2$ are given by (3.3) and (3.4), respectively. Then, it is easy to see
thanks to Lemma 3.1 and $\Sigma\in H(s_{n,p})$.
Geršgorin's theorem gives $\|(\tilde{\Sigma}^*)^\tau-\Sigma\|_2\le\|(\tilde{\Sigma}^*)^\tau-\Sigma\|_1$, and combining this with (3.12) implies
on the event $Q$.
On the other hand, Lemma 3.1 shows $P(Q)\ge 1-O(p^{-L})$. Hence, Theorem 3.1(i) holds.
(ii) One observes
and it follows that
on the event $Q$, according to Lemma 3.1.
Note that $\max_v\sum_{u=1}^{p}|a_{uv}|^2\le\left(\max_v\sum_{u=1}^{p}|a_{uv}|\right)^2$. Then, (3.13) reduces to
as long as $\Sigma\in H(s_{n,p})$.
Therefore, conclusion (ii) follows since Lemma 3.1 says $P(Q)\ge 1-O(p^{-L})$.
(iii) By $\max_u\sigma_{uu}\le M$, one knows
Furthermore, it holds that
under the event $Q$, due to (3.13) and $\Sigma\in H(s_{n,p})$.
Thus, claim (iii) follows from Lemma 3.1 immediately. □
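The error metrics appearing in Theorem 3.1 (and the operator norms of Remark 3.2 below) can be evaluated with the following one-line NumPy helpers; for symmetric matrices the $l_1$-operator norm, i.e., the maximum absolute column sum, dominates the spectral norm.

```python
import numpy as np

def spectral_norm(A):      # ||A||_2: largest singular value
    return np.linalg.norm(A, 2)

def frobenius_norm(A):     # ||A||_F
    return np.linalg.norm(A, "fro")

def l1_operator_norm(A):   # ||A||_1: maximum absolute column sum
    return np.abs(A).sum(axis=0).max()
```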
Remark 3.1. Theorem 3.1(i) generalizes the result of [15], which requires $X$ to be a sub-Gaussian random vector. In addition, if $n^*_{\min}=n$, Theorem 3.1(i) recovers the result of [11], since the parameter class $H(s_{n,p})$ contains the class of sparse covariance matrices defined in [11].
Remark 3.2. From the proof of Theorem 3.1(i), we find that
under the event $Q$.
Furthermore, letting $\|A\|_\omega$ denote the matrix $l_\omega$-operator norm of $A$, Lemma 7.2 in [19] tells us that
for any symmetric matrix $A$. Hence,
holds under the event $Q$.
Then, using Lemma 3.1 indicates
4. Convergence rates in terms of expectation
This section studies the convergence rates of the thresholding generalized pilot estimator $(\tilde{\Sigma}^*)^\tau$ in terms of expectation over $H(s_{n,p})$.
We introduce the following technical lemma.
Lemma 4.1. Let $\min_u\sigma_{uu}\ge\gamma>0$, $\log p=o(n^*_{\min})$, $p\ge(n^*_{\min})^\xi$ $(\xi>0)$, $E|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\le M$, and let Assumption 1.1 hold. Then,
where $Q_1,Q_2$ are defined by (3.3) and (3.4), and $x\lesssim y$ denotes $x\le cy$ with an absolute constant $c>0$.
Proof. Denote $Q:=Q_1\cap Q_2$ and
Then, $I_{n,p}\le s_{n,p}P(Q^c)$.
According to Lemma 3.1, one knows $P(Q^c)\le O(p^{-L})$ and $I_{n,p}\lesssim s_{n,p}p^{-L}$. Taking $L=\xi^{-1}+3>0$, one obtains
due to $p\ge(n^*_{\min})^\xi$. Hence, it follows that
which is the desired conclusion (i).
Similarly, the definition of $H(s_{n,p})$ and Lemma 3.1 imply
Moreover, combining the above result with (4.1) yields (ii).
To show (iii), Hölder's inequality tells us that
On the other hand, it holds that
Furthermore, one obtains
due to the assumed condition $E|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\le M$. Hence,
follows from $P(Q^c)\le O(p^{-L})$.
By $L=\xi^{-1}+3$ and the assumption $p\ge(n^*_{\min})^\xi$, one finds
Substituting this into (4.2) implies the expected result (iii). □
Theorem 4.1. Let $(\tilde{\Sigma}^*)^\tau=((\tilde{\sigma}^*_{uv})^\tau)_{p\times p}$ be given by (3.1), and suppose $E|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\le M$, $\min_u\sigma_{uu}\ge\gamma>0$ and Assumption 1.1 hold. If $\log p=o(n^*_{\min})$, $p\ge(n^*_{\min})^\xi$ $(\xi>0)$ and $s_{n,p}\gtrsim 1$, then
Proof. (i) Let the event $Q:=Q_1\cap Q_2$, where $Q_1,Q_2$ are given by (3.3) and (3.4). Then, by Geršgorin's theorem, $\|(\tilde{\Sigma}^*)^\tau-\Sigma\|_2\le\|(\tilde{\Sigma}^*)^\tau-\Sigma\|_1$, and we have
Clearly, $\|(\tilde{\Sigma}^*)^\tau-\Sigma\|_1:=\max_v\sum_{u=1}^{p}|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|$, and it follows that
under the event $Q$, thanks to Lemma 3.1 and the definition of $H(s_{n,p})$. Hence,
Then, we just need to show
to finish the proof of (i).
According to condition (iii) of Definition 3.1, we obtain $|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})|\le c_0|\tilde{\sigma}^*_{uv}|$ and
By $|\sigma_{uv}|\le\sqrt{\sigma_{uu}\sigma_{vv}}$ and $\log p=o(n^*_{\min})$, we know
due to $n^*_{\min}\le n$. Hence, it holds that
and
Therefore, (4.3) follows from Lemma 4.1(i), (iii) and $s_{n,p}\gtrsim 1$. This proves (i).
(ii) To show (ii), we observe
Clearly,
According to Lemma 3.1, we have
on the event $Q$. Furthermore,
holds due to the definition of $H(s_{n,p})$.
Hence, it suffices to prove
By (4.4), we find
Substituting the above inequality into (4.5) leads to
Since $\sqrt{|a|+|b|}\le\sqrt{|a|}+\sqrt{|b|}$ and $\left(\max_v\sum_{u=1}^{p}|a_{uv}|^2\right)^{1/2}\le\max_v\sum_{u=1}^{p}|a_{uv}|$, we obtain
Thus, (4.7) follows from Lemma 4.1(i), (iii) and $s_{n,p}\gtrsim 1$.
(iii) By $\max_u\sigma_{uu}\le M$, we obtain
On the other hand, (4.6), (4.9) and $\Sigma\in H(s_{n,p})$ tell us that
under the event $Q$. Therefore,
Using (4.8) and (4.9), we have
Hence, it holds that
due to Lemma 4.1(ii), (iii) and $s_{n,p}\gtrsim 1$. Finally, conclusion (iii) follows from (4.10) and (4.11). □
Remark 4.1. The upper bound of Theorem 4.1(i) is optimal by Proposition 3.1 in [15]. In addition, Theorem 4.1(i) improves on Theorem 3.1 of [15], which requires $X$ to be sub-Gaussian.
Remark 4.2. From the proof of Theorem 4.1(i), we observe
Note that $\|A\|_\omega\le\|A\|_1$ $(1\le\omega\le\infty)$ for any symmetric matrix $A$. Then,
holds.
Remark 4.3. The condition
\[ E|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\le M \tag{4.12} \]
in Lemma 4.1 and Theorem 4.1 is mild. In fact, the generalized Huber estimator (Example 2.1) and the generalized truncated mean estimator (Example 2.2) both satisfy (4.12). The details can be found in the Appendix.
5. Simulation studies
Let $(\tilde{\Sigma}^*_H)^\tau$ and $(\tilde{\Sigma}^*_T)^\tau$ be defined through (3.1) and (3.2). This section investigates the numerical properties and performance of the estimators $(\tilde{\Sigma}^*_H)^\tau$ and $(\tilde{\Sigma}^*_T)^\tau$, and compares these two estimators with the adaptive thresholding estimator $\hat{\Sigma}_{at}$ proposed by [15]. The following two types of sparse covariance matrices are considered:
Model 1. (Rothman et al. [18]) $\Sigma=(\sigma_{uv})_{p\times p}$ with $\sigma_{uv}=\max\{1-|u-v|/5,0\}$.
Model 2. (Cai and Zhang [15]) $\Sigma=I_p+(D+D^T)/(\|D+D^T\|_2+0.01)$, where $D=(d_{uv})_{p\times p}$ is given by $d_{uu}=0$ $(u=1,\cdots,p)$ and
Under each model we generate random samples $X_i\in\mathbb{R}^p$ $(i=1,\cdots,n)$ under two different scenarios (a sampling sketch follows the list):
(i) $X_i$ are independently drawn from the multivariate $t$-distribution $t_\nu(0,\Sigma)$ with $\nu=4.5$ degrees of freedom;
(ii) $X_i$ are independently drawn from the multivariate skewed $t$-distribution $st_\nu(0,\Sigma,\epsilon)$ with $\nu=5$ degrees of freedom and skewness parameter $\epsilon=10$.
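Scenario (i) can be simulated with the classical normal/chi-square mixture representation $X=Z/\sqrt{W/\nu}$, $Z\sim N(0,\Sigma)$, $W\sim\chi^2_\nu$; the skewed $t$ of scenario (ii) additionally requires a skewing construction (e.g., Azzalini-type) and is not sketched here.

```python
import numpy as np

def rmvt(n, Sigma, nu, rng):
    """Draw n i.i.d. samples from the multivariate t_nu(0, Sigma)."""
    p = Sigma.shape[0]
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # N(0, Sigma)
    W = rng.chisquare(nu, size=n)                            # chi^2_nu
    return Z / np.sqrt(W / nu)[:, None]

rng = np.random.default_rng(1)
X = rmvt(200, np.eye(50), nu=4.5, rng=rng)
```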
In each simulation setting, we adopt the following two missingness mechanisms for the data matrix $Y=(X_1,\cdots,X_n)$ with $X_i=(X_{i1},\cdots,X_{ip})^T$, both proposed by Cai and Zhang [15]. The first is missing uniformly and completely at random (MUCR), in which every entry $X_{ik}$ is observed with probability $0<\rho\le 1$. The second is missing not uniformly but completely at random (MCR), in which $Y$ is divided into four equal-size parts,
where every entry of $Y_{11},Y_{22}$ is observed with probability $0<\rho^{(1)}\le 1$ and every entry of $Y_{12},Y_{21}$ is observed with probability $0<\rho^{(2)}\le 1$.
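These two mechanisms translate into the following mask generators (rows index samples here, and the probabilities are illustrative):

```python
import numpy as np

def mucr_mask(n, p, rho, rng):
    """Every entry observed independently with probability rho."""
    return (rng.random((n, p)) < rho).astype(float)

def mcr_mask(n, p, rho1, rho2, rng):
    """Four equal blocks: diagonal blocks observed with probability rho1,
    off-diagonal blocks with probability rho2."""
    S = np.zeros((n, p))
    h, w = n // 2, p // 2
    S[:h, :w] = rng.random((h, w)) < rho1            # Y_11
    S[h:, w:] = rng.random((n - h, p - w)) < rho1    # Y_22
    S[:h, w:] = rng.random((h, p - w)) < rho2        # Y_12
    S[h:, :w] = rng.random((n - h, w)) < rho2        # Y_21
    return S

rng = np.random.default_rng(2)
S = mcr_mask(100, 50, rho1=0.9, rho2=0.7, rng=rng)
```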
Moreover, for each procedure we set $p=50,200,300$ and $n=50,100,200$, respectively, and use 50 replications. Meanwhile, we choose the soft thresholding rule and measure the errors by the spectral and Frobenius norms in each setting. The tuning parameter in the thresholding estimator is chosen by 10-fold cross-validation as explained in Section 4 of Cai and Zhang [15], and the unspecified tuning parameters in the generalized pilot estimators are chosen by the method suggested in Section 6 of Avella-Medina et al. [11].
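A compact sketch of the cross-validation step is given below. The split, the grid of $\delta$ values, and the Frobenius-norm validation score are placeholders in the spirit of Section 4 of [15], not the exact procedure; `pilot` and `threshold` stand for any pilot estimator and thresholding routine with the indicated interfaces.

```python
import numpy as np

def cv_delta(X, S, pilot, threshold, deltas, V=10, seed=0):
    """pilot(X, S) -> (Sigma_tilde, n_uv); threshold(Sigma, n_uv, delta) -> matrix."""
    idx = np.random.default_rng(seed).permutation(X.shape[0])
    scores = np.zeros(len(deltas))
    for fold in np.array_split(idx, V):
        train = np.setdiff1d(idx, fold)
        Sig_tr, n_uv_tr = pilot(X[train], S[train])   # fit on training folds
        Sig_va, _ = pilot(X[fold], S[fold])           # validation target
        for j, d in enumerate(deltas):
            T = threshold(Sig_tr, n_uv_tr, delta=d)
            scores[j] += np.linalg.norm(T - Sig_va, "fro") ** 2
    return deltas[int(np.argmin(scores))]             # minimizer of CV risk
```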
Tables 1 and 2 demonstrate that the thresholding estimators $(\tilde{\Sigma}^*_H)^\tau$ and $(\tilde{\Sigma}^*_T)^\tau$ perform better than the adaptive thresholding estimator $\hat{\Sigma}_{at}$ under both the MUCR and MCR settings. Moreover, the thresholding generalized Huber estimator $(\tilde{\Sigma}^*_H)^\tau$ outperforms the thresholding generalized truncated mean estimator $(\tilde{\Sigma}^*_T)^\tau$. We also find that the errors decrease as the sample size $n$ gets larger. Meanwhile, we observe that the errors under Model 1 are larger than under Model 2, since the covariance matrix in Model 1 is denser than that in Model 2. All these numerical results are consistent with our theoretical results.
6. Conclusions
In this paper, we propose the generalized pilot estimator in the presence of incomplete heavy-tailed data. Moreover, two kinds of generalized pilot estimators are provided under the bounded fourth moment assumption, whereas many previous studies hinged upon the sub-Gaussian condition. In addition, we establish the thresholding pilot estimator for a family of sparse covariance matrices and give its convergence rates in terms of probability and expectation, respectively.
In the future, we may consider compositional data with missing values under a lower-order moment assumption by referring to Li et al. [20]. Moreover, we may adopt different methods to estimate the sparse covariance matrix with incomplete data, such as the proximal distance algorithm [21] or continuous matrix shrinkage [22].
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This paper is supported by the National Natural Science Foundation of China (No. 12171016).
Conflict of interest
All authors declare no conflicts of interest in this paper.
Appendix
In order to show that Example 2.1 and Example 2.2 satisfy condition (4.12), we first introduce Proposition A.1.
Proposition A.1. Let $\max_{1\le u\le p}E|X_u|^4\le k^4$, $EX_u=\mu_u$, $E(X_uX_v)=\mu_{uv}$, and let Assumption 1.1 hold. If $\tilde{\mu}^*_u$ and $\tilde{\mu}^*_{uv}$ satisfy
where $A,B\subseteq\{1,\cdots,n\}$, then $\tilde{\Sigma}^*=(\tilde{\sigma}^*_{uv})_{p\times p}=(\tilde{\mu}^*_{uv}-\tilde{\mu}^*_u\tilde{\mu}^*_v)_{p\times p}$ obeys (4.12).
Proof. It suffices to prove
By $\max_{1\le u\le p}E|X_u|^4\le k^4$, one knows $E|X_u|\le(E|X_u|^4)^{1/4}\le k$ and
Thus, it holds that
which establishes (A.3).
For (A.4), one observes
According to (A.1) and Jensen's inequality, it follows that
Furthermore, combining this with the Cauchy–Schwarz inequality leads to
Similarly, (A.2) implies
By the Cauchy–Schwarz inequality and $\max_{1\le u\le p}E|X_u|^4\le k^4$, one finds
Hence,
Finally, the expected conclusion (A.4) follows from (A.5)–(A.7). This completes the proof of Proposition A.1. □
Now, based on Proposition A.1, we verify that the two generalized pilot estimators (Example 2.1 and Example 2.2) satisfy (4.12).
For the generalized truncated mean estimator $\tilde{\Sigma}^*_T:=((\tilde{\mu}^*_T)_{uv}-(\tilde{\mu}^*_T)_u(\tilde{\mu}^*_T)_v)_{p\times p}$, it is easy to see that $(\tilde{\mu}^*_T)_u$ and $(\tilde{\mu}^*_T)_{uv}$ obey (A.1) and (A.2), respectively.
By (2.14), we know
where $A_u=\{i:S_{iu}\neq 0\}$ and $|A_u|=n^*_u$. Similarly, it holds that
with $A_{uv}=\{i:S_{iu}S_{iv}\neq 0\}$ and $|A_{uv}|=n^*_{uv}$. The above two inequalities imply that $(\tilde{\mu}^*_T)_u$ and $(\tilde{\mu}^*_T)_{uv}$ satisfy (A.1) and (A.2), respectively.
In fact, it is hard to verify that the generalized Huber estimator
satisfies (4.12), because the structures of $(\tilde{\mu}^*_H)_u$ and $(\tilde{\mu}^*_H)_{uv}$ are implicit. However, we can consider a special case.
Proposition A.2. Let $A_u=\{i:S_{iu}\neq 0\}$ and $A_{uv}=\{i:S_{iu}S_{iv}\neq 0\}$. If $\alpha_u$ and $\alpha_{uv}$ defined in (2.9) and (2.10) obey
respectively, then $(\tilde{\mu}^*_H)_u$ and $(\tilde{\mu}^*_H)_{uv}$ satisfy (A.1) and (A.2).
Proof. For $i\in A_u$, it holds that
Obviously, (2.9) is equivalent to
By the definition of $\psi_{\alpha_u}(x)$, we have
Note that $\sum_{i\in A_u}\psi_{\alpha_u}(X_{iu}-(\tilde{\mu}_H^*)_u)$ is continuous and decreasing in $(\tilde{\mu}_H^*)_u$. Then, the solution of the equation $\sum_{i\in A_u}\psi_{\alpha_u}(X_{iu}-(\tilde{\mu}_H^*)_u)=0$ belongs to the interval $\left(\max_{i\in A_u}X_{iu}-\alpha_u,\ \min_{i\in A_u}X_{iu}+\alpha_u\right)$.
Hence, we obtain $\max_{i\in A_u}X_{iu}-\alpha_u<(\tilde{\mu}_H^*)_u<\min_{i\in A_u}X_{iu}+\alpha_u$ and
Furthermore, the above inequality and the definition of $\psi_{\alpha_u}(x)$ imply
Therefore, $(\tilde{\mu}_H^*)_u=(n_u^*)^{-1}\sum_{i\in A_u}X_{iu}$ satisfies (A.1).
Following a similar argument, we can derive that $(\tilde{\mu}_H^*)_{uv}$ satisfies (A.2) with
□
In fact, the condition in Proposition A.2 is easy to satisfy, since $\log p=o(n_{\min}^*)$ and $n_{\min}^*\le n_{uv}^*\le n_u^*$ lead to sufficiently large $\alpha_u$ and $\alpha_{uv}$.
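This observation is easy to check numerically: once $\alpha$ is at least the range of the observed data, every residual at the solution falls in the linear part of $\psi_\alpha$ and the Huber estimate collapses to the sample mean. The following sketch reuses the bisection solver from Section 2.

```python
import numpy as np
from scipy.optimize import brentq

def huber_mean(x, alpha):
    f = lambda mu: np.clip(x - mu, -alpha, alpha).sum()
    return brentq(f, x.min() - alpha, x.max() + alpha)

x = np.random.default_rng(3).standard_t(df=4.5, size=50)
alpha_big = x.max() - x.min()       # alpha at least the range of the data
print(np.isclose(huber_mean(x, alpha_big), x.mean()))   # prints True
```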