Research article

Pilot estimators for a kind of sparse covariance matrices with incomplete heavy-tailed data

  • Received: 22 March 2023 Revised: 26 June 2023 Accepted: 28 June 2023 Published: 05 July 2023
  • MSC : 62H12, 62J10

  • This paper investigates generalized pilot estimators of the covariance matrix in the presence of missing data. When the random samples have only a bounded fourth moment, two kinds of generalized pilot estimators are provided: the generalized Huber estimator and the generalized truncated mean estimator. In addition, we construct a thresholding generalized pilot estimator for a class of sparse covariance matrices and establish convergence rates in terms of probability under the spectral and Frobenius norms respectively. Moreover, convergence rates in the sense of expectation are also given under an extra condition. Finally, simulation studies are conducted to demonstrate the superiority of our method.

    Citation: Huimin Li, Jinru Wang. Pilot estimators for a kind of sparse covariance matrices with incomplete heavy-tailed data[J]. AIMS Mathematics, 2023, 8(9): 21439-21462. doi: 10.3934/math.20231092




Let X be a p-dimensional random vector. Estimating its covariance matrix \Sigma = (\sigma_{uv})_{p\times p} is of great interest in high-dimensional statistics (Mendelson and Zhivotovskiy [1], Dendramis et al. [2] and Zhang et al. [3]). A commonly adopted strategy for estimating the covariance matrix is to impose a sparsity structure on it (Belomestny [4], Kang and Deng [5], Bettache et al. [6] and Liang et al. [7]).

If X is sub-Gaussian, Bickel and Levina [8], Cai and Liu [9] and Cai and Zhou [10] considered sparsity assumptions under which all rows or columns of the covariance matrix belong to an l_q-ball, a weighted l_q-ball or a weak l_q-ball, respectively. Moreover, they proposed the corresponding thresholding estimators and established convergence rates in the sense of probability or expectation.

When each component of X = (X_1, \ldots, X_p)^T follows a heavy-tailed distribution, i.e., the distribution of X_u satisfies \int_{\mathbb{R}}e^{tx}dF_u(x) = \infty for all t > 0, Avella-Medina et al. [11] introduced a pilot estimator \tilde{\Sigma} = (\tilde{\sigma}_{uv})_{p\times p} satisfying

P\Big\{|\tilde{\sigma}_{uv}-\sigma_{uv}|\geq C_0\sqrt{(\log p)/n} \ \text{for some}\ 1\leq u,v\leq p\Big\}\leq\varepsilon_{n,p}

for a positive constant C_0 and \log p = o(n), where \varepsilon_{n,p} is a deterministic positive sequence satisfying \lim_{n,p\to\infty}\varepsilon_{n,p} = 0. Avella-Medina et al. pointed out that the sample covariance matrix

\hat{\Sigma} = \frac{1}{n}\sum\limits_{k=1}^{n}(X_k-\bar{X})(X_k-\bar{X})^T \quad \text{with} \quad \bar{X} = \frac{1}{n}\sum\limits_{k=1}^{n}X_k

must be a pilot estimator if X_1, \ldots, X_n are i.i.d. sub-Gaussian samples. In addition, some other pilot estimators were provided under a bounded fourth moment assumption. The authors also derived the convergence rate of the thresholding pilot estimator in terms of probability when the rows or columns of the covariance matrix lie in a weighted l_q-ball.

However, missing data (also called incomplete data) frequently occur in high-dimensional sampling settings; see Hawkins et al. [12], Lounici [13] and Loh and Wainwright [14]. Instead of obtaining complete i.i.d. samples X_1, \ldots, X_n, one can only collect parts of them. Let the vector S_i\in\{0,1\}^p \; (i = 1, \ldots, n) be defined by

S_{iu} = \begin{cases} 1, & \text{if } X_{iu} \text{ is observed}; \\ 0, & \text{if } X_{iu} \text{ is missing}, \end{cases}

where X_{iu} and S_{iu} are the u-th coordinates of X_i and S_i respectively. This paper denotes the samples with missing values by X^*_i = (X^*_{i1}, \ldots, X^*_{ip})^T, where X^*_{iu} = X_{iu}S_{iu}. The following missing mechanism, introduced by Cai and Zhang [15], is adopted.

Assumption 1.1 (Missing completely at random). S = \{S_1, \ldots, S_n\} can be either deterministic or random and is independent of X = \{X_1, \ldots, X_n\}.

Define

n^*_{uv} = \sum\limits_{i=1}^{n}S_{iu}S_{iv},

i.e., n^*_{uv} is the number of samples whose u-th and v-th entries are both observed. For convenience, let

n^*_u = n^*_{uu}, \quad n^*_{\min} = \min\limits_{u,v}n^*_{uv}.

Then, it is easy to see that

n^*_{\min}\leq n^*_{uv}\leq\min\{n^*_u, n^*_v\}\leq n.

Meanwhile, the generalized sample mean \bar{X}^* = (\bar{X}^*_u)_{1\leq u\leq p} is defined by

\bar{X}^*_u = \frac{1}{n^*_u}\sum\limits_{i=1}^{n}X_{iu}S_{iu},

and the generalized sample covariance matrix \hat{\Sigma}^* = (\hat{\sigma}^*_{uv}) is given by

\hat{\sigma}^*_{uv} = \frac{1}{n^*_{uv}}\sum\limits_{i=1}^{n}(X_{iu}-\bar{X}^*_u)(X_{iv}-\bar{X}^*_v)S_{iu}S_{iv}. \quad (1.1)
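To make (1.1) concrete, the pairwise-complete-observation quantities n^*_{uv}, \bar{X}^*_u and \hat{\sigma}^*_{uv} can all be computed with a few matrix products. The following is a minimal NumPy sketch; the function name and interface are our own illustration, not part of the paper.

```python
import numpy as np

def generalized_sample_cov(X, S):
    """Generalized sample covariance (1.1) from incomplete data.

    X : (n, p) data matrix; missing cells may hold NaN or any value.
    S : (n, p) 0/1 mask with S[i, u] = 1 iff X[i, u] is observed.
    Assumes every pair (u, v) is jointly observed at least once.
    """
    Xs = np.nan_to_num(X) * S            # X*_{iu} = X_{iu} S_{iu}
    n_u = S.sum(axis=0)                  # n*_u, observed count per coordinate
    xbar = Xs.sum(axis=0) / n_u          # generalized sample mean bar{X}*_u
    n_uv = S.T @ S                       # n*_{uv}, pairwise observed counts
    Xc = (Xs - xbar) * S                 # center, then re-zero missing cells
    return (Xc.T @ Xc) / n_uv            # entrywise division by n*_{uv}
```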

Our goal is to construct a thresholding estimator of the sparse covariance matrix \Sigma based on incomplete heavy-tailed data. Furthermore, the convergence rates of the thresholding estimator are investigated in terms of probability and expectation respectively.

The rest of the paper is organized as follows. Section 2 introduces the definition of the generalized pilot estimator based on missing data; then, under the bounded fourth moment assumption, two kinds of generalized pilot estimators are given. In Section 3, we construct the thresholding generalized pilot estimator and explore its convergence rates in the sense of probability under the spectral and Frobenius norms respectively. In Section 4, the convergence rates are given in terms of expectation under an extra mild condition. Section 5 investigates the numerical performance of the thresholding generalized Huber pilot estimator and the thresholding generalized truncated mean pilot estimator respectively, and compares these two estimators with the adaptive thresholding estimator proposed by Cai and Zhang [15].

Definition 2.1. Any symmetric matrix \tilde{\Sigma}^* = (\tilde{\sigma}^*_{uv})_{p\times p} based on incomplete data X^*_1, \ldots, X^*_n is said to be a generalized pilot estimator of \Sigma if, for any L > 0, there exists a constant C_0(L) such that

P\Big\{|\tilde{\sigma}^*_{uv}-\sigma_{uv}|\geq C_0(L)\sqrt{(\log p)/n^*_{uv}} \ \text{for some}\ 1\leq u,v\leq p\Big\} = O(p^{-L}) \quad (2.1)

holds with \log p = o(n^*_{\min}).

Remark 2.1. If one can obtain complete data, the generalized pilot estimator defined by (2.1) coincides with the pilot estimator proposed by Avella-Medina et al. [11], except that \varepsilon_{n,p} is replaced by O(p^{-L}).

Remark 2.2. If X is a sub-Gaussian random vector and the diagonal entries \sigma_{uu} \; (u = 1, \ldots, p) of \Sigma are uniformly bounded, the generalized sample covariance matrix \hat{\Sigma}^* given by (1.1) must be a generalized pilot estimator of \Sigma.

In fact, Theorem 3.1 in [15] shows that for any 0 < x\leq 1 there exist constants C, c > 0 such that

P\Big\{|\hat{\sigma}^*_{uv}-\sigma_{uv}|\geq x\sqrt{\sigma_{uu}\sigma_{vv}}\Big\}\leq C\exp(-cn^*_{uv}x^2). \quad (2.2)

By n^*_{\min}\leq n^*_{uv} and \log p = o(n^*_{\min}), one knows \log p = o(n^*_{uv}). Taking x = \sqrt{(2+L)\log p/(cn^*_{uv})} with L > 0, (2.2) reduces to

P\Big\{|\hat{\sigma}^*_{uv}-\sigma_{uv}|\geq C_0(L)\sqrt{(\log p)/n^*_{uv}}\Big\}\leq Cp^{-(L+2)}

with C_0(L) = \sqrt{(2+L)\sigma_{uu}\sigma_{vv}/c}.

Furthermore, a union bound over the p^2 pairs (u, v) gives

P\Big\{|\hat{\sigma}^*_{uv}-\sigma_{uv}|\geq C_0(L)\sqrt{(\log p)/n^*_{uv}} \ \text{for some}\ 1\leq u,v\leq p\Big\}\leq Cp^{-L}.

Therefore, \hat{\Sigma}^* given by (1.1) is a generalized pilot estimator of \Sigma.

We introduce the following theorem in order to provide two other kinds of generalized pilot estimators under the bounded fourth moment assumption.

Theorem 2.1. Suppose \max_{1\leq u\leq p}E|X_u|^4\leq k^4, \log p = o(n^*_{\min}), EX_u = \mu_u, E(X_uX_v) = \mu_{uv}, and Assumption 1.1 holds. If \tilde{\mu}^*_u and \tilde{\mu}^*_{uv} satisfy

(i) \; P\Big\{|\tilde{\mu}^*_u-\mu_u| > ck\sqrt{((2+L)\log p)/n^*_u}\Big\} = O(p^{-(2+L)}); \quad (2.3)
(ii) \; P\Big\{|\tilde{\mu}^*_{uv}-\mu_{uv}| > ck^2\sqrt{((2+L)\log p)/n^*_{uv}}\Big\} = O(p^{-(2+L)}) \quad (2.4)

with absolute constants L > 0 and c\geq 1, then \tilde{\Sigma}^* = (\tilde{\sigma}^*_{uv})_{p\times p} := (\tilde{\mu}^*_{uv}-\tilde{\mu}^*_u\tilde{\mu}^*_v)_{p\times p} must be a generalized pilot estimator of \Sigma.

Proof. Let K := ck\sqrt{2+L}. Thus,

P\Big\{|\tilde{\mu}^*_u-\mu_u| > K\sqrt{(\log p)/n^*_u} \ \text{for some}\ 1\leq u\leq p\Big\} = O(p^{-(1+L)}) \quad (2.5)

thanks to condition (2.3) and a union bound over the p coordinates. Moreover,

P\Big\{|(\tilde{\mu}^*_u-\mu_u)(\tilde{\mu}^*_v-\mu_v)| > K^2(\log p)/\sqrt{n^*_un^*_v} \ \text{for some}\ 1\leq u,v\leq p\Big\} = O(p^{-(1+L)}). \quad (2.6)

Similarly, one derives

P\Big\{|\tilde{\mu}^*_{uv}-\mu_{uv}| > K^2\sqrt{(\log p)/n^*_{uv}} \ \text{for some}\ 1\leq u,v\leq p\Big\} = O(p^{-L}) \quad (2.7)

due to (2.4), c\geq 1 and a union bound over the p^2 pairs.

By \max_{1\leq u\leq p}E|X_u|^4\leq k^4, one obtains |\mu_u|\leq(E|X_u|^4)^{1/4}\leq k \; (u = 1, \ldots, p) and

|\tilde{\mu}^*_u\tilde{\mu}^*_v-\mu_u\mu_v|\leq|\mu_v(\tilde{\mu}^*_u-\mu_u)|+|\mu_u(\tilde{\mu}^*_v-\mu_v)|+|(\tilde{\mu}^*_u-\mu_u)(\tilde{\mu}^*_v-\mu_v)|\leq k\big(|\tilde{\mu}^*_u-\mu_u|+|\tilde{\mu}^*_v-\mu_v|\big)+|(\tilde{\mu}^*_u-\mu_u)(\tilde{\mu}^*_v-\mu_v)|\leq c^{-1}K\big(|\tilde{\mu}^*_u-\mu_u|+|\tilde{\mu}^*_v-\mu_v|\big)+|(\tilde{\mu}^*_u-\mu_u)(\tilde{\mu}^*_v-\mu_v)|,

where the last inequality follows from K\geq ck. Thus, one concludes

P\Big\{|\tilde{\mu}^*_u\tilde{\mu}^*_v-\mu_u\mu_v| > c^{-1}K^2\big(\sqrt{(\log p)/n^*_u}+\sqrt{(\log p)/n^*_v}\big)+K^2(\log p)/\sqrt{n^*_un^*_v} \ \text{for some}\ 1\leq u,v\leq p\Big\} = O(p^{-(1+L)})

thanks to (2.5) and (2.6).

Since n^*_{uv}\leq\min\{n^*_u, n^*_v\}, the above result reduces to

P\Big\{|\tilde{\mu}^*_u\tilde{\mu}^*_v-\mu_u\mu_v| > 2c^{-1}K^2\sqrt{(\log p)/n^*_{uv}}+K^2(\log p)/n^*_{uv} \ \text{for some}\ 1\leq u,v\leq p\Big\} = O(p^{-(1+L)}).

Furthermore, according to \log p = o(n^*_{\min}) and n^*_{\min}\leq n^*_{uv}, one knows \log p = o(n^*_{uv}). Therefore, (\log p)/n^*_{uv}\leq((\log p)/n^*_{uv})^{1/2} and

P\Big\{|\tilde{\mu}^*_u\tilde{\mu}^*_v-\mu_u\mu_v| > (2c^{-1}+1)K^2\sqrt{(\log p)/n^*_{uv}} \ \text{for some}\ 1\leq u,v\leq p\Big\} = O(p^{-(1+L)}) \quad (2.8)

holds.

Note that

|\tilde{\sigma}^*_{uv}-\sigma_{uv}| = |(\tilde{\mu}^*_{uv}-\tilde{\mu}^*_u\tilde{\mu}^*_v)-(\mu_{uv}-\mu_u\mu_v)|\leq|\tilde{\mu}^*_{uv}-\mu_{uv}|+|\tilde{\mu}^*_u\tilde{\mu}^*_v-\mu_u\mu_v|.

Then,

P\Big\{|\tilde{\sigma}^*_{uv}-\sigma_{uv}| > 2(c^{-1}+1)K^2\sqrt{(\log p)/n^*_{uv}} \ \text{for some}\ 1\leq u,v\leq p\Big\} = O(p^{-L})

follows from (2.7) and (2.8).

Hence, \tilde{\Sigma}^* is a generalized pilot estimator of \Sigma with C_0(L) = 2ck^2(1+c)(2+L).

    We shall give two generalized pilot estimators based on incomplete heavy-tailed samples.

Denote the Huber function by

\psi_\alpha(x) = \alpha\psi\Big(\frac{x}{\alpha}\Big),

where \alpha > 0 and

\psi(x) = \begin{cases} x, & |x|\leq 1; \\ \mathrm{sign}(x), & |x| > 1. \end{cases}

For any constant L > 0, let (\tilde{\mu}^*_H)_u \; (u = 1, \ldots, p) satisfy

\sum\limits_{i=1}^{n}\psi_{\alpha_u}(X^*_{iu}-(\tilde{\mu}^*_H)_u)S_{iu} = 0 \quad (2.9)

with \alpha_u := \sqrt{n^*_u\zeta^2/((2+L)\log p)} and \zeta^2\geq DX_u. Similarly, (\tilde{\mu}^*_H)_{uv} \; (u,v = 1, \ldots, p) satisfies

\sum\limits_{i=1}^{n}\psi_{\alpha_{uv}}(X^*_{iu}X^*_{iv}-(\tilde{\mu}^*_H)_{uv})S_{iu}S_{iv} = 0 \quad (2.10)

with \alpha_{uv} := \sqrt{n^*_{uv}\zeta_1^2/((2+L)\log p)} and \zeta_1^2\geq D(X_uX_v). Then, we have the following estimator.

Example 2.1 (Generalized Huber estimator). Suppose the conditions of Theorem 2.1 hold. Then \tilde{\Sigma}^*_H := ((\tilde{\mu}^*_H)_{uv}-(\tilde{\mu}^*_H)_u(\tilde{\mu}^*_H)_v)_{p\times p} is a generalized pilot estimator of \Sigma, where (\tilde{\mu}^*_H)_j \; (j = u, v) and (\tilde{\mu}^*_H)_{uv} are defined by (2.9) and (2.10).

Proof. By the definition of X^*_{iu}, (2.9) is equivalent to

\sum\limits_{i\in A_u}\psi_{\alpha_u}(X_{iu}-(\tilde{\mu}^*_H)_u) = 0, \quad (2.11)

where A_u = \{i: S_{iu}\neq 0\}. Obviously, |A_u| = \sum_{i=1}^{n}S_{iu}, and by the definition of n^*_u we have |A_u| = n^*_u.

Similarly, (2.10) is equivalent to

\sum\limits_{i\in A_{uv}}\psi_{\alpha_{uv}}(X_{iu}X_{iv}-(\tilde{\mu}^*_H)_{uv}) = 0 \quad (2.12)

with A_{uv} = \{i: S_{iu}S_{iv}\neq 0\} and |A_{uv}| = n^*_{uv}.

By \max_{1\leq u\leq p}E|X_u|^4\leq k^4, we get

DX_u\leq E|X_u|^2\leq(E|X_u|^4)^{1/2}\leq k^2.

On the other hand,

D(X_uX_v)\leq E|X_uX_v|^2\leq(E|X_u|^4E|X_v|^4)^{1/2}\leq k^4

due to the Cauchy–Schwarz inequality. Thus, one may take \zeta = k and \zeta_1 = k^2, so that

\alpha_u = \sqrt{\frac{n^*_uk^2}{(2+L)\log p}}, \quad \alpha_{uv} = \sqrt{\frac{n^*_{uv}k^4}{(2+L)\log p}}. \quad (2.13)

Obviously, for n large enough it holds that

\frac{(2+L)\log p}{n^*_u}\leq\frac{(2+L)\log p}{n^*_{\min}} < \frac{1}{8}, \quad \frac{(2+L)\log p}{n^*_{uv}}\leq\frac{(2+L)\log p}{n^*_{\min}} < \frac{1}{8}

thanks to n^*_{\min}\leq n^*_{uv}\leq n^*_u and \log p = o(n^*_{\min}).

According to (2.11)–(2.13) and Theorem 5 in [16], we know that if (2+L)\log p/n^*_u\leq 1/8 and (2+L)\log p/n^*_{uv}\leq 1/8, then

P\Big\{|(\tilde{\mu}^*_H)_u-\mu_u| > 4k\sqrt{((2+L)\log p)/n^*_u}\Big\} = O(p^{-(2+L)}), \quad P\Big\{|(\tilde{\mu}^*_H)_{uv}-\mu_{uv}| > 4k^2\sqrt{((2+L)\log p)/n^*_{uv}}\Big\} = O(p^{-(2+L)}),

i.e., (\tilde{\mu}^*_H)_u and (\tilde{\mu}^*_H)_{uv} satisfy the desired conditions (2.3) and (2.4) with c = 4.
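In practice, the estimating equation (2.11) can be solved numerically: \sum_{i\in A_u}\psi_{\alpha_u}(X_{iu}-\mu) is continuous and nonincreasing in \mu and changes sign on [\min_i X_{iu}-\alpha_u, \max_i X_{iu}+\alpha_u], so bisection always succeeds. Below is a minimal sketch of this idea (assuming NumPy; the names are ours). One would call (\tilde{\mu}^*_H)_u = huber_mean(x_obs, alpha_u) with x_obs the observed entries \{X_{iu}: S_{iu} = 1\}, and analogously for (\tilde{\mu}^*_H)_{uv} with the observed products.

```python
import numpy as np

def huber_mean(x, alpha, tol=1e-10):
    """Solve sum_i psi_alpha(x_i - mu) = 0 for mu by bisection.

    psi_alpha(z) = alpha * psi(z / alpha) equals z clipped to
    [-alpha, alpha], so the score below is continuous and
    nonincreasing in mu, positive at lo and negative at hi.
    """
    def score(mu):
        return np.clip(x - mu, -alpha, alpha).sum()

    lo, hi = x.min() - alpha, x.max() + alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```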

In order to give another generalized pilot estimator, let (\tilde{\mu}^*_T)_u \; (u = 1, \ldots, p) and (\tilde{\mu}^*_T)_{uv} \; (u,v = 1, \ldots, p) be defined by

(\tilde{\mu}^*_T)_u := \frac{1}{n^*_u}\sum\limits_{i=1}^{n}X^*_{iu}1\Big\{|X^*_{iu}|\leq\beta\sqrt{\frac{n^*_u}{(2+L)\log p}}\Big\}, \quad (2.14)
(\tilde{\mu}^*_T)_{uv} := \frac{1}{n^*_{uv}}\sum\limits_{i=1}^{n}X^*_{iu}X^*_{iv}1\Big\{|X^*_{iu}X^*_{iv}|\leq\beta_1\sqrt{\frac{n^*_{uv}}{(2+L)\log p}}\Big\} \quad (2.15)

respectively, where L > 0, \beta\geq\sqrt{E|X_u|^2} and \beta_1\geq\sqrt{E|X_uX_v|^2}. Then, we have the second estimator.

Example 2.2 (Generalized truncated mean estimator). Suppose the conditions of Theorem 2.1 hold. Then \tilde{\Sigma}^*_T := ((\tilde{\mu}^*_T)_{uv}-(\tilde{\mu}^*_T)_u(\tilde{\mu}^*_T)_v)_{p\times p} is a generalized pilot estimator of \Sigma, where (\tilde{\mu}^*_T)_j \; (j = u, v) and (\tilde{\mu}^*_T)_{uv} are defined by (2.14) and (2.15).

Proof. We first show that (\tilde{\mu}^*_T)_u satisfies (2.3). According to \max_{1\leq u\leq p}E|X_u|^4\leq k^4, we have

E|X_u|^2\leq(E|X_u|^4)^{1/2}\leq k^2.

Hence one may take \beta = k, and (2.14) becomes

(\tilde{\mu}^*_T)_u = \frac{1}{n^*_u}\sum\limits_{i\in A_u}X_{iu}1\Big\{|X_{iu}|\leq k\sqrt{\frac{n^*_u}{(2+L)\log p}}\Big\},

where A_u = \{i: S_{iu}\neq 0\}. Let a := k\sqrt{n^*_u/((2+L)\log p)}. We derive

|(\tilde{\mu}^*_T)_u-\mu_u| = \Big|\frac{1}{n^*_u}\sum\limits_{i\in A_u}X_{iu}1\{|X_{iu}|\leq a\}-\frac{1}{n^*_u}\sum\limits_{i\in A_u}EX_u\Big| = \Big|\frac{1}{n^*_u}\sum\limits_{i\in A_u}\big(X_{iu}1\{|X_{iu}|\leq a\}-E(X_{iu}1\{|X_{iu}|\leq a\})\big)-\frac{1}{n^*_u}\sum\limits_{i\in A_u}E(X_{iu}1\{|X_{iu}| > a\})\Big|.

Therefore, since |E(X_{iu}1\{|X_{iu}| > a\})|\leq E(X_{iu}^2)/a\leq k^2/a and |A_u| = n^*_u,

|(\tilde{\mu}^*_T)_u-\mu_u|\leq\Big|\frac{1}{n^*_u}\sum\limits_{i\in A_u}\big(X_{iu}1\{|X_{iu}|\leq a\}-E(X_{iu}1\{|X_{iu}|\leq a\})\big)\Big|+\Big|\frac{1}{n^*_u}\sum\limits_{i\in A_u}\frac{k^2}{a}\Big| = \Big|\frac{1}{n^*_u}\sum\limits_{i\in A_u}\big(X_{iu}1\{|X_{iu}|\leq a\}-E(X_{iu}1\{|X_{iu}|\leq a\})\big)\Big|+\frac{k^2}{a}. \quad (2.16)

According to E(X_{iu}^21\{|X_{iu}|\leq a\})\leq k^2 and Bernstein's inequality in [17],

P\Big\{\Big|\frac{1}{n^*_u}\sum\limits_{i\in A_u}\big(X_{iu}1\{|X_{iu}|\leq a\}-E(X_{iu}1\{|X_{iu}|\leq a\})\big)\Big|\leq k\sqrt{\frac{2t}{n^*_u}}+\frac{at}{3n^*_u}\Big\}\geq 1-2\exp(-t) \quad (2.17)

for any t > 0.

By (2.16) and (2.17) with t = (2+L)\log p, we have

P\Big\{|(\tilde{\mu}^*_T)_u-\mu_u|\leq k\sqrt{\frac{2(2+L)\log p}{n^*_u}}+\frac{a(2+L)\log p}{3n^*_u}+\frac{k^2}{a}\Big\}\geq 1-2p^{-(2+L)}.

Substituting a = k\sqrt{n^*_u/((2+L)\log p)} into the above inequality, we obtain

P\Big\{|(\tilde{\mu}^*_T)_u-\mu_u| > 4k\sqrt{((2+L)\log p)/n^*_u}\Big\} = O(p^{-(2+L)}),

which is the expected condition (2.3) of Theorem 2.1.

Similarly, we can derive

P\Big\{|(\tilde{\mu}^*_T)_{uv}-\mu_{uv}| > 4k^2\sqrt{((2+L)\log p)/n^*_{uv}}\Big\} = O(p^{-(2+L)}),

i.e., condition (2.4) of Theorem 2.1 holds.
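Computationally, the truncated mean estimators (2.14) and (2.15) are one-liners: entries whose magnitude exceeds the truncation level are zeroed (not clipped), and the sum is still divided by the number of observed entries. A small sketch (assuming NumPy; names ours):

```python
import numpy as np

def truncated_mean(x_obs, p, L, beta):
    """Truncated mean in the spirit of (2.14).

    x_obs : 1-D array of observed entries {X_iu : S_iu = 1}, length n*_u.
    beta  : scale constant with beta >= sqrt(E|X_u|^2); beta = k works
            under the fourth moment bound.
    """
    n_u = x_obs.size
    level = beta * np.sqrt(n_u / ((2 + L) * np.log(p)))
    return np.where(np.abs(x_obs) <= level, x_obs, 0.0).mean()
```

For (2.15), one passes the observed products \{X_{iu}X_{iv}: S_{iu}S_{iv} = 1\} and the constant \beta_1 instead.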

    We introduce the thresholding function and the space of sparse covariance matrices.

Definition 3.1. For any constant \lambda > 0, a real-valued function \tau_\lambda(\cdot) is said to be a thresholding function if

(i) \tau_\lambda(z) = 0 for |z|\leq\lambda;

(ii) |\tau_\lambda(z)-z|\leq\lambda;

(iii) |\tau_\lambda(z)|\leq c_0|y| for all z, y with |z-y|\leq\lambda, where c_0 > 0 is a constant.

In fact, many functions satisfy conditions (i)–(iii), for example, the soft thresholding function \tau_\lambda(z) = \mathrm{sign}(z)(|z|-\lambda)_+, the adaptive lasso thresholding function \tau_\lambda(z) = z(1-|\lambda/z|^\eta)_+ with \eta\geq 1, and the smoothly clipped absolute deviation (SCAD) thresholding rule proposed by Rothman et al. [18].
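For reference, the first two rules can be written in a few lines; this sketch (assuming NumPy; names ours) also accepts an entrywise \lambda, which is needed for the thresholds \lambda_{uv} introduced below.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft thresholding rule: sign(z) * (|z| - lam)_+ ."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def adaptive_lasso_threshold(z, lam, eta=1.0):
    """Adaptive lasso rule: z * (1 - |lam / z|^eta)_+ with eta >= 1."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    big = np.abs(z) > lam                      # tau_lam(z) = 0 if |z| <= lam
    lam_big = lam[big] if np.ndim(lam) else lam
    out[big] = z[big] * (1.0 - np.abs(lam_big / z[big]) ** eta)
    return out
```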

This paper considers the following class of covariance matrices introduced by [15]:

\mathcal{H}(s_{n,p}) := \Big\{\Sigma = (\sigma_{uv})_{p\times p} > 0: \max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\leq s_{n,p}\Big\}.
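The quantity constrained by \mathcal{H}(s_{n,p}) is easy to evaluate for a given matrix, which is convenient for checking how sparse a simulated \Sigma is. A small illustrative sketch (names ours, assuming NumPy):

```python
import numpy as np

def sparsity_measure(Sigma, n):
    """max_v sum_u min{ sqrt(sigma_uu * sigma_vv),
    |sigma_uv| / sqrt(log(p)/n) }, the quantity bounded by s_{n,p}."""
    p = Sigma.shape[0]
    d = np.diag(Sigma)
    geom = np.sqrt(np.outer(d, d))                 # sqrt(sigma_uu sigma_vv)
    scaled = np.abs(Sigma) / np.sqrt(np.log(p) / n)
    return np.minimum(geom, scaled).sum(axis=0).max()
```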

Next, we define the thresholding generalized pilot estimator (\tilde{\Sigma}^*)_\tau = ((\tilde{\sigma}^*_{uv})_\tau)_{p\times p} and consider its convergence rates in terms of probability under the spectral and Frobenius norms respectively over the parameter space \mathcal{H}(s_{n,p}).

Let \tilde{\Sigma}^* = (\tilde{\sigma}^*_{uv}) be a generalized pilot estimator and define

(\tilde{\sigma}^*_{uv})_\tau := \tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv}), \quad (3.1)

where \tau_{\lambda_{uv}}(\cdot) is a thresholding function with

\lambda_{uv} = \delta\sqrt{\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\frac{\log p}{n^*_{uv}}}. \quad (3.2)

The constant \delta will be specified in the proof of Lemma 3.1.
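Putting (3.1) and (3.2) together, the estimator is assembled entrywise; below is a sketch reusing the soft_threshold helper above (the interface is our own, and in practice \delta is chosen by cross-validation rather than via (3.7), since C_0(L) and \gamma are unknown):

```python
import numpy as np

def threshold_pilot(Sigma_pilot, n_uv, delta, rule=soft_threshold):
    """Thresholding generalized pilot estimator (3.1)-(3.2).

    Sigma_pilot : (p, p) generalized pilot estimator tilde{Sigma}*.
    n_uv        : (p, p) pairwise observation counts n*_{uv}.
    rule        : any thresholding function satisfying Definition 3.1.
    """
    p = Sigma_pilot.shape[0]
    d = np.diag(Sigma_pilot)
    lam = delta * np.sqrt(np.outer(d, d) * np.log(p) / n_uv)   # lambda_{uv}
    return rule(Sigma_pilot, lam)
```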

The following lemma is useful for proving Theorems 3.1 and 4.1.

Lemma 3.1. Suppose \min_u\sigma_{uu}\geq\gamma > 0, \log p = o(n^*_{\min}) and Assumption 1.1 hold. Denote the events Q_1, Q_2 by

Q_1 := \Big\{|\tilde{\sigma}^*_{uv}-\sigma_{uv}|\leq\lambda_{uv}, \; 1\leq u,v\leq p\Big\}, \quad (3.3)
Q_2 := \Big\{\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\leq 2\sigma_{uu}\sigma_{vv}, \; 1\leq u,v\leq p\Big\}. \quad (3.4)

Then, for any L > 0,

(i) there exists C_1(L) > 0 such that

|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq C_1(L)\sqrt{\frac{\log p}{n^*_{\min}}}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}, \quad 1\leq u,v\leq p,

holds under the event Q_1\cap Q_2;

(ii) P(Q_1\cap Q_2)\geq 1-O(p^{-L}).

Proof. (i) Under the event Q_1, one knows

|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq(1+c_0)|\sigma_{uv}|, \quad (3.5)
|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\tilde{\sigma}^*_{uv}|+|\tilde{\sigma}^*_{uv}-\sigma_{uv}|\leq 2\lambda_{uv} \quad (3.6)

thanks to conditions (iii) and (ii) of Definition 3.1 respectively.

Define

\delta := \frac{2C_0(L)}{\gamma}, \quad (3.7)

where C_0(L) is given in Definition 2.1. By (3.2), when the event Q_2 occurs as well, (3.6) reduces to

|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq 2\delta\sqrt{\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\frac{\log p}{n^*_{uv}}}\leq C(L)\sqrt{\sigma_{uu}\sigma_{vv}\frac{\log p}{n^*_{uv}}}. \quad (3.8)

According to (3.5) and (3.8), one obtains

|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq\min\Big\{(1+c_0)|\sigma_{uv}|, C(L)\sqrt{\sigma_{uu}\sigma_{vv}\frac{\log p}{n^*_{uv}}}\Big\}\leq C_1(L)\sqrt{\frac{\log p}{n^*_{uv}}}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n^*_{uv}}}\Big\}

under the event Q_1\cap Q_2. Therefore,

|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq C_1(L)\sqrt{\frac{\log p}{n^*_{\min}}}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}

holds due to n^*_{\min}\leq n^*_{uv}\leq n. This reaches conclusion (i) of Lemma 3.1.

(ii) In order to show P(Q_1\cap Q_2)\geq 1-O(p^{-L}), one first estimates P(Q_2^c).

Clearly,

\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}-\sigma_{uu}\sigma_{vv} = (\tilde{\sigma}^*_{uu}-\sigma_{uu})\tilde{\sigma}^*_{vv}+(\tilde{\sigma}^*_{vv}-\sigma_{vv})\tilde{\sigma}^*_{uu}-(\tilde{\sigma}^*_{uu}-\sigma_{uu})(\tilde{\sigma}^*_{vv}-\sigma_{vv})

and

\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\leq\sigma_{uu}\sigma_{vv}+|\tilde{\sigma}^*_{uu}-\sigma_{uu}||\tilde{\sigma}^*_{vv}|+|\tilde{\sigma}^*_{vv}-\sigma_{vv}||\tilde{\sigma}^*_{uu}|+|\tilde{\sigma}^*_{uu}-\sigma_{uu}||\tilde{\sigma}^*_{vv}-\sigma_{vv}|.

Define the event

E := \Big\{|\tilde{\sigma}^*_{uv}-\sigma_{uv}|\leq C_0(L)\sqrt{(\log p)/n^*_{uv}}, \; 1\leq u,v\leq p\Big\}.

Since \tilde{\Sigma}^* = (\tilde{\sigma}^*_{uv})_{p\times p} is a generalized pilot estimator of \Sigma = (\sigma_{uv})_{p\times p}, one gets

P(E) = 1-O(p^{-L}).

By \log p = o(n^*_{\min}) and n^*_{\min}\leq n^*_{uv}, one knows \log p = o(n^*_{uv}), so that C_0(L)\sqrt{(\log p)/n^*_{uv}}\leq\gamma/4 for n large enough. Furthermore, it holds that

\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\leq\sigma_{uu}\sigma_{vv}+\frac{\gamma}{4}\Big(\sigma_{vv}+\frac{\gamma}{2}\Big)+\frac{\gamma}{4}\Big(\sigma_{uu}+\frac{\gamma}{2}\Big)+\frac{\gamma^2}{4}\leq\sigma_{uu}\sigma_{vv}+\frac{\sigma_{uu}\sigma_{vv}}{2}+\frac{\gamma^2}{2}\leq 2\sigma_{uu}\sigma_{vv}

under the event E because of \min_u\sigma_{uu}\geq\gamma. Hence,

P(Q_2^c)\leq P(E^c) = O(p^{-L}). \quad (3.9)

Next, one estimates P(Q_1^c). One observes \tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\geq\sigma_{uu}\sigma_{vv}-|\tilde{\sigma}^*_{uu}-\sigma_{uu}||\tilde{\sigma}^*_{vv}|-|\tilde{\sigma}^*_{vv}-\sigma_{vv}||\tilde{\sigma}^*_{uu}|-|\tilde{\sigma}^*_{uu}-\sigma_{uu}||\tilde{\sigma}^*_{vv}-\sigma_{vv}|, and it follows that

\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\geq\sigma_{uu}\sigma_{vv}-\frac{\gamma}{8}\Big(\sigma_{vv}+\frac{\gamma}{2}\Big)-\frac{\gamma}{8}\Big(\sigma_{uu}+\frac{\gamma}{2}\Big)-\frac{\gamma^2}{8}\geq\frac{3}{4}\sigma_{uu}\sigma_{vv}-\frac{\gamma^2}{4}\geq\frac{\gamma^2}{2}

holds true on the event E due to \min_u\sigma_{uu}\geq\gamma and \log p = o(n^*_{uv}). Hence,

P\Big\{\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\geq\frac{\gamma^2}{2}, \; 1\leq u,v\leq p\Big\}\geq P(E) = 1-O(p^{-L}). \quad (3.10)

Recall \lambda_{uv} = \delta\sqrt{\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}\log p/n^*_{uv}} given by (3.2). It can be shown that

P(Q_1^c) = P\Big\{\frac{|\tilde{\sigma}^*_{uv}-\sigma_{uv}|}{\sqrt{\tilde{\sigma}^*_{uu}\tilde{\sigma}^*_{vv}}} > \delta\sqrt{\frac{\log p}{n^*_{uv}}} \ \text{for some}\ 1\leq u,v\leq p\Big\}\leq P\Big\{|\tilde{\sigma}^*_{uv}-\sigma_{uv}| > \frac{\delta\gamma}{\sqrt{2}}\sqrt{\frac{\log p}{n^*_{uv}}} \ \text{for some}\ 1\leq u,v\leq p\Big\}+O(p^{-L}) \quad (3.11)

follows from (3.10).

Note that \tilde{\Sigma}^* = (\tilde{\sigma}^*_{uv}) is a generalized pilot estimator of \Sigma. Then, one derives

P\Big\{|\tilde{\sigma}^*_{uv}-\sigma_{uv}| > \frac{\delta\gamma}{\sqrt{2}}\sqrt{\frac{\log p}{n^*_{uv}}} \ \text{for some}\ 1\leq u,v\leq p\Big\} = O(p^{-L})

thanks to \delta = 2C_0(L)/\gamma defined in (3.7). Substituting the above result into (3.11) gives

P(Q_1^c) = O(p^{-L}).

Combining this with (3.9), one obtains the stated result

P(Q_1\cap Q_2)\geq 1-P(Q_1^c)-P(Q_2^c)\geq 1-O(p^{-L}).

Finally, we give upper bounds for \|(\tilde{\Sigma}^*)_\tau-\Sigma\|_2 and \|(\tilde{\Sigma}^*)_\tau-\Sigma\|_F in terms of probability, where \|A\|_2 and \|A\|_F denote the spectral and Frobenius norms of a matrix A respectively.

Theorem 3.1. Suppose \min_u\sigma_{uu}\geq\gamma > 0, \log p = o(n^*_{\min}) and Assumption 1.1 hold. Then,

(i) \inf\limits_{\Sigma\in\mathcal{H}(s_{n,p})}P\Big\{\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_2\leq C_1(L)s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}\Big\}\geq 1-O(p^{-L});
(ii) \inf\limits_{\Sigma\in\mathcal{H}(s_{n,p})}P\Big\{\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_F\leq C_1(L)s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}\Big\}\geq 1-O(p^{-L});
(iii) \inf\limits_{\Sigma\in\mathcal{H}(s_{n,p}), \, \max_u\sigma_{uu}\leq M}P\Big\{\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_F\leq C_1(L)\sqrt{Ms_{n,p}\frac{\log p}{n^*_{\min}}}\Big\}\geq 1-O(p^{-L}).

Proof. (i) Define the event Q := Q_1\cap Q_2, where Q_1, Q_2 are given by (3.3) and (3.4) respectively. Then, it is easy to see that

\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1 := \max\limits_{v}\sum\limits_{u=1}^{p}|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq C_1(L)s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}} \quad (3.12)

thanks to Lemma 3.1 and \Sigma\in\mathcal{H}(s_{n,p}).

The Geršgorin theorem gives \|(\tilde{\Sigma}^*)_\tau-\Sigma\|_2\leq\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1, which combined with (3.12) implies

\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_2\leq C_1(L)s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}

on the event Q.

On the other hand, Lemma 3.1 shows P(Q)\geq 1-O(p^{-L}). Hence, Theorem 3.1(i) holds.

(ii) One observes

\frac{1}{p}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|^2_F = \frac{1}{p}\sum\limits_{v=1}^{p}\sum\limits_{u=1}^{p}|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|^2\leq\max\limits_{v}\sum\limits_{u=1}^{p}|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|^2,

and it follows that

\frac{1}{p}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|^2_F\leq(C_1(L))^2\frac{\log p}{n^*_{\min}}\max\limits_{v}\sum\limits_{u=1}^{p}\Big(\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^2 \quad (3.13)

on the event Q according to Lemma 3.1.

Note that \max_v\sum_{u=1}^{p}|a_{uv}|^2\leq(\max_v\sum_{u=1}^{p}|a_{uv}|)^2. Then, (3.13) reduces to

\frac{1}{p}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|^2_F\leq(C_1(L))^2\frac{\log p}{n^*_{\min}}\Big(\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^2\leq(C_1(L))^2s^2_{n,p}\frac{\log p}{n^*_{\min}}

as long as \Sigma\in\mathcal{H}(s_{n,p}).

Therefore, conclusion (ii) follows since Lemma 3.1 says P(Q)\geq 1-O(p^{-L}).

(iii) By \max_u\sigma_{uu}\leq M, one knows

\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\leq M.

Furthermore, it holds that

\frac{1}{p}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|^2_F\leq M(C_1(L))^2\frac{\log p}{n^*_{\min}}\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\leq M(C_1(L))^2s_{n,p}\frac{\log p}{n^*_{\min}}

under the event Q due to (3.13) and \Sigma\in\mathcal{H}(s_{n,p}).

Thus, claim (iii) follows from Lemma 3.1 immediately.

Remark 3.1. Theorem 3.1(i) generalizes the result of [15], which requires X to be a sub-Gaussian random vector. In addition, if n^*_{\min} = n, Theorem 3.1(i) yields the result of [11], since the parameter class \mathcal{H}(s_{n,p}) contains the class of sparse covariance matrices defined in [11].

Remark 3.2. From the proof of Theorem 3.1(i), we find that

\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1\leq C_1(L)s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}

under the event Q.

Furthermore, let \|A\|_\omega denote the matrix l_\omega-operator norm of A. Lemma 7.2 in [19] shows that

\|A\|_\omega\leq\|A\|_1 \quad (1\leq\omega\leq\infty)

for any symmetric matrix A. Hence,

\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_\omega\leq C_1(L)s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}

holds under the event Q.

Then, Lemma 3.1 indicates that

\inf\limits_{\Sigma\in\mathcal{H}(s_{n,p})}P\Big\{\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_\omega\leq C_1(L)s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}\Big\}\geq 1-O(p^{-L}).

This section studies the convergence rates of the thresholding generalized pilot estimator (\tilde{\Sigma}^*)_\tau in terms of expectation over \mathcal{H}(s_{n,p}).

    We introduce the following technical lemma.

Lemma 4.1. Let \min_u\sigma_{uu}\geq\gamma > 0, \log p = o(n^*_{\min}), p\geq(n^*_{\min})^\xi \; (\xi > 0), E|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\leq M and Assumption 1.1 hold. Then,

(i) \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{(Q_1\cap Q_2)^c}\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}dP\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}};
(ii) \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{(Q_1\cap Q_2)^c}\Big(\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^{\frac{1}{2}}dP\lesssim\sqrt{s_{n,p}\frac{\log p}{n^*_{\min}}};
(iii) \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{(Q_1\cap Q_2)^c}\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|dP\lesssim\sqrt{\frac{\log p}{n^*_{\min}}},

where Q_1, Q_2 are defined by (3.3) and (3.4), and x\lesssim y denotes x\leq cy with an absolute constant c > 0.

Proof. Denote Q := Q_1\cap Q_2 and

I_{n,p} := \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}dP.

Then, I_{n,p}\leq s_{n,p}P(Q^c).

According to Lemma 3.1, one knows P(Q^c) = O(p^{-L}) and hence I_{n,p}\lesssim s_{n,p}p^{-L}. Taking L = \xi^{-1}+3 > 0, one obtains

p^{-L}\leq(n^*_{\min})^{-L\xi}\leq(n^*_{\min})^{-1}\leq\sqrt{(\log p)/n^*_{\min}} \quad (4.1)

due to p\geq(n^*_{\min})^\xi. Hence, it follows that

I_{n,p}\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}},

which is the desired conclusion (i).

    which is the desired conclusion (i).

Similarly, the definition of \mathcal{H}(s_{n,p}) and Lemma 3.1 imply

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\Big(\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^{1/2}dP\lesssim\sqrt{s_{n,p}}\,p^{-L}.

Moreover, combining the above result with (4.1) concludes (ii).

To show (iii), Hölder's inequality gives

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|dP = \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}E\Big\{\Big(\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|\Big)I(Q^c)\Big\}\leq\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\Big\{E\Big(\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|\Big)^2\Big\}^{1/2}\{P(Q^c)\}^{1/2}.

On the other hand, it holds that

\Big(\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|\Big)^2\leq p\sum\limits_{u,v=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2.

Furthermore, one obtains

E\Big(\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|\Big)^2\lesssim p^3

due to the given condition E|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\leq M. Hence,

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|dP\lesssim p^{\frac{3}{2}}\{P(Q^c)\}^{\frac{1}{2}}\lesssim p^{\frac{3-L}{2}} \quad (4.2)

follows from P(Q^c) = O(p^{-L}).

By L = \xi^{-1}+3 and the assumption p\geq(n^*_{\min})^\xi, one finds

p^{\frac{3-L}{2}} = p^{-\frac{1}{2\xi}}\leq(n^*_{\min})^{-\frac{1}{2}}\lesssim\sqrt{\frac{\log p}{n^*_{\min}}}.

Substituting this into (4.2) gives the desired result (iii).

Theorem 4.1. Let (\tilde{\Sigma}^*)_\tau = ((\tilde{\sigma}^*_{uv})_\tau)_{p\times p} be given by (3.1), E|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\leq M, \min_u\sigma_{uu}\geq\gamma > 0 and Assumption 1.1 hold. If \log p = o(n^*_{\min}), p\geq(n^*_{\min})^\xi \; (\xi > 0) and s_{n,p}\geq 1, then

(i) \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}E\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_2\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}};
(ii) \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}E\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_F\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}};
(iii) \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p}), \, \max_u\sigma_{uu}\leq M}E\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_F\lesssim\sqrt{s_{n,p}\frac{\log p}{n^*_{\min}}}.

Proof. (i) Let Q := Q_1\cap Q_2, where Q_1, Q_2 are given by (3.3) and (3.4). Then, by the Geršgorin theorem, \|(\tilde{\Sigma}^*)_\tau-\Sigma\|_2\leq\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1, and we have

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}E\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_2\leq\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1dP+\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1dP.

Clearly, \|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1 := \max_v\sum_{u=1}^{p}|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}| and it follows that

\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}

under the event Q, thanks to Lemma 3.1 and the definition of \mathcal{H}(s_{n,p}). Hence,

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1dP\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}.

Then, we just need to show

J_{n,p} := \sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1dP\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}} \quad (4.3)

to finish the proof of (i).

    for finishing the proof of (i).

According to condition (iii) of Definition 3.1, we obtain |\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})|\leq c_0|\tilde{\sigma}^*_{uv}| and

|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})|+|\sigma_{uv}|\leq c_0|\tilde{\sigma}^*_{uv}|+|\sigma_{uv}|\leq c_0|\tilde{\sigma}^*_{uv}-\sigma_{uv}|+(c_0+1)|\sigma_{uv}|.

By |\sigma_{uv}|\leq\sqrt{\sigma_{uu}\sigma_{vv}} and \log p = o(n^*_{\min}), we know

|\sigma_{uv}|\leq\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n^*_{\min}}}\Big\}\leq\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}

due to n^*_{\min}\leq n. Hence, it holds that

|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|\leq c_0|\tilde{\sigma}^*_{uv}-\sigma_{uv}|+(c_0+1)\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\} \quad (4.4)

and

J_{n,p}\lesssim\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|dP+\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}dP.

Therefore, (4.3) follows from Lemma 4.1(i), (iii) and s_{n,p}\geq 1. This proves (i).

(ii) To show (ii), we observe

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}E\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_F\leq\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q}\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_FdP+\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_FdP.

Clearly,

\frac{1}{p}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|^2_F = \frac{1}{p}\sum\limits_{v=1}^{p}\sum\limits_{u=1}^{p}|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|^2\leq\max\limits_{v}\sum\limits_{u=1}^{p}|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|^2. \quad (4.5)

According to Lemma 3.1, we have

\frac{1}{p}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|^2_F\lesssim\frac{\log p}{n^*_{\min}}\max\limits_{v}\sum\limits_{u=1}^{p}\Big(\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^2 \quad (4.6)

on the event Q. Furthermore,

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q}\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_FdP\lesssim\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q}\Big\{\frac{\log p}{n^*_{\min}}\Big(\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^2\Big\}^{1/2}dP\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}

holds due to the definition of \mathcal{H}(s_{n,p}).

Hence, it suffices to prove

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}\int_{Q^c}\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_FdP\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}. \quad (4.7)

By (4.4), we find

|\tau_{\lambda_{uv}}(\tilde{\sigma}^*_{uv})-\sigma_{uv}|^2\lesssim|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2+\Big(\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^2.

Substituting the above inequality into (4.5) leads to

\frac{1}{p}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|^2_F\lesssim\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2+\max\limits_{v}\sum\limits_{u=1}^{p}\Big(\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^2. \quad (4.8)

Since \sqrt{|a|+|b|}\leq\sqrt{|a|}+\sqrt{|b|} and (\max_v\sum_{u=1}^{p}|a_{uv}|^2)^{1/2}\leq\max_v\sum_{u=1}^{p}|a_{uv}|, we obtain

\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_F\lesssim\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|+\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}.

Thus, (4.7) follows from Lemma 4.1(i), (iii) and s_{n,p}\geq 1.

(iii) By \max_u\sigma_{uu}\leq M, we obtain

\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\leq M. \quad (4.9)

On the other hand, (4.6), (4.9) and \Sigma\in\mathcal{H}(s_{n,p}) tell us that

\frac{1}{p}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|^2_F\lesssim\frac{\log p}{n^*_{\min}}\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\lesssim s_{n,p}\frac{\log p}{n^*_{\min}}

under the event Q. Therefore,

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p}), \, \max_u\sigma_{uu}\leq M}\int_{Q}\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_FdP\lesssim\sqrt{s_{n,p}\frac{\log p}{n^*_{\min}}}. \quad (4.10)

Using (4.8) and (4.9), we have

\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_F\lesssim\Big(\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^{1/2}+\Big(\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\Big)^{1/2}\lesssim\Big(\max\limits_{v}\sum\limits_{u=1}^{p}\min\Big\{\sqrt{\sigma_{uu}\sigma_{vv}}, \frac{|\sigma_{uv}|}{\sqrt{(\log p)/n}}\Big\}\Big)^{1/2}+\max\limits_{v}\sum\limits_{u=1}^{p}|\tilde{\sigma}^*_{uv}-\sigma_{uv}|.

Hence, it holds that

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p}), \, \max_u\sigma_{uu}\leq M}\int_{Q^c}\frac{1}{\sqrt{p}}\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_FdP\lesssim\sqrt{s_{n,p}\frac{\log p}{n^*_{\min}}} \quad (4.11)

due to Lemma 4.1(ii), (iii) and s_{n,p}\geq 1. Finally, conclusion (iii) follows from (4.10) and (4.11).

Remark 4.1. The upper bound of Theorem 4.1(i) is optimal due to Proposition 3.1 in [15]. In addition, Theorem 4.1(i) improves on Theorem 3.1 of [15], which requires X to be sub-Gaussian.

Remark 4.2. From the proof of Theorem 4.1(i), we observe

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}E\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_1\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}.

Note that \|A\|_\omega\leq\|A\|_1 \; (1\leq\omega\leq\infty) for any symmetric matrix A. Then,

\sup\limits_{\Sigma\in\mathcal{H}(s_{n,p})}E\|(\tilde{\Sigma}^*)_\tau-\Sigma\|_\omega\lesssim s_{n,p}\sqrt{\frac{\log p}{n^*_{\min}}}

holds.

Remark 4.3. The condition

E|\tilde{\sigma}^*_{uv}-\sigma_{uv}|^2\leq M \quad (4.12)

in Lemma 4.1 and Theorem 4.1 is mild. In fact, the generalized Huber estimator (Example 2.1) and the generalized truncated mean estimator (Example 2.2) both satisfy (4.12). The details can be found in the Appendix.

Let (\tilde{\Sigma}^*_H)_\tau and (\tilde{\Sigma}^*_T)_\tau be defined by (3.1) and (3.2). This section investigates the numerical performance of the estimators (\tilde{\Sigma}^*_H)_\tau and (\tilde{\Sigma}^*_T)_\tau and compares these two estimators with the adaptive thresholding estimator \hat{\Sigma}^*_{at} proposed by [15]. The following two types of sparse covariance matrices are considered:

Model 1. (Rothman et al. [18]) \Sigma = (\sigma_{uv})_{p\times p} with \sigma_{uv} = \max\{1-|u-v|/5, 0\}.

Model 2. (Cai and Zhang [15]) \Sigma = I_p+(D+D^T)/(\|D+D^T\|_2+0.01), where D = (d_{uv})_{p\times p} is given by d_{uu} = 0 \; (u = 1, \ldots, p) and, for u\neq v,

d_{uv} = \begin{cases} -1, & \text{with probability } 0.1; \\ 0, & \text{with probability } 0.8; \\ 1, & \text{with probability } 0.1. \end{cases}
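Both model covariance matrices are straightforward to generate; a sketch (assuming NumPy, with names of our own choosing):

```python
import numpy as np

def model1_cov(p):
    """Model 1: banded matrix sigma_uv = max(1 - |u - v|/5, 0)."""
    idx = np.arange(p)
    return np.maximum(1.0 - np.abs(idx[:, None] - idx[None, :]) / 5.0, 0.0)

def model2_cov(p, rng):
    """Model 2: I_p + (D + D^T) / (||D + D^T||_2 + 0.01), with the
    off-diagonal entries of D equal to -1, 0, 1 w.p. 0.1, 0.8, 0.1."""
    D = rng.choice([-1.0, 0.0, 1.0], size=(p, p), p=[0.1, 0.8, 0.1])
    np.fill_diagonal(D, 0.0)
    A = D + D.T
    return np.eye(p) + A / (np.linalg.norm(A, 2) + 0.01)
```

Note that Model 2 is positive definite by construction, since the perturbation has spectral norm strictly below one.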

Under each model we generate random samples X_i\in\mathbb{R}^p \; (i = 1, \ldots, n) under two different scenarios:

(i) X_i are independently drawn from the multivariate t-distribution t_\nu(0, \Sigma) with \nu = 4.5 degrees of freedom (a sampling sketch for this case is given after this list);

(ii) X_i are independently drawn from the multivariate skewed t-distribution st_\nu(0, \Sigma, \epsilon) with \nu = 5 degrees of freedom and skew parameter \epsilon = 10.
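For scenario (i), the multivariate t-distribution can be sampled through the usual normal/chi-square mixture representation; a minimal sketch follows (assuming NumPy; note that the covariance of t_\nu(0, \Sigma) is \nu/(\nu-2)\,\Sigma, so \Sigma here plays the role of the scale matrix). Sampling the skewed t of scenario (ii) needs an extra skewing step, which we omit.

```python
import numpy as np

def sample_mvt(n, Sigma, nu, rng):
    """n draws from t_nu(0, Sigma): X = Z / sqrt(W / nu),
    Z ~ N(0, Sigma), W ~ chi^2_nu independent of Z."""
    p = Sigma.shape[0]
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    W = rng.chisquare(nu, size=n)
    return Z / np.sqrt(W / nu)[:, None]
```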

In each simulation setting we adopt the following two kinds of missingness for the data matrix Y = (X_1, \ldots, X_n) with X_i = (X_{i1}, \ldots, X_{ip})^T, as proposed by Cai and Zhang [15]. The first case is missing uniformly and completely at random (MUCR), in which every entry X_{ik} is observed with probability 0 < \rho\leq 1. The second case is missing not uniformly but completely at random (MCR), in which Y is divided into four equal-size parts,

Y = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}, \quad Y_{11}, Y_{12}, Y_{21}, Y_{22}\in\mathbb{R}^{\frac{p}{2}\times\frac{n}{2}},

where every entry of Y_{11}, Y_{22} is observed with probability 0 < \rho^{(1)}\leq 1 and every entry of Y_{12}, Y_{21} is observed with probability 0 < \rho^{(2)}\leq 1.
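The two missingness mechanisms amount to sampling Bernoulli masks; a sketch (names ours, mask stored with one row per sample, i.e. the transpose of Y's layout):

```python
import numpy as np

def mucr_mask(n, p, rho, rng):
    """MUCR: each entry observed independently with probability rho."""
    return (rng.random((n, p)) < rho).astype(float)

def mcr_mask(n, p, rho1, rho2, rng):
    """MCR: the blocks corresponding to Y_11, Y_22 use rho1 and the
    blocks corresponding to Y_12, Y_21 use rho2."""
    S = np.empty((n, p))
    hn, hp = n // 2, p // 2
    S[:hn, :hp] = rng.random((hn, hp)) < rho1           # Y_11
    S[hn:, hp:] = rng.random((n - hn, p - hp)) < rho1   # Y_22
    S[:hn, hp:] = rng.random((hn, p - hp)) < rho2       # Y_12
    S[hn:, :hp] = rng.random((n - hn, hp)) < rho2       # Y_21
    return S
```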

Moreover, for each procedure we set p = 50, 200, 300 and n = 50, 100, 200 respectively, and 50 replications are used. Meanwhile, we choose the soft thresholding rule and measure the errors by the spectral and Frobenius norms respectively in each setting. The tuning parameter in the thresholding estimator is chosen by 10-fold cross-validation as explained in Section 4 of Cai and Zhang [15], and the unspecified tuning parameters in the generalized pilot estimators are chosen by the method suggested in Section 6 of Avella-Medina et al. [11].
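For completeness, one MUCR replication under Model 1 with t-distributed data can be assembled from the sketches above (the tuning constant delta below is a placeholder, not the cross-validated value used in the paper):

```python
rng = np.random.default_rng(0)
p, n, rho, nu = 50, 100, 0.5, 4.5
Sigma = model1_cov(p)
X = sample_mvt(n, Sigma, nu, rng)
S = mucr_mask(n, p, rho, rng)
pilot = generalized_sample_cov(X, S)   # or a Huber/truncated-mean pilot
est = threshold_pilot(pilot, S.T @ S, delta=2.0)
spec_err = np.linalg.norm(est - Sigma, 2)
frob_err = np.linalg.norm(est - Sigma, "fro")
```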

Tables 1 and 2 demonstrate that the thresholding estimators (\tilde{\Sigma}^*_H)_\tau and (\tilde{\Sigma}^*_T)_\tau perform better than the adaptive thresholding estimator \hat{\Sigma}^*_{at} under both the MUCR and MCR settings. Moreover, the thresholding generalized Huber estimator (\tilde{\Sigma}^*_H)_\tau outperforms the thresholding generalized truncated mean estimator (\tilde{\Sigma}^*_T)_\tau. We also find that the errors decrease as the sample size n gets larger. Meanwhile, we observe that the errors under Model 1 are larger than those under Model 2, since the covariance matrix in Model 1 is denser than that in Model 2. All these numerical results are consistent with our theoretical results.

    Table 1.  Mean errors (with standard errors in parentheses) for three kinds of thresholding estimators with the t-distribution.
                 Spectral norm                                            Frobenius norm
    (p, n)   \hat{\Sigma}^*_{at}   (\tilde{\Sigma}^*_H)_\tau   (\tilde{\Sigma}^*_T)_\tau   \hat{\Sigma}^*_{at}   (\tilde{\Sigma}^*_H)_\tau   (\tilde{\Sigma}^*_T)_\tau
    Model 1, MUCR, \rho = 0.5
    (50, 50) 6.59(0.09) 4.21(0.03) 5.14(0.01) 11.13(0.09) 8.28(0.02) 9.24(0.02)
    (50, 200) 3.78(0.04) 2.07(0.06) 2.28(0.02) 5.66(0.01) 3.15(0.04) 4.06(0.01)
    (200, 100) 5.62(0.02) 3.67(0.02) 4.42(0.03) 16.17(0.02) 13.02(0.03) 13.91(0.05)
    (200, 200) 4.46(0.03) 2.59(0.02) 3.02(0.01) 11.39(0.03) 8.49(0.01) 9.45(0.07)
    (300, 200) 4.97(0.02) 2.92(0.07) 3.63(0.04) 15.80(0.04) 12.73(0.03) 13.68(0.06)
    Model 2, MUCR, \rho = 0.5
    (50, 50) 5.72(0.03) 3.58(0.02) 4.29(0.01) 9.12(0.08) 5.34(0.02) 6.93(0.02)
    (50, 200) 3.45(0.06) 1.65(0.01) 2.12(0.02) 5.23(0.03) 2.17(0.04) 3.41(0.06)
    (200, 100) 5.31(0.03) 3.26(0.08) 4.13(0.04) 11.88(0.01) 8.54(0.03) 10.07(0.04)
    (200, 200) 3.89(0.02) 1.92(0.01) 2.96(0.03) 8.96(0.01) 5.46(0.02) 6.84(0.05)
    (300, 200) 4.27(0.02) 2.14(0.02) 3.48(0.04) 11.51(0.02) 8.12(0.03) 9.72(0.05)
    Model 1, MCR, \rho^{(1)} = 0.8, \rho^{(2)} = 0.2
    (50, 50) 6.42(0.01) 4.12(0.01) 4.93(0.02) 10.79(0.02) 7.90(0.09) 9.12(0.03)
    (50, 200) 3.65(0.04) 1.96(0.03) 2.16(0.02) 5.42(0.04) 2.86(0.01) 3.96(0.02)
    (200, 100) 5.47(0.06) 3.55(0.02) 4.32(0.04) 15.66(0.02) 12.93(0.02) 13.77(0.06)
    (200, 200) 4.19(0.05) 2.38(0.01) 2.74(0.04) 11.15(0.01) 8.32(0.02) 9.26(0.05)
    (300, 200) 4.52(0.03) 2.63(0.05) 3.25(0.05) 15.48(0.04) 12.56(0.01) 13.18(0.07)
    Model 2, MCR, \rho^{(1)} = 0.8, \rho^{(2)} = 0.2
    (50, 50) 5.48(0.01) 3.31(0.09) 4.28(0.03) 8.97(0.08) 5.19(0.01) 6.62(0.02)
    (50, 200) 3.03(0.02) 1.31(0.02) 1.97(0.01) 5.12(0.03) 2.02(0.04) 3.47(0.03)
    (200, 100) 5.15(0.04) 3.15(0.02) 4.09(0.03) 11.56(0.04) 8.25(0.03) 9.86(0.04)
    (200, 200) 3.54(0.03) 1.72(0.01) 2.68(0.03) 8.61(0.01) 5.36(0.02) 6.73(0.06)
    (300, 200) 3.96(0.07) 2.08(0.03) 3.04(0.05) 11.35(0.02) 8.01(0.02) 9.69(0.07)

    Table 2.  Mean errors (with standard errors in parentheses) for three kinds of thresholding estimators with the skewed t-distribution.
                 Spectral norm                                            Frobenius norm
    (p, n)   \hat{\Sigma}^*_{at}   (\tilde{\Sigma}^*_H)_\tau   (\tilde{\Sigma}^*_T)_\tau   \hat{\Sigma}^*_{at}   (\tilde{\Sigma}^*_H)_\tau   (\tilde{\Sigma}^*_T)_\tau
    Model 1, MUCR, \rho = 0.5
    (50, 50) 8.98(0.04) 7.34(0.03) 8.03(0.04) 12.44(0.02) 9.74(0.01) 10.79(0.02)
    (50, 200) 4.69(0.07) 3.15(0.08) 3.68(0.01) 6.06(0.08) 3.83(0.03) 4.65(0.03)
    (200, 100) 8.17(0.03) 6.88(0.04) 7.53(0.03) 18.32(0.05) 15.41(0.05) 16.93(0.06)
    (200, 200) 5.93(0.01) 4.54(0.09) 5.11(0.01) 12.58(0.06) 9.93(0.02) 11.07(0.05)
    (300, 200) 6.98(0.02) 5.85(0.02) 6.06(0.04) 18.19(0.04) 15.34(0.03) 16.51(0.06)
    Model 2, MUCR, \rho = 0.5
    (50, 50) 7.67(0.01) 5.41(0.03) 6.34(0.04) 10.31(0.08) 8.68(0.08) 9.26(0.03)
    (50, 200) 4.15(0.02) 1.88(0.06) 2.65(0.03) 5.49(0.01) 3.24(0.04) 4.01(0.02)
    (200, 100) 7.42(0.01) 5.14(0.09) 6.01(0.02) 15.45(0.02) 13.73(0.03) 14.39(0.05)
    (200, 200) 4.88(0.03) 2.64(0.02) 3.60(0.08) 10.63(0.01) 8.89(0.02) 9.37(0.06)
    (300, 200) 5.41(0.01) 3.17(0.02) 4.32(0.06) 15.30(0.02) 13.36(0.04) 13.95(0.05)
    Model 1, MCR, \rho^{(1)} = 0.8, \rho^{(2)} = 0.2
    (50, 50) 8.75(0.02) 7.16(0.04) 7.84(0.04) 12.23(0.09) 9.46(0.02) 10.55(0.01)
    (50, 200) 4.36(0.08) 3.03(0.05) 3.59(0.03) 5.80(0.01) 3.62(0.06) 4.42(0.02)
    (200, 100) 7.99(0.03) 6.52(0.02) 7.38(0.04) 18.08(0.03) 15.29(0.03) 16.54(0.06)
    (200, 200) 5.81(0.02) 4.34(0.03) 4.96(0.01) 12.36(0.02) 9.64(0.04) 10.78(0.05)
    (300, 200) 6.69(0.07) 5.48(0.01) 5.92(0.03) 18.12(0.03) 15.25(0.05) 16.49(0.07)
    Model 2, MCR, \rho^{(1)} = 0.8, \rho^{(2)} = 0.2
    (50, 50) 7.58(0.02) 5.22(0.04) 6.27(0.05) 10.22(0.03) 8.36(0.03) 8.83(0.02)
    (50, 200) 4.06(0.06) 1.84(0.02) 2.48(0.05) 5.26(0.04) 2.95(0.06) 3.72(0.03)
    (200, 100) 7.14(0.02) 5.06(0.07) 5.95(0.04) 15.34(0.01) 13.36(0.05) 14.07(0.06)
    (200, 200) 4.76(0.02) 2.42(0.01) 3.36(0.08) 10.46(0.03) 8.77(0.03) 9.16(0.06)
    (300, 200) 5.30(0.01) 3.08(0.02) 4.18(0.06) 15.08(0.09) 13.06(0.05) 13.74(0.05)


In this paper, we propose the generalized pilot estimator in the presence of incomplete heavy-tailed data. Moreover, two kinds of generalized pilot estimators are provided under the bounded fourth moment assumption, whereas many previous studies hinged on the sub-Gaussian condition. In addition, we establish the thresholding pilot estimator for a family of sparse covariance matrices and give the convergence rates in terms of probability and expectation respectively.

In future work, we may consider compositional data with missing values under a lower-order bounded moment assumption by referring to Li et al. [20]. Moreover, we can adopt different methods to estimate the sparse covariance matrix with incomplete data, such as the proximal distance algorithm [21] or continuous matrix shrinkage [22].

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This paper is supported by the National Natural Science Foundation of China (No. 12171016).

    All authors declare no conflicts of interest in this paper.

In order to show that Example 2.1 and Example 2.2 satisfy condition (4.12), we first introduce Proposition A.1.

Proposition A.1. Let \max_{1\leq u\leq p}E|X_u|^4\leq k^4, EX_u = \mu_u, E(X_uX_v) = \mu_{uv}, and Assumption 1.1 hold. If \tilde{\mu}^*_u and \tilde{\mu}^*_{uv} satisfy

|\tilde{\mu}^*_u|\leq|A|^{-1}\sum\limits_{i\in A}|X_{iu}|, \quad (A.1)
|\tilde{\mu}^*_{uv}|\leq|B|^{-1}\sum\limits_{i\in B}|X_{iu}X_{iv}| \quad (A.2)

where A, B\subseteq\{1, \ldots, n\}, then \tilde{\Sigma}^* = (\tilde{\sigma}^*_{uv})_{p\times p} = (\tilde{\mu}^*_{uv}-\tilde{\mu}^*_u\tilde{\mu}^*_v)_{p\times p} obeys (4.12).

Proof. It suffices to prove

|\sigma_{uv}|\lesssim 1, \quad (A.3)
E|\tilde{\sigma}^*_{uv}|^2\lesssim 1. \quad (A.4)

By \max_{1\leq u\leq p}E|X_u|^4\leq k^4, one knows E|X_u|\leq(E|X_u|^4)^{1/4}\leq k and

E|X_uX_v|\leq(EX_u^2)^{\frac{1}{2}}(EX_v^2)^{\frac{1}{2}}\leq(E|X_u|^4)^{\frac{1}{4}}(E|X_v|^4)^{\frac{1}{4}}\leq k^2.

Thus, it holds that

|\sigma_{uv}|\leq E|X_uX_v|+(E|X_u|)(E|X_v|)\leq 2k^2\lesssim 1,

which proves (A.3).

For (A.4), one observes

E|\tilde{\sigma}^*_{uv}|^2 = E|\tilde{\mu}^*_{uv}-\tilde{\mu}^*_u\tilde{\mu}^*_v|^2\lesssim E|\tilde{\mu}^*_{uv}|^2+E|\tilde{\mu}^*_u\tilde{\mu}^*_v|^2. \quad (A.5)

According to (A.1) and Jensen's inequality, it follows that

E|\tilde{\mu}^*_u|^4\leq E\Big(\frac{1}{|A|}\sum\limits_{i\in A}|X_{iu}|\Big)^4\leq E|X_u|^4\leq k^4.

Furthermore, combining this with the Cauchy–Schwarz inequality leads to

E|\tilde{\mu}^*_u\tilde{\mu}^*_v|^2\leq(E|\tilde{\mu}^*_u|^4E|\tilde{\mu}^*_v|^4)^{\frac{1}{2}}\leq k^4\lesssim 1. \quad (A.6)

Similarly, (A.2) implies

E|\tilde{\mu}^*_{uv}|^2\leq E\Big(\frac{1}{|B|}\sum\limits_{i\in B}|X_{iu}X_{iv}|\Big)^2\leq E|X_uX_v|^2.

By the Cauchy–Schwarz inequality and \max_{1\leq u\leq p}E|X_u|^4\leq k^4, one finds

E|X_uX_v|^2\leq(E|X_u|^4)^{\frac{1}{2}}(E|X_v|^4)^{\frac{1}{2}}\leq k^4.

Hence,

E|\tilde{\mu}^*_{uv}|^2\leq k^4\lesssim 1. \quad (A.7)

Finally, the expected conclusion (A.4) follows from (A.5)–(A.7). This completes the proof of Proposition A.1.

Now, based on Proposition A.1, we verify that the two kinds of generalized pilot estimators (Example 2.1 and Example 2.2) satisfy (4.12).

For the generalized truncated mean estimator \tilde{\Sigma}^*_T := ((\tilde{\mu}^*_T)_{uv}-(\tilde{\mu}^*_T)_u(\tilde{\mu}^*_T)_v)_{p\times p}, it is easy to see that (\tilde{\mu}^*_T)_u and (\tilde{\mu}^*_T)_{uv} obey (A.1) and (A.2) respectively.

By (2.14), we know

|(\tilde{\mu}^*_T)_u|\leq\frac{1}{n^*_u}\sum\limits_{i=1}^{n}|X^*_{iu}| = \frac{1}{n^*_u}\sum\limits_{i=1}^{n}|X_{iu}S_{iu}| = \frac{1}{n^*_u}\sum\limits_{i\in A_u}|X_{iu}|,

where A_u = \{i: S_{iu}\neq 0\} and |A_u| = n^*_u. Similarly, it holds that

|(\tilde{\mu}^*_T)_{uv}|\leq\frac{1}{n^*_{uv}}\sum\limits_{i\in A_{uv}}|X_{iu}X_{iv}|

with A_{uv} = \{i: S_{iu}S_{iv}\neq 0\} and |A_{uv}| = n_{uv}^*. The above two inequalities imply that (\tilde{\mu}^*_{T})_{u} and (\tilde{\mu}^*_{T})_{uv} satisfy (A.1) and (A.2) respectively.

By contrast, it is hard to check directly that the generalized Huber estimator

\tilde{\mathbf{\Sigma}}^*_{H}: = \Big((\tilde{\mu}^*_{H})_{uv}- (\tilde{\mu}^*_{H})_u(\tilde{\mu}^*_{H})_v\Big)_{p\times p}

satisfies (4.12), since (\tilde{\mu}^*_H)_u and (\tilde{\mu}^*_{H})_{uv} are defined only implicitly. But we can consider a special case.

    Proposition A.2. Let A_u = \{i:S_{iu}\neq 0\}, \; A_{uv} = \{i:S_{iu}S_{iv}\neq 0\} . If \alpha_u, \; \alpha_{uv} defined in (2.9) and (2.10) obey

    \alpha_u > \max\limits_{i\in A_u}X_{iu}-\min\limits_{i\in A_u}X_{iu}, \; \alpha_{uv} > \max\limits_{i\in A_{uv}}X_{iu}X_{iv}-\min\limits_{i\in A_{uv}}X_{iu}X_{iv}

    respectively. Then, (\tilde{\mu}^*_H)_u, \; (\tilde{\mu}^*_{H})_{uv} satisfy (A.1), (A.2).

    Proof. For i\in A_u , it holds

    \begin{gather*} X_{iu}-\left(\max\limits_{i\in A_u}X_{iu}-\alpha_u\right)\geq\min\limits_{i\in A_u}X_{iu}-\max\limits_{i\in A_u}X_{iu}+\alpha_u > 0, \\ X_{iu}-\left(\min\limits_{i\in A_u}X_{iu}+\alpha_u\right)\leq\max\limits_{i\in A_u}X_{iu}-\min\limits_{i\in A_u}X_{iu}-\alpha_u < 0. \end{gather*}

    Obviously, (2.9) is equivalent to

    \sum\limits_{i\in A_u}\psi_{\alpha_u}(X_{iu}-(\tilde{\mu}_H^*)_u) = 0.

    By the definition of \psi_{\alpha_u}(x) , we have

    \begin{gather*} \sum\limits_{i\in A_u}\psi_{\alpha_u}\left(X_{iu}-\left(\max\limits_{i\in A_u}X_{iu}-{\alpha_u}\right)\right) > 0, \\ \sum\limits_{i\in A_u}\psi_{\alpha_u}\left(X_{iu}-\left(\min\limits_{i\in A_u}X_{iu}+{\alpha_u}\right)\right) < 0. \end{gather*}

Note that \sum\limits_{i\in A_u}\psi_{\alpha_u}(X_{iu}-(\tilde{\mu}_H^*)_u) is continuous and decreasing in (\tilde{\mu}_H^*)_u . Then, the solution of the equation \sum\limits_{i\in A_u}\psi_{\alpha_u}(X_{iu}-(\tilde{\mu}_H^*)_u) = 0 belongs to the interval (\max\limits_{i\in A_u}X_{iu}-\alpha_u, \min\limits_{i\in A_u}X_{iu}+\alpha_u) .

    Hence, we obtain \max\limits_{\{i\in A_u\}}X_{iu}-\alpha_u < (\tilde{\mu}_H^*)_u < \min\limits_{\{i\in A_u\}}X_{iu}+\alpha_u and

    -\alpha_u < X_{iu}-(\tilde{\mu}_H^*)_u < \alpha_u.

Furthermore, the above inequality and the definition of \psi_{\alpha_u}(x) imply

    \sum\limits_{i\in A_u}\psi_{\alpha_u}(X_{iu}-(\tilde{\mu}_H^*)_u) = \sum\limits_{i\in A_u}(X_{iu}-(\tilde{\mu}_H^*)_u) = \sum\limits_{i\in A_u} X_{iu}-n_u^*(\tilde{\mu}_H^*)_u.

    Therefore, (\tilde{\mu}_H^*)_u = (n_u^*)^{-1}\sum_{i\in A_u}X_{iu} satisfies (A.1).

    Following the similar discussion, we can derive (\tilde{\mu}_H^*)_{uv} satisfying (A.2) with

    \alpha_{uv} > \max\limits_{i\in A_{uv}}X_{iu}X_{iv}-\min\limits_{i\in A_{uv}}X_{iu}X_{iv}.

In fact, the condition in Proposition A.2 is easy to satisfy, since \log p = o(n_{\min}^*) and n_{\min}^*\leq n_{uv}^*\leq n_{u}^* make \alpha_{u} and \alpha_{uv} large enough.



    [1] S. Mendelson, N. Zhivotovskiy, Robust covariance estimation under L_{4}-L_{2} norm equivalence, Ann. Statist., 48 (2020), 1648–1664. https://doi.org/10.1214/19-AOS1862 doi: 10.1214/19-AOS1862
    [2] Y. Dendramis, L. Giraitis, G. Kapetanios, Estimation of time-varying covariance matrices for large datasets, Economet. Theory, 37 (2021), 1100–1134. https://doi.org/10.1017/S0266466620000535 doi: 10.1017/S0266466620000535
    [3] Y. Zhang, J. Tao, Y. Lv, G. Wang, An improved DCC model based on large-dimensional covariance matrices estimation and its applications, Symmetry, 15 (2023), 953. https://doi.org/10.3390/sym15040953 doi: 10.3390/sym15040953
    [4] D. Belomestny, M. Trabs, A. Tsybakov, Sparse covariance matrix estimation in high-dimensional deconvolution, Bernoulli, 25 (2019), 1901–1938. https://doi.org/10.3150/18-BEJ1040A doi: 10.3150/18-BEJ1040A
    [5] X. Kang, X. Deng, On variable ordination of Cholesky-based estimation for a sparse covariance matrix, Canad. J. Stat., 49 (2021), 283–310. https://doi.org/10.1002/cjs.11564 doi: 10.1002/cjs.11564
    [6] N. Bettache, C. Butucea, M. Sorba, Fast nonasymptotic testing and support recovery for large sparse Toeplitz covariance matrices, J. Multivariate Anal., 190 (2022), 104883. https://doi.org/10.1016/j.jmva.2021.104883 doi: 10.1016/j.jmva.2021.104883
    [7] W. Liang, Y. Wu, H. Chen, Sparse covariance matrix estimation for ultrahigh dimensional data, Stat, 11 (2022), e479. https://doi.org/10.1002/sta4.479 doi: 10.1002/sta4.479
    [8] P. J. Bickel, E. Levina, Covariance regularization by thresholding, Ann. Statist., 36 (2008), 2577–2604. https://doi.org/10.1214/08-AOS600 doi: 10.1214/08-AOS600
    [9] T. Cai, W. Liu, Adaptive thresholding for sparse covariance matrix estimation, J. Amer. Stat. Assoc., 106 (2011), 672–684. https://doi.org/10.1198/jasa.2011.tm10560 doi: 10.1198/jasa.2011.tm10560
    [10] T. T. Cai, H. H. Zhou, Optimal rates of convergence for sparse covariance matrix estimation, Ann. Statist., 40 (2012), 2389–2420. https://doi.org/10.1214/12-AOS998 doi: 10.1214/12-AOS998
[11] M. Avella-Medina, H. Battey, J. Fan, Q. Li, Robust estimation of high-dimensional covariance and precision matrices, Biometrika, 105 (2018), 271–284. https://doi.org/10.1093/biomet/asy011 doi: 10.1093/biomet/asy011
[12] R. D. Hawkins, G. C. Hon, B. Ren, Next-generation genomics: an integrative approach, Nat. Rev. Genet., 11 (2010), 476–486. https://doi.org/10.1038/nrg2795 doi: 10.1038/nrg2795
    [13] K. Lounici, Sparse principal component analysis with missing observations, In: C. Houdré, D. Mason, J. Rosiński, J. Wellner, High dimensional probability VI, Progress in Probability, 66 (2013), 327–356. https://doi.org/10.1007/978-3-0348-0490-5_20
    [14] P. L. Loh, M. J. Wainwright, High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity, Ann. Statist., 40 (2012), 1637–1664. https://doi.org/10.1214/12-AOS1018 doi: 10.1214/12-AOS1018
    [15] T. T. Cai, A. Zhang, Minimax rate-optimal estimation of high-dimensional covariance matrices with incomplete data, J. Multivariate Anal., 150 (2016), 55–74. https://doi.org/10.1016/j.jmva.2016.05.002 doi: 10.1016/j.jmva.2016.05.002
    [16] J. Fan, Q. Li, Y. Wang, Estimation of high-dimensional mean regression in absence of symmetry and light-tail assumptions, J. R. Stat. Soc. B, 79 (2017), 247–265. https://doi.org/10.1111/rssb.12166 doi: 10.1111/rssb.12166
[17] P. Massart, Concentration inequalities and model selection, Berlin, Heidelberg: Springer, 2007. https://doi.org/10.1007/978-3-540-48503-2
    [18] A. J. Rothman, E. Levina, J. Zhu, Generalized thresholding of large covariance matrices, J. Am. Stat. Assoc., 104 (2009), 177–186. https://doi.org/10.1198/jasa.2009.0101 doi: 10.1198/jasa.2009.0101
[19] T. T. Cai, W. Liu, H. H. Zhou, Estimating sparse precision matrix: optimal rates of convergence and adaptive estimation, Ann. Statist., 44 (2016), 455–488. https://doi.org/10.1214/13-AOS1171 doi: 10.1214/13-AOS1171
    [20] D. Li, A. Srinivasan, Q. Chen, L. Xue, Robust covariance matrix estimation for high-dimensional compositional data with application to sales data analysis, J. Bus. Econ. Stat., in press. https://doi.org/10.1080/07350015.2022.2106990
    [21] J. Xu, K. Lange, A proximal distance algorithm for likelihood-based sparse covariance estimation, Biometrika, 109 (2022), 1047–1066. https://doi.org/10.1093/biomet/asac011 doi: 10.1093/biomet/asac011
    [22] F. Xie, J. Cape, C. E. Priebe, Y. Xu, Bayesian sparse spiked covariance model with a continuous matrix shrinkage prior, Bayesian Anal., 17 (2022), 1193–1217. https://doi.org/10.1214/21-BA1292 doi: 10.1214/21-BA1292
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)