Research article

Traffic Transformer: Transformer-based framework for temporal traffic accident prediction

  • Received: 30 November 2023 Revised: 25 December 2023 Accepted: 02 January 2024 Published: 01 April 2024
  • MSC : 68T07, 68T09

  • Reliable prediction of traffic accidents is crucial for identifying potential hazards in advance, formulating effective preventative measures, and reducing accident incidence. Existing neural network-based models generally suffer from a limited receptive field and a poor ability to capture long-term dependencies, which severely restricts their performance. To address these shortcomings, we propose the Traffic Transformer for multidimensional, multi-step traffic accident prediction. First, raw records of sporadic traffic accidents are converted, through a temporal discretization process, into multivariate, regularly sampled sequences amenable to sequential modeling. The Traffic Transformer then captures and learns the hidden relationships between any elements of the input sequence, constructing accurate predictions for multiple forthcoming intervals. The model employs the multi-head attention mechanism in place of the widely used recurrent architecture. This shift enhances its ability to capture long-range dependencies within time series data, enables more flexible and comprehensive learning of the diverse hidden patterns within the sequences, and makes the framework easy to extend and transfer to other time series forecasting tasks. Extensive comparative experiments on a real-world dataset from Qatar demonstrate that the proposed Traffic Transformer significantly outperforms mainstream time series forecasting models across all evaluation metrics and forecast horizons. Notably, its Mean Absolute Percentage Error reaches a minimum of only 4.43%, substantially lower than the error rates of the other models, underscoring the Traffic Transformer's state-of-the-art predictive accuracy.

    Citation: Mansoor G. Al-Thani, Ziyu Sheng, Yuting Cao, Yin Yang. Traffic Transformer: Transformer-based framework for temporal traffic accident prediction[J]. AIMS Mathematics, 2024, 9(5): 12610-12629. doi: 10.3934/math.2024617
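
    As a concrete illustration of the pipeline sketched in the abstract, the following minimal Python example discretizes a sporadic event log into a regularly sampled series, builds sliding windows, and applies a small Transformer encoder for direct multi-step forecasting. This is a sketch under stated assumptions: the synthetic data, column names, window sizes, and architecture are illustrative placeholders, not the paper's exact Traffic Transformer configuration.

    ```python
    # Minimal sketch of: (1) temporal discretization of sporadic accident
    # records, (2) sliding windows, (3) a small Transformer encoder mapping a
    # window to several future steps. All names and sizes are assumptions.
    import numpy as np
    import pandas as pd
    import torch
    import torch.nn as nn

    # (1) Temporal discretization: sporadic event log -> daily accident counts
    rng = np.random.default_rng(0)
    stamps = pd.to_datetime("2020-01-01") + pd.to_timedelta(
        rng.integers(0, 365 * 24 * 60, size=5000), unit="m")   # synthetic events
    events = pd.DataFrame({"timestamp": stamps})
    daily = events.set_index("timestamp").resample("D").size().astype("float32")

    # (2) Sliding windows: `lookback` past days -> `horizon` future days
    lookback, horizon = 28, 3
    vals = daily.to_numpy()
    n_win = len(vals) - lookback - horizon + 1
    X = np.stack([vals[i:i + lookback] for i in range(n_win)])
    Y = np.stack([vals[i + lookback:i + lookback + horizon] for i in range(n_win)])

    # (3) A tiny Transformer encoder in place of a recurrent network
    class TinyTrafficTransformer(nn.Module):
        def __init__(self, d_model=32, n_heads=4, n_layers=2, horizon=3, lookback=28):
            super().__init__()
            self.embed = nn.Linear(1, d_model)
            self.pos = nn.Parameter(torch.zeros(1, lookback, d_model))  # learned positions
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=64, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, horizon)

        def forward(self, x):                      # x: (batch, lookback)
            h = self.embed(x.unsqueeze(-1)) + self.pos
            h = self.encoder(h)                    # multi-head self-attention stack
            return self.head(h[:, -1])             # direct multi-step forecast

    model = TinyTrafficTransformer(horizon=horizon, lookback=lookback)
    pred = model(torch.from_numpy(X[:8]))
    print(pred.shape)                              # torch.Size([8, 3])
    ```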




    Let $\{X, X_n; n\ge1\}$ be a sequence of independent and identically distributed (i.i.d.) random variables and let $S_n=\sum_{i=1}^{n}X_i$. Complete convergence, first established by Hsu and Robbins [1] (for the sufficiency) and Erdős [2,3] (for the necessity), states the following:

    $$\sum_{n=1}^{\infty}P\left(|S_n|\ge\epsilon n\right)<\infty,\quad\text{for any }\epsilon>0,$$

    if and only if $EX=0$ and $EX^2<\infty$. Baum and Katz [4] extended the above result and obtained the following theorem:

    $$\sum_{n=1}^{\infty}n^{r/p-2}P\left(|S_n|\ge\epsilon n^{1/p}\right)<\infty,\quad\text{for }0<p<2,\ r\ge p,\ \text{any }\epsilon>0,\tag{1.1}$$

    if and only if $E|X|^r<\infty$ and, when $r\ge1$, $EX=0$. (Taking $p=1$ and $r=2$ in (1.1) recovers the Hsu–Robbins–Erdős theorem above.)

    There are several extensions of the research on complete convergence. One of them is the study of the convergence rate of complete convergence. The first such result was obtained by Heyde [5], who proved that $\lim_{\epsilon\to0}\epsilon^2\sum_{n=1}^{\infty}P\left(|S_n|\ge\epsilon n\right)=EX^2$ under the conditions $EX=0$ and $EX^2<\infty$. For more results on the convergence rate, see Chen [6], Spătaru [7], Gut and Spătaru [8], Spătaru and Gut [9], Gut and Steinebach [10], He and Xie [11], Kong and Dai [12], etc.

    But (1.1) does not hold for $p=2$. However, by replacing $n^{1/p}$ with $\sqrt{n\ln n}$ and $\sqrt{n\ln\ln n}$, Gut and Spătaru [8] and Spătaru and Gut [9] established the following results, called the convergence rate of the law of the (iterated) logarithm. Supposing that $\{X,X_n;n\ge1\}$ is a sequence of i.i.d. random variables with $EX=0$ and $EX^2=\sigma^2<\infty$, Gut and Spătaru [8] and Spătaru and Gut [9] obtained the following results, respectively:

    $$\lim_{\epsilon\to0}\epsilon^{2+2\delta}\sum_{n=1}^{\infty}\frac{\ln^{\delta}n}{n}P\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)=\frac{E|N|^{2+2\delta}\sigma^{2+2\delta}}{\delta+1},\quad0\le\delta\le1,\tag{1.2}$$

    where $N$ is a standard normal random variable, and

    $$\lim_{\epsilon\to0}\epsilon^{2}\sum_{n=3}^{\infty}\frac{1}{n\ln n}P\left(|S_n|\ge\epsilon\sqrt{n\ln\ln n}\right)=\sigma^2.\tag{1.3}$$

    Motivated by the above results, the purpose of this paper is to extend (1.2) and (1.3) to the sub-linear expectation space (to be introduced in Section 2), which was introduced by Peng [13,14], and to study the necessary conditions for (1.2).

    Under the theoretical framework of the traditional probability space, in order to carry out inference, every statistical model must assume that the error (and thus the response variable) follows some uniquely determined probability distribution; that is, the distribution of the model is deterministic. Classical statistical modeling and statistical inference are based on such distribution certainty or model certainty, and "distribution certainty" modeling has yielded a mature set of theories and methods. However, real, complex data in economics, finance, and other fields often exhibit essential, non-negligible probability and distribution uncertainty: the probability distribution of the response variable under study is uncertain and does not meet the assumptions of classical statistical modeling. Therefore, classical probabilistic statistical modeling methods cannot be used for this type of data. Driven by such uncertainty issues, Peng [14,15] established a theoretical framework of sub-linear expectation spaces from the perspective of expectations. Sub-linear expectation has a wide range of application backgrounds and prospects, and in recent years a series of results on limit theory in sub-linear expectation spaces has been established; see Peng [14,15], Zhang [16,17,18], Hu [19], Wu and Jiang [20,21], Wu et al. [22], Wu and Lu [23], etc. Wu [24], Liu and Zhang [25], Ding [26] and Liu and Zhang [27] obtained the convergence rate of complete moment convergence. However, convergence rate results for the law of the (iterated) logarithm have not been reported yet. The main difficulty is that the sub-linear expectation and its capacity are not additive, so many tools and methods from the traditional probability space are no longer effective, which makes the problem considerably more complex and difficult to study.

    In Section 2, we provide the relevant definitions for the sub-linear expectation space, its basic properties, and the lemmas needed in this paper.

    Let $(\Omega,\mathcal F)$ be a measurable space and let $\mathcal H$ be a linear space of random variables on $(\Omega,\mathcal F)$ such that if $X_1,\ldots,X_n\in\mathcal H$ then $\varphi(X_1,\ldots,X_n)\in\mathcal H$ for each $\varphi\in C_{l,Lip}(\mathbb R^n)$, where $C_{l,Lip}(\mathbb R^n)$ denotes the set of local Lipschitz functions on $\mathbb R^n$. In this case, each $X\in\mathcal H$ is called a random variable.

    Definition 2.1. A sub-linear expectation $\hat E$ on $\mathcal H$ is a function $\hat E:\mathcal H\to\mathbb R$ satisfying the following for all $X,Y\in\mathcal H$:

    (a) Monotonicity: If $X\ge Y$ then $\hat EX\ge\hat EY$;

    (b) Constant preservation: $\hat Ec=c$;

    (c) Sub-additivity: $\hat E(X+Y)\le\hat EX+\hat EY$;

    (d) Positive homogeneity: $\hat E(\lambda X)=\lambda\hat EX$, $\lambda\ge0$.

    The triple $(\Omega,\mathcal H,\hat E)$ is called a sub-linear expectation space. The conjugate expectation $\hat\varepsilon$ of $\hat E$ is defined by

    $$\hat\varepsilon X:=-\hat E(-X),\quad X\in\mathcal H.$$

    Let $\mathcal G\subseteq\mathcal F$. A function $V:\mathcal G\to[0,1]$ is called a capacity if

    $$V(\emptyset)=0,\quad V(\Omega)=1,\quad\text{and}\quad V(A)\le V(B)\ \text{for}\ A\subseteq B,\ A,B\in\mathcal G.$$

    The upper and lower capacities $(V,\nu)$ corresponding to $(\Omega,\mathcal H,\hat E)$ are respectively defined as

    $$V(A):=\inf\{\hat E\xi;\ I(A)\le\xi,\ \xi\in\mathcal H\},\qquad\nu(A):=1-V(A^c),\quad A\in\mathcal F,\ A^c:=\Omega\setminus A.$$

    The Choquet integral is defined by

    $$C_V(X):=\int_0^{\infty}V(X>x)\,dx+\int_{-\infty}^{0}\left(V(X>x)-1\right)dx.$$
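
    To make these definitions concrete, here is a toy numerical illustration (an assumption for illustration, not part of the paper): on a finite sample space, taking the supremum of ordinary expectations over a family of priors yields a sub-linear expectation in the sense of Definition 2.1, and the conjugate expectation, upper and lower capacities, and Choquet integral can then be computed directly.

    ```python
    # Toy illustration: sup of linear expectations over priors is sub-linear
    # (Definition 2.1 (a)-(d) hold); conjugate expectation, capacities, and
    # Choquet integral follow. All numbers here are made up for illustration.
    import numpy as np

    priors = np.array([
        [1/6, 1/6, 1/6, 1/6, 1/6, 1/6],      # uniform prior
        [0.3, 0.3, 0.1, 0.1, 0.1, 0.1],      # competing model 1
        [0.05, 0.05, 0.2, 0.2, 0.25, 0.25],  # competing model 2
    ])

    def E_hat(X):          # sub-linear expectation: sup over the priors
        return max(p @ X for p in priors)

    def eps_hat(X):        # conjugate expectation: eps(X) = -E_hat(-X)
        return -E_hat(-X)

    def V(A):              # upper capacity V(A) = sup_p P(A); A is a boolean mask
        return E_hat(A.astype(float))

    omega = np.arange(6.0)
    X = omega - 2.5
    print(E_hat(X), eps_hat(X))    # eps_hat(X) <= E_hat(X), cf. Proposition 2.1 (i)

    A = omega >= 4
    print(V(A), 1 - V(~A))         # upper capacity V(A) and lower capacity nu(A)

    # Choquet integral C_V for the nonnegative variable |X|, on a grid of levels
    Xabs = np.abs(X)
    grid = np.linspace(0.0, Xabs.max(), 1001)
    vals = np.array([V(Xabs > t) for t in grid])
    C_V = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))  # trapezoidal rule
    print(C_V)
    ```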

    From all of the definitions above, it is easy to obtain the following Proposition 2.1.

    Proposition 2.1. Let $X,Y\in\mathcal H$ and $A,B\in\mathcal F$.

    (i) $\hat\varepsilon X\le\hat EX$; $\hat E(X+a)=\hat EX+a$, $a\in\mathbb R$;

    (ii) $|\hat E(X-Y)|\le\hat E|X-Y|$; $\hat E(X-Y)\ge\hat EX-\hat EY$;

    (iii) $\nu(A)\le V(A)$; $V(A\cup B)\le V(A)+V(B)$; $\nu(A\cup B)\le\nu(A)+V(B)$;

    (iv) If $f\le I(A)\le g$, $f,g\in\mathcal H$, then

    $$\hat Ef\le V(A)\le\hat Eg,\qquad\hat\varepsilon f\le\nu(A)\le\hat\varepsilon g.\tag{2.1}$$

    (v) (Lemma 4.5 (iii) in Zhang [16]) For any $c>0$,

    $$\hat E\left(|X|\wedge c\right)\le\int_0^{c}V\left(|X|>x\right)dx\le C_V(|X|),\tag{2.2}$$

    where, here and hereafter, $a\wedge b:=\min(a,b)$ and $a\vee b:=\max(a,b)$ for any $a,b\in\mathbb R$.

    (vi) Markov inequality: $V\left(|X|\ge x\right)\le\hat E\left(|X|^p\right)/x^p$, $\forall x>0$, $p>0$;

    Jensen inequality: $\left(\hat E\left(|X|^r\right)\right)^{1/r}\le\left(\hat E\left(|X|^s\right)\right)^{1/s}$ for $0<r\le s$.

    Definition 2.2. (Peng [14,15])

    (i) (Identical distribution) Let $X_1$ and $X_2$ be two random variables on $(\Omega,\mathcal H,\hat E)$. They are called identically distributed, denoted by $X_1\overset{d}{=}X_2$, if

    $$\hat E\left(\varphi(X_1)\right)=\hat E\left(\varphi(X_2)\right),\quad\text{for all }\varphi\in C_{l,Lip}(\mathbb R).$$

    A sequence $\{X_n;n\ge1\}$ of random variables is said to be identically distributed if $X_i\overset{d}{=}X_1$ for each $i\ge1$.

    (ii) (Independence) In a sub-linear expectation space $(\Omega,\mathcal H,\hat E)$, a random vector $\mathbf Y=(Y_1,\ldots,Y_n)$, $Y_i\in\mathcal H$, is said to be independent of another random vector $\mathbf X=(X_1,\ldots,X_m)$, $X_i\in\mathcal H$, under $\hat E$ if for each $\varphi\in C_{l,Lip}(\mathbb R^m\times\mathbb R^n)$ we have $\hat E\left(\varphi(\mathbf X,\mathbf Y)\right)=\hat E\left[\hat E\left(\varphi(\mathbf x,\mathbf Y)\right)\big|_{\mathbf x=\mathbf X}\right]$.

    (iii) (Independent and identically distributed) A sequence $\{X_n;n\ge1\}$ of random variables is said to be i.i.d. if $X_{i+1}$ is independent of $(X_1,\ldots,X_i)$ and $X_i\overset{d}{=}X_1$ for each $i\ge1$.

    From Definition 2.2 (ii), it can be verified that if $Y$ is independent of $X$, and $X\ge0$, $\hat EY\ge0$, then $\hat E(XY)=\hat E(X)\hat E(Y)$. Further, if $Y$ is independent of $X$ and $X,Y\ge0$, then

    $$\hat E(XY)=\hat E(X)\hat E(Y),\qquad\hat\varepsilon(XY)=\hat\varepsilon(X)\hat\varepsilon(Y).\tag{2.3}$$

    For convenience, in all subsequent parts of this article, let $\{X,X_n;n\ge1\}$ be a sequence of random variables in $(\Omega,\mathcal H,\hat E)$, and $S_n=\sum_{i=1}^{n}X_i$. For any $X\in\mathcal H$ and $c>0$, set the truncation $X^{(c)}:=(-c)\vee(X\wedge c)$. The symbol $c$ also represents a positive constant that does not depend on $n$ and may change from line to line. Let $a_x\sim b_x$ denote $\lim_{x\to\infty}a_x/b_x=1$, and $a_x\lesssim b_x$ denote that there exists a constant $c>0$ such that $a_x\le cb_x$ for sufficiently large $x$; $[x]$ denotes the largest integer not exceeding $x$, and $I(\cdot)$ denotes an indicator function.
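
    The truncation $X^{(c)}=(-c)\vee(X\wedge c)$ used throughout is simply elementwise clipping; for instance (illustrative only):

    ```python
    # The truncation X^(c) = (-c) v (X ^ c) is elementwise clipping.
    import numpy as np

    x = np.array([-5.0, -0.5, 0.3, 7.0])
    print(np.clip(x, -2.0, 2.0))   # [-2.  -0.5  0.3  2. ]
    ```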

    To prove the main results of this article, the following three lemmas are required.

    Lemma 2.1. (Theorem 3.1 (a) and Corollary 3.2 (b) in Zhang [16]) Let $\{X_k;k\ge1\}$ be a sequence of independent random variables in $(\Omega,\mathcal H,\hat E)$.

    (i) If $\hat EX_k\le0$, then for any $x,y>0$,

    $$V\left(S_n\ge x\right)\le V\left(\max_{1\le k\le n}X_k>y\right)+\exp\left(-\frac{x^2}{2(xy+B_n)}\left\{1+\frac{2}{3}\ln\left(1+\frac{xy}{B_n}\right)\right\}\right);$$

    (ii) If $\hat\varepsilon X_k\le0$, then there exists a constant $c>0$ such that for any $x>0$,

    $$\nu\left(S_n\ge x\right)\le c\frac{B_n}{x^2},$$

    where $B_n=\sum_{k=1}^{n}\hat EX_k^2$.
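
    In the classical special case, Lemma 2.1 (i) is a Bernstein/Kolmogorov-type exponential bound, and it is easy to probe numerically (an illustration only; here $X_k\sim\mathrm{Uniform}[-1,1]$, so $\max_kX_k\le1=y$ and the first term vanishes):

    ```python
    # Numerical probe of the exponential bound in Lemma 2.1 (i) in the classical
    # case (V = P). With X_k ~ Uniform[-1, 1]: E X_k = 0, |X_k| <= y = 1, so
    # V(max_k X_k > y) = 0 and B_n = n * E X^2 = n / 3.
    import numpy as np

    rng = np.random.default_rng(3)
    n, reps, x, y = 200, 200_000, 15.0, 1.0
    S = rng.uniform(-1, 1, size=(reps, n)).sum(axis=1)
    emp = np.mean(S >= x)                       # empirical tail probability

    Bn = n / 3
    bound = np.exp(-x**2 / (2 * (x*y + Bn)) * (1 + (2/3) * np.log(1 + x*y / Bn)))
    print(emp, bound)                           # empirical tail <= the bound
    ```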

    Here we give the notation of the G-normal distribution, which was introduced by Peng [14].

    Definition 2.3. (G-normal random variable) For $0\le\underline\sigma^2\le\bar\sigma^2<\infty$, a random variable $\xi$ in $(\Omega,\mathcal H,\hat E)$ is called a G-normal $N(0,[\underline\sigma^2,\bar\sigma^2])$ distributed random variable (written $\xi\sim N(0,[\underline\sigma^2,\bar\sigma^2])$ under $\hat E$) if, for any $\varphi\in C_{l,Lip}(\mathbb R)$, the function $u(x,t)=\hat E\left(\varphi\left(x+\sqrt t\,\xi\right)\right)$ ($x\in\mathbb R$, $t\ge0$) is the unique viscosity solution of the following heat equation:

    $$\partial_tu-G\left(\partial_{xx}^2u\right)=0,\qquad u(0,x)=\varphi(x),$$

    where $G(\alpha)=\left(\bar\sigma^2\alpha^+-\underline\sigma^2\alpha^-\right)/2$.

    From Peng [14], if $\xi\sim N(0,[\underline\sigma^2,\bar\sigma^2])$ under $\hat E$, then for each convex function $\varphi$,

    $$\hat E\left(\varphi(\xi)\right)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\varphi(\bar\sigma x)e^{-x^2/2}\,dx.\tag{2.4}$$

    If $\underline\sigma=\bar\sigma=\sigma$, then $N(0,[\underline\sigma^2,\bar\sigma^2])=N(0,\sigma^2)$, which is a classical normal distribution.

    In particular, noticing that $\varphi(x)=|x|^p$, $p\ge1$, is a convex function and taking $\varphi(x)=|x|^p$ in (2.4), we get

    $$\hat E\left(|\xi|^p\right)=\frac{2\bar\sigma^p}{\sqrt{2\pi}}\int_0^{\infty}x^pe^{-x^2/2}\,dx<\infty.\tag{2.5}$$

    Equation (2.5) implies that

    $$C_V(|\xi|^p)=\int_0^{\infty}V\left(|\xi|^p>x\right)dx\le1+\int_1^{\infty}\frac{\hat E\left(|\xi|^{2p}\right)}{x^2}\,dx<\infty,\quad\text{for any }p\ge1/2.$$
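
    As a quick sanity check of (2.5) in the classical special case $\underline\sigma=\bar\sigma$ (an illustration only, where $\hat E$ reduces to an ordinary expectation), the quadrature value can be compared against a Monte Carlo estimate:

    ```python
    # Sanity check of (2.5) in the classical case sigma_underline = sigma_bar = 1
    # (illustration only; here xi ~ N(0, 1) and E-hat is an ordinary expectation).
    import numpy as np
    from scipy import integrate

    p, sigma_bar = 3.0, 1.0
    quad_val, _ = integrate.quad(lambda x: x**p * np.exp(-x**2 / 2), 0, np.inf)
    formula = 2 * sigma_bar**p / np.sqrt(2 * np.pi) * quad_val   # right side of (2.5)

    rng = np.random.default_rng(1)
    mc = np.mean(np.abs(rng.normal(0.0, sigma_bar, 10**6)) ** p) # Monte Carlo E|xi|^p
    print(formula, mc)   # both approx 1.5958 = E|N(0,1)|^3
    ```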

    Lemma 2.2. (Theorem 4.2 in Zhang [17], Corollary 2.1 in Zhang [18]) Let $\{X,X_n;n\ge1\}$ be a sequence of i.i.d. random variables in $(\Omega,\mathcal H,\hat E)$. Suppose that

    (i) $\lim_{c\to\infty}\hat E\left(X^2\wedge c\right)$ is finite;

    (ii) $x^2V\left(|X|\ge x\right)\to0$ as $x\to\infty$;

    (iii) $\lim_{c\to\infty}\hat E\left(X^{(c)}\right)=\lim_{c\to\infty}\hat E\left((-X)^{(c)}\right)=0$.

    Then for any bounded continuous function $\varphi$,

    $$\lim_{n\to\infty}\hat E\left(\varphi\left(\frac{S_n}{\sqrt n}\right)\right)=\hat E\left(\varphi(\xi)\right),$$

    and if $F(x):=V\left(|\xi|\ge x\right)$, then

    $$\lim_{n\to\infty}V\left(|S_n|>x\sqrt n\right)=F(x),\quad\text{if }x\text{ is a continuity point of }F,\tag{2.6}$$

    where $\xi\sim N(0,[\underline\sigma^2,\bar\sigma^2])$ under $\hat E$, $\bar\sigma^2=\lim_{c\to\infty}\hat E\left(X^2\wedge c\right)$ and $\underline\sigma^2=\lim_{c\to\infty}\hat\varepsilon\left(X^2\wedge c\right)$.
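
    In the same classical special case ($V$ a probability measure and $X\sim N(0,1)$, so that $F(x)=P(|\xi|\ge x)$ with $\xi$ standard normal; an illustration only), the central limit statement (2.6) can be observed numerically:

    ```python
    # Classical-case Monte Carlo illustration of (2.6): V = P, X ~ N(0,1).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, reps, x = 2000, 20000, 1.0
    S = rng.normal(size=(reps, n)).sum(axis=1)
    print(np.mean(np.abs(S) > x * np.sqrt(n)), 2 * stats.norm.sf(x))  # both ~0.317
    ```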

    Lemma 2.3. (Lemma 2.1 in Zhang [17]) Let $\{X_n;n\ge1\}$ be a sequence of independent random variables in $(\Omega,\mathcal H,\hat E)$, and let $0<\alpha<1$ be a real number. If there exist real constants $\beta_{n,k}$ such that

    $$V\left(|S_n-S_k|\ge\beta_{n,k}+\epsilon\right)\le\alpha,\quad\text{for all }\epsilon>0,\ k\le n,$$

    then

    $$(1-\alpha)V\left(\max_{k\le n}\left(|S_k|-\beta_{n,k}\right)>x+\epsilon\right)\le V\left(|S_n|>x\right),\quad\text{for all }x>0,\ \epsilon>0.$$

    The results of this article are as follows.

    Theorem 3.1. Let $\{X,X_n;n\ge1\}$ be a sequence of i.i.d. random variables in $(\Omega,\mathcal H,\hat E)$. Suppose that

    $$C_V(X^2)<\infty,\qquad\lim_{c\to\infty}\hat E\left(X^{(c)}\right)=\lim_{c\to\infty}\hat E\left((-X)^{(c)}\right)=0.\tag{3.1}$$

    Then for $0\le\delta\le1$,

    $$\lim_{\epsilon\to0}\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\frac{\ln^{\delta}n}{n}V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)=\frac{C_V\left(|\xi|^{2\delta+2}\right)}{\delta+1},\tag{3.2}$$

    where, here and hereafter, $\xi\sim N(0,[\underline\sigma^2,\bar\sigma^2])$ under $\hat E$, $\bar\sigma^2=\lim_{c\to\infty}\hat E\left(X^2\wedge c\right)$ and $\underline\sigma^2=\lim_{c\to\infty}\hat\varepsilon\left(X^2\wedge c\right)$.

    Conversely, if (3.2) holds for $\delta=1$, then (3.1) holds.

    Theorem 3.2. Under the conditions of Theorem 3.1,

    $$\lim_{\epsilon\to0}\epsilon^{2}\sum_{n=3}^{\infty}\frac{1}{n\ln n}V\left(|S_n|\ge\epsilon\sqrt{n\ln\ln n}\right)=C_V\left(\xi^2\right).\tag{3.3}$$

    Remark 3.1. Theorems 3.1 and 3.2 not only extend Theorem 3 in [8] and Theorem 2 in [9], respectively, from the probability space to the sub-linear expectation space, but they also establish necessary conditions in Theorem 3.1.
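
    For orientation, in the classical special case where $V$ is a (countably additive) probability measure and $\underline\sigma=\bar\sigma=\sigma$, so that $\xi\sim N(0,\sigma^2)$, the Choquet integral reduces to an ordinary moment and (3.2) collapses to (1.2):

    $$C_V\left(|\xi|^{2\delta+2}\right)=E|\sigma N|^{2\delta+2}=\sigma^{2\delta+2}E|N|^{2\delta+2},\qquad\text{so (3.2) reads}\qquad\lim_{\epsilon\to0}\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\frac{\ln^{\delta}n}{n}P\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)=\frac{E|N|^{2\delta+2}\,\sigma^{2\delta+2}}{\delta+1}.$$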

    Remark 3.2. Under the condition $\lim_{c\to\infty}\hat E\left(|X|-c\right)^+=0$ (and note that $\lim_{c\to\infty}\hat E\left(X^2-c\right)^+=0$ implies $\lim_{c\to\infty}\hat E\left(|X|-c\right)^+=0$), it is easy to verify that $\hat E(\pm X)=\lim_{c\to\infty}\hat E\left((\pm X)^{(c)}\right)$. So, Corollary 3.9 in Ding [26] has two more conditions than Theorem 3.2: that $\hat E$ is continuous and that $\lim_{c\to\infty}\hat E\left(X^2-c\right)^+=0$. Therefore, Corollary 3.9 in Ding [26] and Theorem 3.2 cannot be inferred from each other.

    Proof of the direct part of Theorem 3.1. Note that

    $$\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\frac{\ln^{\delta}n}{n}V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)=\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\frac{\ln^{\delta}n}{n}V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right)+\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\frac{\ln^{\delta}n}{n}\left(V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)-V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right)\right):=I_1(\epsilon)+I_2(\epsilon).$$

    Hence, in order to establish (3.2), it suffices to prove that

    $$\lim_{\epsilon\to0}I_1(\epsilon)=\frac{C_V\left(|\xi|^{2\delta+2}\right)}{\delta+1}\tag{3.4}$$

    and

    $$\lim_{\epsilon\to0}I_2(\epsilon)=0.\tag{3.5}$$

    Given that $\frac{\ln^{\delta}n}{n}$ and $V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right)$ are monotonically decreasing with respect to $n$, it holds that

    $$I_1(\epsilon)=\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\frac{\ln^{\delta}n}{n}V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right)=\epsilon^{2+2\delta}\frac{\ln^{\delta}2}{2}V\left(|\xi|\ge\epsilon\sqrt{\ln2}\right)+\epsilon^{2+2\delta}\sum_{n=3}^{\infty}\int_{n-1}^{n}\frac{\ln^{\delta}n}{n}V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right)dx\le\epsilon^{2+2\delta}\frac{\ln^{\delta}2}{2}+\epsilon^{2+2\delta}\sum_{n=3}^{\infty}\int_{n-1}^{n}\frac{\ln^{\delta}x}{x}V\left(|\xi|\ge\epsilon\sqrt{\ln x}\right)dx=\epsilon^{2+2\delta}\frac{\ln^{\delta}2}{2}+\epsilon^{2+2\delta}\int_2^{\infty}\frac{\ln^{\delta}x}{x}V\left(|\xi|\ge\epsilon\sqrt{\ln x}\right)dx,$$

    and

    $$I_1(\epsilon)=\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\frac{\ln^{\delta}n}{n}V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right)=\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\int_{n}^{n+1}\frac{\ln^{\delta}n}{n}V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right)dx\ge\epsilon^{2+2\delta}\sum_{n=2}^{\infty}\int_{n}^{n+1}\frac{\ln^{\delta}x}{x}V\left(|\xi|\ge\epsilon\sqrt{\ln x}\right)dx=\epsilon^{2+2\delta}\int_2^{\infty}\frac{\ln^{\delta}x}{x}V\left(|\xi|\ge\epsilon\sqrt{\ln x}\right)dx.$$

    Therefore, (3.4) follows from

    $$\lim_{\epsilon\to0}I_1(\epsilon)=\lim_{\epsilon\to0}\epsilon^{2+2\delta}\int_2^{\infty}\frac{\ln^{\delta}x}{x}V\left(|\xi|\ge\epsilon\sqrt{\ln x}\right)dx=\lim_{\epsilon\to0}\int_{\epsilon\sqrt{\ln2}}^{\infty}2y^{2\delta+1}V\left(|\xi|\ge y\right)dy\quad(\text{let }y=\epsilon\sqrt{\ln x})=\int_0^{\infty}2y^{2\delta+1}V\left(|\xi|\ge y\right)dy=\frac{C_V\left(|\xi|^{2+2\delta}\right)}{\delta+1}.$$
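
    The last identity admits a quick numerical check in the classical case ($V=P$, $\xi\sim N(0,1)$; an illustration only, with $\delta=0$ giving the analogous identity used for $J_1$ in the proof of Theorem 3.2):

    ```python
    # Numerical check (classical case V = P, xi ~ N(0,1)) of
    #   int_0^inf 2 y^{2d+1} P(|xi| >= y) dy = E|xi|^{2+2d} / (d + 1).
    import numpy as np
    from scipy import integrate, stats
    from scipy.special import gamma

    delta = 0.5
    lhs, _ = integrate.quad(lambda y: 2 * y**(2*delta + 1) * 2*stats.norm.sf(y),
                            0, np.inf)
    q = 2 + 2*delta                      # E|Z|^q = 2^{q/2} Gamma((q+1)/2) / sqrt(pi)
    rhs = 2**(q/2) * gamma((q + 1) / 2) / np.sqrt(np.pi) / (delta + 1)
    print(lhs, rhs)                      # agree up to quadrature error
    ```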

    Let $M\ge40$ and write $A_{M,\epsilon}:=\exp\left(M\epsilon^{-2}\right)$.

    $$|I_2(\epsilon)|\le\epsilon^{2+2\delta}\sum_{2\le n\le[A_{M,\epsilon}]}\frac{\ln^{\delta}n}{n}\left|V\left(|S_n|\ge\sqrt n\,\epsilon\sqrt{\ln n}\right)-V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right)\right|+\epsilon^{2+2\delta}\sum_{n>[A_{M,\epsilon}]}\frac{\ln^{\delta}n}{n}V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)+\epsilon^{2+2\delta}\sum_{n>[A_{M,\epsilon}]}\frac{\ln^{\delta}n}{n}V\left(|\xi|\ge\epsilon\sqrt{\ln n}\right):=I_{21}(\epsilon)+I_{22}(\epsilon)+I_{23}(\epsilon).\tag{3.6}$$

    Let us first estimate $I_{21}(\epsilon)$. For any $\beta>\epsilon^2$,

    $$I_{21}(\epsilon)\le\epsilon^{2+2\delta}\int_2^{A_{M,\epsilon}}\frac{\ln^{\delta}x}{x}\left|V\left(|S_{[x]}|\ge\sqrt{[x]}\,\epsilon\sqrt{\ln x}\right)-V\left(|\xi|\ge\epsilon\sqrt{\ln x}\right)\right|dx\le\epsilon^{2+2\delta}\int_2^{A_{\beta,\epsilon}}\frac{2\ln^{\delta}x}{x}\,dx+\epsilon^{2+2\delta}\int_{A_{\beta,\epsilon}}^{A_{M,\epsilon}}\frac{\ln^{\delta}x}{x}\sup_{n\ge A_{\beta,\epsilon}}\left|V\left(|S_n|\ge\sqrt n\,\epsilon\sqrt{\ln x}\right)-V\left(|\xi|\ge\epsilon\sqrt{\ln x}\right)\right|dx\le2\beta^{1+\delta}+\int_0^{\sqrt M}2y^{1+2\delta}\sup_{n\ge A_{\beta,\epsilon}}\left|V\left(|S_n|\ge\sqrt n\,y\right)-F(y)\right|dy.\tag{3.7}$$

    By (2.2), $\hat E\left(X^2\wedge c\right)\le\int_0^{c}V\left(X^2\ge x\right)dx$; also, notice that $V\left(X^2\ge x\right)$ is a decreasing function of $x$. So, $C_V(X^2)=\int_0^{\infty}V\left(X^2\ge x\right)dx<\infty$ implies that $\lim_{c\to\infty}\hat E\left(X^2\wedge c\right)$ is finite and that $\lim_{x\to\infty}x^2V\left(|X|\ge x\right)=\lim_{x\to\infty}xV\left(X^2\ge x\right)=0$. Therefore, (3.1) implies the conditions of Lemma 2.2. From (2.6),

    $$\lim_{\epsilon\to0}\sup_{n\ge A_{\beta,\epsilon}}\left|V\left(|S_n|\ge\sqrt n\,y\right)-F(y)\right|=0,\quad\text{if }y\text{ is a continuity point of }F.\tag{3.8}$$

    Note that $F(y)$ is a monotonically decreasing function, so its discontinuity points are at most countable. Hence (3.8) holds for each $y$ outside a set of null Lebesgue measure. Combining this with $y^{2\delta+1}\sup_{n\ge A_{\beta,\epsilon}}\left|V\left(|S_n|\ge\sqrt n\,y\right)-F(y)\right|\le2M^{\delta+1/2}$ for any $0\le y\le\sqrt M$, the Lebesgue bounded convergence theorem and (3.8) lead to the following:

    $$\lim_{\epsilon\to0}\int_0^{\sqrt M}y^{2\delta+1}\sup_{n\ge A_{\beta,\epsilon}}\left|V\left(|S_n|\ge\sqrt n\,y\right)-F(y)\right|dy=0.\tag{3.9}$$

    Letting $\epsilon\to0$ first and then $\beta\to0$, from (3.7) and (3.9) we get

    $$\lim_{\epsilon\to0}I_{21}(\epsilon)=0.\tag{3.10}$$

    Next, we estimate $I_{22}(\epsilon)$. For $0<\mu<1$, let $\varphi_{\mu}(x)\in C_{l,Lip}(\mathbb R)$ be an even function such that $0\le\varphi_{\mu}(x)\le1$ for all $x$, $\varphi_{\mu}(x)=0$ if $|x|\le\mu$, and $\varphi_{\mu}(x)=1$ if $|x|>1$. Then

    $$I(|x|\ge1)\le\varphi_{\mu}(x)\le I(|x|\ge\mu).\tag{3.11}$$

    Given (2.1) and (3.11), and that $X,X_i$ are identically distributed, for any $x>0$ and $0<\mu<1$ we get

    $$V\left(|X_i|\ge x\right)\le\hat E\left[\varphi_{\mu}\left(\frac{X_i}{x}\right)\right]=\hat E\left[\varphi_{\mu}\left(\frac{X}{x}\right)\right]\le V\left(|X|\ge\mu x\right).\tag{3.12}$$

    Without loss of generality, we assume that $\bar\sigma=1$. For $n\ge\exp\left(M\epsilon^{-2}\right)\ge\exp\left(40\epsilon^{-2}\right)$, set $b_n:=\epsilon\sqrt{n\ln n}/20$; from Proposition 2.1 (ii) and the condition that $\lim_{c\to\infty}\hat E\left(X^{(c)}\right)=0$,

    $$\sum_{i=1}^{n}\left|\hat EX_i^{(b_n)}\right|=n\left|\lim_{c\to\infty}\hat E\left(X^{(c)}\right)-\hat EX^{(b_n)}\right|\le n\lim_{c\to\infty}\hat E\left|X^{(c)}-X^{(b_n)}\right|=n\lim_{c\to\infty}\hat E\left(|X|\wedge c-b_n\right)^+\le n\lim_{c\to\infty}\frac{\hat E\left(|X|\wedge c\right)^2}{b_n}=\frac{n\bar\sigma^2}{b_n}=\frac{20\sqrt n}{\epsilon\sqrt{\ln n}}\le\frac{\epsilon}{2}\sqrt{n\ln n},\quad\text{for }M\ge40,\ n\ge\exp\left(M\epsilon^{-2}\right).$$

    Using Lemma 2.1 for $\left\{X_i^{(b_n)}-\hat EX_i^{(b_n)};1\le i\le n\right\}$, and taking $x=\epsilon\sqrt{n\ln n}/2$ and $y=2b_n=\epsilon\sqrt{n\ln n}/10$ in Lemma 2.1 (i), by Proposition 2.1 (i) we have $\hat E\left(X_i^{(b_n)}-\hat EX_i^{(b_n)}\right)=0$; noting that $\left|X_i^{(b_n)}-\hat EX_i^{(b_n)}\right|\le y$ and $B_n=\sum_{i=1}^{n}\hat E\left(X_i^{(b_n)}-\hat EX_i^{(b_n)}\right)^2\le4n\hat E\left(X_i^{(b_n)}\right)^2\le4n$, and combining this with (3.12), we get

    $$V\left(S_n\ge\epsilon\sqrt{n\ln n}\right)\le V\left(\sum_{i=1}^{n}\left(X_i^{(b_n)}-\hat EX_i^{(b_n)}\right)\ge\epsilon\sqrt{n\ln n}/2\right)+\sum_{i=1}^{n}V\left(|X_i|\ge b_n\right)\le\exp\left(-\frac{\epsilon^2n\ln n}{4\left(\epsilon^2n\ln n/20+4n\right)}\left\{1+\frac{2}{3}\ln\left(1+\frac{\epsilon^2\ln n}{80}\right)\right\}\right)+nV\left(|X|\ge\mu b_n\right)\le c\left(\epsilon^2\ln n\right)^{-3}+nV\left(|X|\ge\mu\epsilon\sqrt{n\ln n}/20\right),$$

    since $\frac{\epsilon^2n\ln n}{4\left(\epsilon^2n\ln n/20+4n\right)}\left\{1+\frac{2}{3}\ln\left(1+\frac{\epsilon^2\ln n}{80}\right)\right\}\ge3\ln\left(\frac{\epsilon^2\ln n}{80}\right)$.

    Since $\{-X,-X_i\}$ also satisfies (3.1), we can replace $\{X,X_i\}$ with $\{-X,-X_i\}$ in the above bound to obtain

    $$V\left(-S_n\ge\epsilon\sqrt{n\ln n}\right)\le c\left(\epsilon^2\ln n\right)^{-3}+nV\left(|X|\ge\mu\epsilon\sqrt{n\ln n}/20\right).$$

    Therefore

    $$V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)\lesssim\left(\epsilon^2\ln n\right)^{-3}+nV\left(|X|\ge c\epsilon\sqrt{n\ln n}\right).$$

    This implies the following from Markov's inequality and (2.5),

    $$I_{22}(\epsilon)+I_{23}(\epsilon)\lesssim\epsilon^{2+2\delta}\sum_{n\ge A_{M,\epsilon}}\frac{\ln^{\delta}n}{n}\left(nV\left(|X|\ge c\epsilon\sqrt{n\ln n}\right)+\frac{1}{\epsilon^6\ln^3n}+\frac{\hat E|\xi|^6}{\epsilon^6\ln^3n}\right)\lesssim\epsilon^{2+2\delta}\int_{A_{M,\epsilon}}^{\infty}\ln^{\delta}x\,V\left(|X|\ge c\epsilon\sqrt{x\ln x}\right)dx+c\epsilon^{2\delta-4}\int_{A_{M,\epsilon}}^{\infty}\frac{dx}{x\ln^{3-\delta}x}\lesssim\epsilon^{2+2\delta}\int_{\sqrt M\epsilon^{-1}}^{\infty}\frac{y}{\ln^{1-\delta}y}\,V\left(|X|\ge c\epsilon y\right)dy+cM^{\delta-2}\lesssim\epsilon^{2+2\delta}\int_{\sqrt M\epsilon^{-1}}^{\infty}y\,V\left(|X|\ge\epsilon y\right)dy+M^{\delta-2}\le\epsilon^{2\delta}\int_0^{\infty}z\,V\left(|X|\ge z\right)dz+M^{\delta-2}=\epsilon^{2\delta}C_V(X^2)/2+M^{\delta-2}.$$

    Letting $\epsilon\to0$ first and then $M\to\infty$, we get

    $$\lim_{\epsilon\to0}\left(I_{22}(\epsilon)+I_{23}(\epsilon)\right)=0.$$

    Combining this with (3.10) and (3.6), (3.5) is established.

    Proof of the converse part of Theorem 3.1. If (3.2) holds for δ=1, then

    $$\sum_{n=2}^{\infty}\frac{\ln n}{n}V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)<\infty\quad\text{for any }\epsilon>0.\tag{3.13}$$

    Take $\xi$ as defined in Lemma 2.2 ($\hat E|\xi|<\infty$ by (2.5)) and a bounded continuous function $\psi$ such that $I\left(x>q\hat E|\xi|+1\right)\le\psi(x)\le I\left(x>q\hat E|\xi|\right)$ for any fixed $q>0$. Then, for any $\epsilon>0$, $q>0$ and $n\ge\exp\left(\left(\frac{q\hat E|\xi|+1}{\epsilon}\right)^2\right)\vee2$, according to (2.1), Lemma 2.2 and the Markov inequality, one has

    $$\limsup_{n\to\infty}V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)\le\limsup_{n\to\infty}V\left(|S_n|\ge\left(q\hat E|\xi|+1\right)\sqrt n\right)\le\limsup_{n\to\infty}\hat E\left(\psi\left(\frac{|S_n|}{\sqrt n}\right)\right)=\hat E\left(\psi(|\xi|)\right)\le V\left(|\xi|>q\hat E|\xi|\right)\le\frac{\hat E|\xi|}{q\hat E|\xi|}=\frac{1}{q}.$$

    From the arbitrariness of $q$, letting $q\to\infty$, we get the following for any $\epsilon>0$:

    $$V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)\to0,\quad n\to\infty.\tag{3.14}$$

    So, there is an $n_0$ such that $V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)<1/4$ for $n\ge n_0$. Now for $n\ge2n_0$: if $k\le n/2$, then $n-k\ge n/2\ge n_0$, and, combining this with (2.1), (3.11) and (3.12), we get

    $$V\left(|S_n-S_k|\ge2\epsilon\sqrt{n\ln n}\right)\le\hat E\left(\varphi_{1/2}\left(\frac{|S_n-S_k|}{2\epsilon\sqrt{n\ln n}}\right)\right)=\hat E\left(\varphi_{1/2}\left(\frac{|S_{n-k}|}{2\epsilon\sqrt{n\ln n}}\right)\right)\le V\left(|S_{n-k}|\ge\epsilon\sqrt{(n-k)\ln(n-k)}\right)<1/2.$$

    Also, if $n/2<k\le n$, then $n,k\ge n/2\ge n_0$; thus,

    $$V\left(|S_n-S_k|\ge2\epsilon\sqrt{n\ln n}\right)\le V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)+V\left(|S_k|\ge\epsilon\sqrt{n\ln n}\right)\le V\left(|S_n|\ge\epsilon\sqrt{n\ln n}\right)+V\left(|S_k|\ge\epsilon\sqrt{k\ln k}\right)<1/2.$$

    Taking $\alpha=1/2$ and $\beta_{n,k}=0$ in Lemma 2.3, for $n\ge2n_0$,

    $$V\left(\max_{k\le n}|S_k|\ge4\epsilon\sqrt{n\ln n}\right)\le2V\left(|S_n|\ge2\epsilon\sqrt{n\ln n}\right).$$

    Since $\max_{k\le n}|X_k|\le2\max_{k\le n}|S_k|$, it follows that for $n\ge2n_0$,

    $$V\left(\max_{k\le n}|X_k|\ge8\epsilon\sqrt{n\ln n}\right)\le2V\left(|S_n|\ge2\epsilon\sqrt{n\ln n}\right).\tag{3.15}$$

    Let $Y_k=\varphi_{8/9}\left(\frac{X_k}{9\epsilon\sqrt{n\ln n}}\right)$. Then,

    $$I\left(\max_{k\le n}|X_k|\ge8\epsilon\sqrt{n\ln n}\right)=1-I\left(\max_{k\le n}|X_k|<8\epsilon\sqrt{n\ln n}\right)=1-\prod_{k=1}^{n}I\left(|X_k|<8\epsilon\sqrt{n\ln n}\right)\ge1-\prod_{k=1}^{n}\left(1-Y_k\right).$$

    Since $\{X_k;k\ge1\}$ is a sequence of i.i.d. random variables, $\{1-Y_k;k\ge1\}$ is also a sequence of i.i.d. random variables with $1-Y_k\ge0$; given (2.1), (2.3) and $\hat E(-X)=-\hat\varepsilon(X)$, it can be concluded that

    $$V\left(\max_{k\le n}|X_k|\ge8\epsilon\sqrt{n\ln n}\right)\ge\hat E\left(1-\prod_{k=1}^{n}\left(1-Y_k\right)\right)=1-\hat\varepsilon\left(\prod_{k=1}^{n}\left(1-Y_k\right)\right)=1-\prod_{k=1}^{n}\hat\varepsilon\left(1-Y_k\right)=1-\prod_{k=1}^{n}\left(1-\hat EY_k\right)\ge1-\prod_{k=1}^{n}e^{-\hat EY_k}=1-e^{-n\hat EY_1}\ge1-e^{-nV\left(|X|\ge9\epsilon\sqrt{n\ln n}\right)}.$$

    Since $1-e^{-t}\ge t/2$ for $0\le t\le1$, and $nV\left(|X|\ge9\epsilon\sqrt{n\ln n}\right)\le1$ eventually (the left-hand side tends to zero by (3.15) and (3.14)), the bound above is at least $\frac{1}{2}nV\left(|X|\ge9\epsilon\sqrt{n\ln n}\right)$ for all large $n$.

    Hence, by (3.15) and (3.13) (taking $\epsilon=1/9$ in (3.13)),

    $$\sum_{n=2}^{\infty}\ln n\,V\left(|X|\ge\sqrt{n\ln n}\right)<\infty.$$

    On the other hand,

    $$\infty>\sum_{n=2}^{\infty}\ln n\,V\left(|X|\ge\sqrt{n\ln n}\right)\gtrsim\int_2^{\infty}\ln x\,V\left(|X|\ge\sqrt{x\ln x}\right)dx\gtrsim\int_{\sqrt{2\ln2}}^{\infty}2y\,V\left(|X|\ge y\right)dy.$$

    Hence, since $C_V(X^2)=\int_0^{\infty}V\left(X^2\ge t\right)dt=2\int_0^{\infty}yV\left(|X|\ge y\right)dy$ and the integral over $[0,\sqrt{2\ln2}]$ is finite,

    $$C_V(X^2)<\infty.\tag{3.16}$$

    Next, we prove that $\lim_{c\to\infty}\hat E\left(X^{(c)}\right)=\lim_{c\to\infty}\hat E\left((-X)^{(c)}\right)=0$. For $c_1>c_2>0$, by (2.2) and (3.16),

    $$\left|\hat E(\pm X)^{(c_1)}-\hat E(\pm X)^{(c_2)}\right|\le\hat E\left|(\pm X)^{(c_1)}-(\pm X)^{(c_2)}\right|=\hat E\left(|X|\wedge c_1-c_2\right)^+\le\frac{\hat E\left(|X|\wedge c_1\right)^2}{c_2}\le\frac{C_V(X^2)}{c_2},\quad c_1>c_2.$$

    This implies that

    $$\lim_{c_1>c_2\to\infty}\left|\hat E(\pm X)^{(c_1)}-\hat E(\pm X)^{(c_2)}\right|=0.$$

    By the Cauchy criterion, $\lim_{c\to\infty}\hat E\left(X^{(c)}\right)$ and $\lim_{c\to\infty}\hat E\left((-X)^{(c)}\right)$ exist and are finite. It follows that $\lim_{c\to\infty}\hat E\left(X^{(c)}\right)=\lim_{n\to\infty}\hat E\left(X^{(n)}\right):=a$. So, for any $\epsilon>0$, when $n$ is large enough, $\left|\hat E\left(X^{(n)}\right)-a\right|<\epsilon$; by Proposition 2.1 (iii), Lemma 2.1 (ii), $\hat E\left(-X_k^{(n)}+\hat EX_k^{(n)}\right)^2\le4\hat E\left(X_k^{(n)}\right)^2\le4C_V(X^2)$ and (3.16),

    $$\nu\left(\frac{S_n}{n}<a-2\epsilon\right)\le\nu\left(\left\{\frac{S_n}{n}<a-2\epsilon,\ |X_k|\le n\ \text{for all }1\le k\le n\right\}\cup\left\{|X_k|>n\ \text{for some }1\le k\le n\right\}\right)\le\nu\left(\sum_{k=1}^{n}X_k^{(n)}<(a-2\epsilon)n\right)+\sum_{k=1}^{n}V\left(|X_k|>n\right)=\nu\left(\sum_{k=1}^{n}\left(-X_k^{(n)}+\hat EX^{(n)}\right)>(2\epsilon-a)n+n\hat EX^{(n)}\right)+\sum_{k=1}^{n}V\left(|X_k|>n\right)\le\nu\left(\sum_{k=1}^{n}\left(-X_k^{(n)}+\hat EX^{(n)}\right)>\epsilon n\right)+\sum_{k=1}^{n}V\left(|X_k|>n\right)\le\frac{\sum_{k=1}^{n}\hat E\left(-X_k^{(n)}+\hat EX^{(n)}\right)^2}{\epsilon^2n^2}+\sum_{k=1}^{n}\frac{\hat E\left(|X_k|\wedge n\right)^2}{n^2}\lesssim\frac{1}{n}\to0,\quad n\to\infty.$$

    It is concluded that

    $$\lim_{n\to\infty}V\left(\frac{S_n}{n}\ge a-2\epsilon\right)=1\quad\text{for any }\epsilon>0.$$

    If $a>0$, taking $\epsilon<a/2$ and setting $\epsilon_1:=a-2\epsilon>0$, we have

    $$\lim_{n\to\infty}V\left(|S_n|\ge n\epsilon_1\right)\ge\lim_{n\to\infty}V\left(S_n\ge n\epsilon_1\right)=1.\tag{3.17}$$

    On the other hand, by (3.14),

    $$\lim_{n\to\infty}V\left(|S_n|\ge n\epsilon_1\right)\le\lim_{n\to\infty}V\left(|S_n|\ge\epsilon_1\sqrt{n\ln n}\right)=0,$$

    which contradicts (3.17). It follows that $a\le0$. Similarly, we can prove that $b:=\lim_{c\to\infty}\hat E\left((-X)^{(c)}\right)\le0$. From $(-X)^{(c)}=-X^{(c)}$ and

    $$0\ge a+b=\lim_{c\to\infty}\left(\hat E\left(X^{(c)}\right)+\hat E\left(-X^{(c)}\right)\right)\ge\lim_{c\to\infty}\hat E\left(X^{(c)}-X^{(c)}\right)=0,$$

    we conclude that $a=b=0$, i.e., $\lim_{c\to\infty}\hat E\left(X^{(c)}\right)=\lim_{c\to\infty}\hat E\left((-X)^{(c)}\right)=0$. This completes the proof of Theorem 3.1.

    Proof of Theorem 3.2. Note that

    $$\epsilon^{2}\sum_{n=3}^{\infty}\frac{1}{n\ln n}V\left(|S_n|\ge\epsilon\sqrt{n\ln\ln n}\right)=\epsilon^{2}\sum_{n=3}^{\infty}\frac{1}{n\ln n}V\left(|\xi|\ge\epsilon\sqrt{\ln\ln n}\right)+\epsilon^{2}\sum_{n=3}^{\infty}\frac{1}{n\ln n}\left(V\left(|S_n|\ge\epsilon\sqrt{n\ln\ln n}\right)-V\left(|\xi|\ge\epsilon\sqrt{\ln\ln n}\right)\right):=J_1(\epsilon)+J_2(\epsilon).$$

    Hence, in order to establish (3.3), it suffices to prove that

    $$\lim_{\epsilon\to0}J_1(\epsilon)=C_V\left(\xi^2\right)\tag{3.18}$$

    and

    $$\lim_{\epsilon\to0}J_2(\epsilon)=0.\tag{3.19}$$

    Obviously, (3.18) follows from

    $$\lim_{\epsilon\to0}J_1(\epsilon)=\lim_{\epsilon\to0}\epsilon^{2}\int_3^{\infty}\frac{1}{x\ln x}V\left(|\xi|\ge\epsilon\sqrt{\ln\ln x}\right)dx=\lim_{\epsilon\to0}\int_{\epsilon\sqrt{\ln\ln3}}^{\infty}2y\,V\left(|\xi|\ge y\right)dy\quad(\text{let }y=\epsilon\sqrt{\ln\ln x})=\int_0^{\infty}2y\,V\left(|\xi|\ge y\right)dy=C_V\left(\xi^2\right).$$

    Let $M\ge32$ and write $B_{M,\epsilon}:=\exp\left(\exp\left(M\epsilon^{-2}\right)\right)$.

    $$|J_2(\epsilon)|\le\epsilon^{2}\sum_{3\le n\le[B_{M,\epsilon}]}\frac{1}{n\ln n}\left|V\left(|S_n|\ge\sqrt n\,\epsilon\sqrt{\ln\ln n}\right)-V\left(|\xi|\ge\epsilon\sqrt{\ln\ln n}\right)\right|+\epsilon^{2}\sum_{n>[B_{M,\epsilon}]}\frac{1}{n\ln n}V\left(|S_n|\ge\epsilon\sqrt{n\ln\ln n}\right)+\epsilon^{2}\sum_{n>[B_{M,\epsilon}]}\frac{1}{n\ln n}V\left(|\xi|\ge\epsilon\sqrt{\ln\ln n}\right):=J_{21}(\epsilon)+J_{22}(\epsilon)+J_{23}(\epsilon).\tag{3.20}$$

    Let us first estimate $J_{21}(\epsilon)$. For any $\beta>\epsilon^2$,

    $$J_{21}(\epsilon)\le\epsilon^{2}\int_3^{B_{M,\epsilon}}\frac{1}{x\ln x}\left|V\left(|S_{[x]}|\ge\sqrt{[x]}\,\epsilon\sqrt{\ln\ln x}\right)-V\left(|\xi|\ge\epsilon\sqrt{\ln\ln x}\right)\right|dx\le\epsilon^{2}\int_3^{B_{\beta,\epsilon}}\frac{2}{x\ln x}\,dx+\epsilon^{2}\int_{B_{\beta,\epsilon}}^{B_{M,\epsilon}}\frac{1}{x\ln x}\sup_{n\ge B_{\beta,\epsilon}}\left|V\left(|S_n|\ge\sqrt n\,\epsilon\sqrt{\ln\ln x}\right)-V\left(|\xi|\ge\epsilon\sqrt{\ln\ln x}\right)\right|dx\le2\beta+\int_0^{\sqrt M}2y\sup_{n\ge B_{\beta,\epsilon}}\left|V\left(|S_n|\ge\sqrt n\,y\right)-F(y)\right|dy.$$

    Similar to (3.9), we have

    $$\lim_{\epsilon\to0}\int_0^{\sqrt M}y\sup_{n\ge B_{\beta,\epsilon}}\left|V\left(|S_n|\ge\sqrt n\,y\right)-F(y)\right|dy=0.$$

    Therefore, letting $\epsilon\to0$ first and then $\beta\to0$, we get

    $$\lim_{\epsilon\to0}J_{21}(\epsilon)=0.\tag{3.21}$$

    Next, we estimate $J_{22}(\epsilon)$. Without loss of generality, we still assume that $\bar\sigma=1$. For $n\ge\exp\left(\exp\left(M\epsilon^{-2}\right)\right)\ge\exp\left(\exp\left(32\epsilon^{-2}\right)\right)$, set $a_n:=\epsilon\sqrt{n\ln\ln n}/16$; from Proposition 2.1 (ii) and the condition that $\lim_{c\to\infty}\hat E\left(X^{(c)}\right)=0$,

    $$\sum_{i=1}^{n}\left|\hat EX_i^{(a_n)}\right|=n\left|\lim_{c\to\infty}\hat E\left(X^{(c)}\right)-\hat EX^{(a_n)}\right|\le n\lim_{c\to\infty}\hat E\left|X^{(c)}-X^{(a_n)}\right|=n\lim_{c\to\infty}\hat E\left(|X|\wedge c-a_n\right)^+\le n\lim_{c\to\infty}\frac{\hat E\left(|X|\wedge c\right)^2}{a_n}=\frac{n\bar\sigma^2}{a_n}=\frac{16\sqrt n}{\epsilon\sqrt{\ln\ln n}}\le\frac{\epsilon}{2}\sqrt{n\ln\ln n}.$$

    Using Lemma 2.1 for $\left\{X_i^{(a_n)}-\hat EX_i^{(a_n)};1\le i\le n\right\}$, and taking $x=\epsilon\sqrt{n\ln\ln n}/2$ and $y=2a_n=\epsilon\sqrt{n\ln\ln n}/8$ in Lemma 2.1 (i), noting that $\left|X_i^{(a_n)}-\hat EX_i^{(a_n)}\right|\le y$ and $B_n\le4n$, and combining this with (3.12), we get

    $$V\left(S_n\ge\epsilon\sqrt{n\ln\ln n}\right)\le V\left(\sum_{i=1}^{n}\left(X_i^{(a_n)}-\hat EX_i^{(a_n)}\right)\ge\epsilon\sqrt{n\ln\ln n}/2\right)+\sum_{i=1}^{n}V\left(|X_i|\ge a_n\right)\le\exp\left(-\frac{\epsilon^2n\ln\ln n}{4\left(\epsilon^2n\ln\ln n/16+4n\right)}\left\{1+\frac{2}{3}\ln\left(1+\frac{\epsilon^2\ln\ln n}{64}\right)\right\}\right)+nV\left(|X|\ge\mu a_n\right)\le c\left(\epsilon^2\ln\ln n\right)^{-2}+nV\left(|X|\ge\mu\epsilon\sqrt{n\ln\ln n}/16\right),$$

    since $\frac{\epsilon^2n\ln\ln n}{4\left(\epsilon^2n\ln\ln n/16+4n\right)}\left\{1+\frac{2}{3}\ln\left(1+\frac{\epsilon^2\ln\ln n}{64}\right)\right\}\ge2\ln\left(\frac{\epsilon^2\ln\ln n}{64}\right)$.

    Since $\{-X,-X_i\}$ also satisfies (3.1), we can replace $\{X,X_i\}$ with $\{-X,-X_i\}$ in the above bound to obtain

    $$V\left(-S_n\ge\epsilon\sqrt{n\ln\ln n}\right)\le c\left(\epsilon^2\ln\ln n\right)^{-2}+nV\left(|X|\ge\mu\epsilon\sqrt{n\ln\ln n}/16\right).$$

    Therefore

    $$V\left(|S_n|\ge\epsilon\sqrt{n\ln\ln n}\right)\lesssim\left(\epsilon^2\ln\ln n\right)^{-2}+nV\left(|X|\ge c\epsilon\sqrt{n\ln\ln n}\right).$$

    This implies the following from Markov's inequality and (2.5):

    $$J_{22}(\epsilon)+J_{23}(\epsilon)\lesssim\epsilon^{2}\sum_{n\ge B_{M,\epsilon}}\frac{1}{n\ln n}\left(nV\left(|X|\ge c\epsilon\sqrt{n\ln\ln n}\right)+\frac{1}{\epsilon^4(\ln\ln n)^2}+\frac{\hat E|\xi|^4}{\epsilon^4(\ln\ln n)^2}\right)\lesssim\epsilon^{2}\int_{B_{M,\epsilon}}^{\infty}\frac{V\left(|X|\ge c\epsilon\sqrt{x\ln\ln x}\right)}{\ln x}\,dx+c\epsilon^{-2}\int_{B_{M,\epsilon}}^{\infty}\frac{dx}{x\ln x\,(\ln\ln x)^2}\lesssim\epsilon^{2}\int_{\sqrt M\epsilon^{-1}}^{\infty}\frac{y}{\ln y\,\ln\ln y}V\left(|X|\ge c\epsilon y\right)dy+cM^{-1}\lesssim\int_{\sqrt M}^{\infty}z\,V\left(|X|\ge z\right)dz+cM^{-1}\to0,\quad M\to\infty.$$

    Hence

    $$\lim_{\epsilon\to0}\left(J_{22}(\epsilon)+J_{23}(\epsilon)\right)=0.$$

    Combining this with (3.20) and (3.21), (3.19) is established.

    Statistical modeling is one of the key, basic topics in statistical theory and its applications. Under the theoretical framework of the traditional probability space, in order to carry out inference, every statistical model must assume that the error (and therefore the response variable) follows a unique, deterministic probability distribution; that is, the distribution of the model is deterministic. However, complex data in economics, finance, and other fields often carry inherent, non-negligible probability and distribution uncertainties: the probability distribution of the response variable under study is uncertain and does not satisfy the assumptions of classical statistical modeling, so classical probabilistic modeling methods cannot be applied to such data. How to analyze and model uncertain random data has long been an unresolved and challenging issue for statisticians. Driven by uncertainty issues, Peng [13] established a theoretical framework for the sub-linear expectation space from the perspective of expectations, providing a powerful tool for analyzing uncertainty problems. The sub-linear expectation has a wide range of potential applications, and in recent years the limit theory for sub-linear expectation spaces has attracted much attention from statisticians, with a series of research results achieved. This article overcomes the difficulty that many tools and methods of the traditional probability space are no longer effective due to the non-additivity of sub-linear expectations and capacities, and it establishes sufficient and necessary conditions for the convergence rate of the law of the (iterated) logarithm in sub-linear expectation spaces.

    The authors declare that they have not used artificial intelligence tools in the creation of this article.

    This paper was supported by the National Natural Science Foundation of China (12061028) and Guangxi Colleges and Universities Key Laboratory of Applied Statistics.

    Regarding this article, the authors declare no conflicts of interest.



  • This article has been cited by:

    1. Mengmei Xi, Fei Zhang, Xuejun Wang, Complete moment convergence and $L^q$ convergence for AANA random variables under sub-linear expectations, Bull. Malays. Math. Sci. Soc., 47 (2024). https://doi.org/10.1007/s40840-023-01636-6
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
