Research article

Refined estimates and generalization of some recent results with applications

  • Received: 31 May 2021 Accepted: 06 July 2021 Published: 23 July 2021
  • MSC : 26A15, 26A51, 26D10, 26D15

  • In this paper, we first give improvements of Hermite-Hadamard type and Fejér type inequalities. Next, we extend Hermite-Hadamard type and Fejér type inequalities to a new class of functions. Further, we give bounds for the newly defined class of functions and finally present refined estimates of some already proved results. Furthermore, we obtain some new discrete inequalities for univariate harmonic convex functions on linear spaces related to a variant, most recently presented by Baloch et al., of a Jensen-type result that was established by S. S. Dragomir.

    Citation: Aqeel Ahmad Mughal, Deeba Afzal, Thabet Abdeljawad, Aiman Mukheimer, Imran Abbas Baloch. Refined estimates and generalization of some recent results with applications[J]. AIMS Mathematics, 2021, 6(10): 10728-10741. doi: 10.3934/math.2021623




    The phenomenon of two timescales is commonly found in complex systems, appearing in fields such as materials science, chemistry, fluid dynamics, control engineering, biology, ecology, financial economics, climate dynamics, and other applications. For example, see [1] and the references therein. In classical gene expression models, messenger ribonucleic acid (mRNA) molecules are produced from deoxyribonucleic acid (DNA) through the transcription process, while protein molecules are generated from mRNA through the translation process. Both types of molecules are subject to degradation; however, the kinetic behavior of proteins is much slower than that of mRNA. Proteins can exist for several weeks, whereas mRNA may only last a few minutes. The processes by which protein molecules acquire their functional structures and conformations also exhibit different timescales, with the vibrational timescale of covalent bonds on the order of femtoseconds (10^{-15} seconds), while protein folding likely occurs on the order of seconds. When mathematically characterizing these phenomena that span different timescales, it is often necessary to introduce fast-varying and slow-varying processes, thereby forming a two-timescale system. Due to the coupling between the fast and slow processes, directly addressing the original system is often extremely challenging. A common approach is to average the fast variables in the slow-varying equations to obtain an averaged equation, which no longer depends on the fast-varying processes. This averaged equation can then serve as a bridge to designing feasible methods for the original system. For instance, reference [2] introduced a reduction method grounded in chemical Langevin equations with two timescales, utilizing the stochastic averaging principle to derive a limit averaging system. This limit averaging system serves as an approximation for the slow-reacting process. This reduction method not only significantly enhances computational speed during numerical simulations, but also provides accurate error bounds.

    In recent years, the averaging principles for stochastic systems with regime-switching involving two timescales have garnered considerable attention from scholars. This interest arises from the fact that in control engineering, finance, biology, and information transmission, the current state of a system is influenced not only by intrinsic uncertainty factors but also by random factors in the external environment. When both influences occur simultaneously, traditional stochastic differential equations or stochastic functional differential equations are insufficient to characterize such systems. To effectively express the impacts of both internal and external factors, researchers have introduced stochastic systems with switching. A significant feature of these systems is the coexistence of discrete events and continuous dynamics, which interact differently across various models. This characteristic yields results that differ from those of traditional stochastic differential equations. For example, reference [3] discussed the stability of stochastic systems with regime-switching, while reference [4] explored numerical methods for these systems. Furthermore, for the stochastic models with regime-switching mentioned above, when drastic changes in the external environment lead to a significant disparity in the frequency of changes both inside and outside the system, it becomes necessary to introduce a two-timescale structure to describe this phenomenon. Yin and his collaborators established a comprehensive asymptotic expansion theory related to nonhomogeneous Markov chains and their generators under various conditions in reference [5], obtaining stationary distributions and convergence rates while investigating the central limit theorem for occupation measures. Building on this theoretical foundation, the reference [6] examined the long-term behavior and stochastic persistence properties of population models driven by rapidly switching Markov chains. Notably, in the aforementioned two-timescale models, the rapidly switching Markov chains do not depend on the slow-varying process. However, in practical applications, fluctuations in the external environment affect the internal system, and, conversely, changes within the internal system can influence the external environment's development. Consequently, state-dependent regime-switching models have attracted significant scholarly interest. Generally speaking, for Markov chains that do not depend on the system state, they can be treated as exogenous noise. The challenge in dealing with regime-switching models that depend on the current state of the system lies in the coupling relationship between regime-switching and continuous states. Reference [7] used weak convergence and martingale methods to prove numerical methods for stochastic differential equations with state-dependent switching; reference [8] investigated small perturbation large deviation for diffusion systems with state-dependent rapid switching, where the diffusion coefficients may be degenerate; reference [9] established an asymptotic expansion theory for state-dependent switching diffusion systems, thereby deriving averaging principles. It is significant to point out that these studies have only considered regime-switching dependent on the current state of the system. Recently, references [10,11,12] proposed diffusion systems with regime-switching dependent on the historical state of the system and explored recurrence and ergodicity. 
To the best of our knowledge, research on stochastic differential equations with past-dependent switching is still in its early stages. Therefore, issues such as averaging principle, numerical computation, numerical simulation, and stability analysis of this model are all worthy of consideration. This paper mainly aims to introduce two time scales into the model and establish the corresponding averaging principle, thereby providing a theoretical foundation for further research on the model.

    To date, there are four main methods for establishing averaging principles in two-timescale models. The first method is based on asymptotic expansion techniques of partial differential equations. It begins by demonstrating that the density function of the solution to the original system satisfies a Fokker-Planck-Kolmogorov equation (FPK equation) with singular perturbation under suitable conditions. Next, an asymptotic expansion is performed on this equation, and by taking the limit of the expansion, one obtains the limit function of the density function, which remains a valid density function. Finally, integrating the limit density function with respect to the stationary density function of the fast-varying process yields a density function that satisfies an FPK equation. The corresponding stochastic differential equation for this FPK equation is the limit equation for the slow-varying process in the original system. It should be noted that this method requires strong smoothness conditions on the coefficients of the original system, with more detailed explanations available in reference [13]. The second method is based on certain properties and estimates of the solutions to the Poisson equation defined on the entire space, including the existence and uniqueness of solutions, growth estimates, and estimates of the growth of partial derivatives. The foundational theory for this approach was established by Pardoux [14,15,16]. Subsequently, a substantial amount of literature has utilized methods involving the Poisson equation to obtain richer results, such as mentioned in [17,18]. In particular, one of the advantages of this method is its ability to determine the convergence rate of the slow-varying process. The third method is the perturbation test function method, which was first proposed by Khasminskii in the 1960s [19] and later developed and refined by Kushner [20]. The fourth method is based on the technique of time discretization. If the diffusion coefficient of the slow-varying equation does not depend on the fast-varying process, one can seek the strong convergence (in the Lp sense) limit of the slow-varying equation. More detailed explanations can be found in reference [21]. If the diffusion coefficient of the slow-varying process depends on the fast-varying process, an example in [22] illustrates that the slow-varying process no longer possesses a strong convergence limit; in this case, one can only look for its weak convergence limit using time discretization techniques. Detailed discussions can be found in reference [20].

    In summary, this paper aims to develop an averaging principle for stochastic functional differential equations with past-dependent switching that incorporate two time scales. To this end, this paper employs the aforementioned time discretization technique and is organized as follows. In the next section, we first provide definitions and assumptions of the model presented in this paper and prove that the switching process can be represented as a stochastic differential equation with respect to the Poisson random measure. Based on this, we can obtain the existence and uniqueness of the solution as well as the moment estimation. Section 3 mainly studies the interaction between fast-varying and slow-varying processes. Section 4 demonstrates through the coupling method that the exponential ergodicity of the frozen Markov chain does not depend on the fixed parameter, and obtains the Lipschitz continuity of invariant measures with respect to fixed parameters. The tightness of the slow-varying process Xε(t) and the moment estimation of the segment process are discussed in Section 5. Based on this preparatory work, Section 5 presents the core Theorem 5.5 of this article, which illustrates the main limit theorem in the sense of weak convergence. Finally, Section 6 provides several different types of examples to illustrate the results of this article.

    This paper considers the two-component process (Xε(t),αε(t)) where Xε(t) satisfies

    \begin{equation} dX^\varepsilon(t) = b(X^\varepsilon_t, \alpha^\varepsilon(t))dt + \sigma(X^\varepsilon_t, \alpha^\varepsilon(t))dW(t), \end{equation} (2.1)

    Here, z' denotes the transpose of z, C([a,b];\mathbb{R}^n) denotes the family of continuous functions \nu from [a,b] to \mathbb{R}^n with the norm

    \|\nu\| = \sup\limits_{a\leq t\leq b}|\nu(t)|,

    \tau denotes the delay length,

    b(\cdot,\cdot) = (b_1, b_2, \dots, b_n)' : C([-\tau,0];\mathbb{R}^n)\times S\rightarrow\mathbb{R}^n, \qquad \sigma(\cdot,\cdot) = [\sigma_{ij}]_{n\times q} : C([-\tau,0];\mathbb{R}^n)\times S\rightarrow\mathbb{R}^{n\times q},

    \alpha^\varepsilon(t) is a pure jump process taking values in the finite state space

    S = \{1, 2, \dots, N\}\subset\mathbb{Z}_+,

    and W(t) is a standard Brownian motion defined on the complete probability space (\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\geq0}, \mathbb{P}), taking values in \mathbb{R}^q and independent of \alpha^\varepsilon(t). We assume that the switching intensity of \alpha^\varepsilon(t) depends on the segment process X^\varepsilon_t = \{X^\varepsilon(t+\theta) : \theta\in[-\tau,0]\}, that is,

    \begin{equation} \mathbb{P}(\alpha^\varepsilon(t+\delta) = j \mid \alpha^\varepsilon(t) = i, X^\varepsilon_s, \alpha^\varepsilon(s), s\leq t) = \frac{1}{\varepsilon}q_{ij}(X^\varepsilon_t)\delta + o(\delta), \ \ \ \text{if} \ \ \ i\neq j, \end{equation} (2.2a)
    \begin{equation} \mathbb{P}(\alpha^\varepsilon(t+\delta) = i \mid \alpha^\varepsilon(t) = i, X^\varepsilon_s, \alpha^\varepsilon(s), s\leq t) = 1 + \frac{1}{\varepsilon}q_{ii}(X^\varepsilon_t)\delta + o(\delta), \end{equation} (2.2b)

    for \delta>0 and i, j\in S, where \varepsilon>0 is a small positive parameter. For convenience of notation, we define

    q_i = -q_{ii}, \ \ i\in S.
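    To fix ideas, the following minimal Python sketch simulates a scalar instance of the two-timescale system (2.1) and (2.2). All coefficients (the drift b, the diffusion \sigma and the segment-dependent rate matrix Q) are hypothetical toy choices introduced only for illustration, and the switching mechanism is approximated by jumping from i to j\neq i with probability (\delta/\varepsilon)q_{ij}(X^\varepsilon_t) over a small Euler step \delta; this is a first-order stand-in for (2.2a) and (2.2b), not the exact Poisson-random-measure construction developed below.

```python
# A toy simulation of the slow-fast pair (X^eps(t), alpha^eps(t)) in (2.1)-(2.2).
# The coefficients b, sigma, Q below are hypothetical illustrative choices, and the
# switching is approximated per Euler step by the first-order probabilities in (2.2a).
import numpy as np

rng = np.random.default_rng(0)
tau, T, dt, eps = 1.0, 5.0, 1e-3, 0.05
lag = int(tau / dt)                      # grid points in one delay length
S = [0, 1]                               # two regimes, N = 2

def b(seg, i):        # drift acting on the segment through X(t) and X(t - tau)
    return (-1.0 if i == 0 else -0.2) * seg[-1] + 0.5 * seg[0]

def sigma(seg, i):    # regime-dependent diffusion coefficient
    return 0.3 + 0.2 * i + 0.1 * np.tanh(seg[-1])

def Q(seg):           # conservative, segment-dependent rate matrix Q(X^eps_t)
    q01 = 1.0 + seg[-1] ** 2
    q10 = 2.0 + abs(seg[0])
    return np.array([[-q01, q01], [q10, -q10]])

n_steps = int(T / dt)
X = np.zeros(lag + 1 + n_steps)          # path on [-tau, T]; initial segment xi = 0
alpha = 0                                # alpha^eps(0) = i_0
for k in range(n_steps):
    seg = X[k:k + lag + 1]               # the segment X^eps_t on the grid
    X[lag + 1 + k] = X[lag + k] + b(seg, alpha) * dt \
        + sigma(seg, alpha) * np.sqrt(dt) * rng.standard_normal()
    rates = Q(seg)[alpha] / eps          # fast switching intensities q_{alpha j}/eps
    for j in S:
        if j != alpha and rng.random() < rates[j] * dt:
            alpha = j
            break

print("X^eps(T) =", X[-1], "  final regime =", alpha)
```

    Decreasing \varepsilon makes the regime \alpha^\varepsilon(t) switch ever faster relative to the motion of X^\varepsilon(t), which is precisely the regime in which the averaging principle studied in this paper applies.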

    The main objective of this paper is to establish the averaging principle for the aforementioned model. The highlights and major contributions of this paper are reflected in the subsequent key aspects:

    (1) To the best of our knowledge, this paper establishes for the first time the averaging principle for stochastic functional differential equations with past-dependent switching involving two timescales. Since the diffusion coefficient depends on the fast-varying process, the counterexample mentioned in the introduction indicates that this model does not possess a strong convergence limit. Therefore, this paper employs weak convergence methods and martingale methods to address this difficulty.

    (2) Since Xε(t) and αε(t) depend on Xεt, the existence and uniqueness of solutions, the interaction between the fast-varying and slow-varying processes, and the invariant measure from classical literature are no longer applicable. Therefore, this article utilizes the method in [23,24], which represents the switching process as a stochastic differential equation with respect to the Poisson random measure. The advantage of doing so is that we can apply techniques related to stochastic differential equations to the switching process. Furthermore, based on this, we obtain the interaction between fast-varying and slow-varying processes, which will be repeatedly used in the martingale method. At the same time, this article also discusses the moment estimation of the segment process, obtaining an order that is sufficiently close to half, which will be used in the implementation of martingale methods together with inequality (3.3) to form control over the estimation term of martingales. In the following, the assumptions used in this paper are presented along with some explanations.

    The following are the assumptions that will be used in this paper. Throughout this paper, K denotes a generic positive constant, whose value may change for different usage, so

    K + K = K \ \ \ \text{and} \ \ \ K\cdot K = K

    are understood in an appropriate sense. Kβ represents the generic constant depending on parameters β. Let us begin by introducing some conditions on the two-timescale system (Xε(t),αε(t)), which will be used throughout this article.

    (A1) Assume that the initial value

    X^\varepsilon_0 = \xi\in C([-\tau, 0];\mathbb{R}^n)

    is nonrandom and satisfies the Lipschitz property, and the initial value

    \alpha^\varepsilon(0) = i_0\in S

    is independent of ε.

    (A2) For any ϕ1,ϕ2C([τ,0];Rn) and any iS, there exists a positive constant L1 such that

    \begin{equation} |b(\phi_1, i) - b(\phi_2, i)|^2\vee|\sigma(\phi_1, i) - \sigma(\phi_2, i)|^2\leq L_1\|\phi_1 - \phi_2\|^2. \end{equation} (2.3)

    (A3) For each ϕC([τ,0];Rn),

    Q(\phi) = (q_{ij}(\phi))_{i,j\in S}

    is a conservative transition rate matrix.

    (A4) Assume that

    \beta_{ij} := \inf\limits_{\phi\in C([-\tau,0];\mathbb{R}^n)}q_{ij}(\phi)>0

    for any i, j\in S with i\neq j.

    (A5) Assume that

    M := \sup\limits_{i\in S}\sum\limits_{j\in S, j\neq i}\sup\limits_{\phi\in C([-\tau,0];\mathbb{R}^n)}q_{ij}(\phi)<\infty.

    (A6) For any ϕ1,ϕ2C([τ,0];Rn), there exists a constant K>0 such that

    \|Q(\phi_1) - Q(\phi_2)\|_{l_1} := \sup\limits_{i\in S}\sum\limits_{j\neq i}|q_{ij}(\phi_1) - q_{ij}(\phi_2)|\leq K\|\phi_1 - \phi_2\|.

    Owing to the fact that b(,i) and σ(,i) are independent of t, the condition (A2) yields the linear growth condition, that is, for any iS and any ϕC([τ,0];Rn),

    \begin{equation} |b(\phi, i)|\vee|\sigma(\phi, i)|\leq K(1 + \|\phi\|). \end{equation} (2.4)

    Thanks to the condition (A4) and the finite state space, each state has a strictly positive arrival probability. According to the definition of irreducibility, we can conclude that

    Q(\phi) = (q_{ij}(\phi))_{i,j\in S}

    is an irreducible transition rate matrix for each \phi\in C([-\tau,0];\mathbb{R}^n). To proceed, we construct \alpha^\varepsilon(t) as the solution to a stochastic differential equation with respect to a Poisson random measure. For \phi\in C([-\tau,0];\mathbb{R}^n) and \varepsilon>0, let \{\Gamma^\varepsilon_{ij}(\phi), i,j\in S\} be a family of consecutive left-closed, right-open intervals on the half-line, the interval \Gamma^\varepsilon_{ij}(\phi) having length q_{ij}(\phi)/\varepsilon, that is

    \begin{align*} \Gamma^\varepsilon_{12}(\phi) &= \Big[0, \frac{q_{12}(\phi)}{\varepsilon}\Big), \ \dots, \ \Gamma^\varepsilon_{1N}(\phi) = \Big[\sum\limits_{l=2}^{N-1}\frac{q_{1l}(\phi)}{\varepsilon}, \frac{q_{1}(\phi)}{\varepsilon}\Big),\\ \Gamma^\varepsilon_{ij}(\phi) &= \Big[\sum\limits_{l=1}^{i-1}\frac{q_{l}(\phi)}{\varepsilon} + \sum\limits_{l=1, l\neq i}^{j-1}\frac{q_{il}(\phi)}{\varepsilon}, \ \sum\limits_{l=1}^{i-1}\frac{q_{l}(\phi)}{\varepsilon} + \sum\limits_{l=1, l\neq i}^{j}\frac{q_{il}(\phi)}{\varepsilon}\Big), \ \ i,j\in S, \ i\neq1, \ j\neq i,\\ \Gamma^\varepsilon_{N(N-1)}(\phi) &= \Big[\sum\limits_{l=1}^{N-1}\frac{q_{l}(\phi)}{\varepsilon} + \sum\limits_{l=1}^{N-2}\frac{q_{Nl}(\phi)}{\varepsilon}, \ \sum\limits_{l=1}^{N-1}\frac{q_{l}(\phi)}{\varepsilon} + \sum\limits_{l=1}^{N-1}\frac{q_{Nl}(\phi)}{\varepsilon}\Big). \end{align*}

    For convenience of notation, we set

    \Gamma^\varepsilon_{ii}(\phi) = \emptyset \ \ \ \text{and} \ \ \ \Gamma^\varepsilon_{ij}(\phi) = \emptyset

    if

    q_{ij}(\phi) = 0.
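    The following short sketch makes the interval construction explicit: given a numerical rate matrix Q(\phi) (a hypothetical example below) and \varepsilon, the intervals \Gamma^\varepsilon_{ij}(\phi) are laid out consecutively on the half-line, each of length q_{ij}(\phi)/\varepsilon, so that a point z\geq0 can be classified by the pair (i,j) whose interval contains it.

```python
# Laying out the consecutive intervals Gamma^eps_{ij}(phi) from a given rate matrix
# Q(phi) (a hypothetical numerical example below). Each interval has length
# q_{ij}(phi)/eps and the blocks for the rows i = 1, ..., N follow one another.
import numpy as np

def gamma_intervals(Qphi, eps):
    """Return a dict (i, j) -> [left, right) built from the rate matrix Q(phi)."""
    N = Qphi.shape[0]
    intervals, offset = {}, 0.0
    for i in range(N):
        for j in range(N):
            if j == i:
                continue
            length = Qphi[i, j] / eps
            intervals[(i, j)] = (offset, offset + length)   # empty if q_ij(phi) = 0
            offset += length
    return intervals

Qphi = np.array([[-3.0, 1.0, 2.0],
                 [0.5, -2.5, 2.0],
                 [1.0, 1.0, -2.0]])
for key, iv in sorted(gamma_intervals(Qphi, eps=0.1).items(), key=lambda kv: kv[1][0]):
    print(key, "->", tuple(round(x, 2) for x in iv))
# The total length equals sum_i q_i(phi)/eps and is bounded by H_eps = N(N-1)M/eps.
```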

    Before stating the construction of a Poisson random measure, we give the following lemma to facilitate a subsequent proof.

    Lemma 2.1. Assume that the condition (A5) holds. For any ϕ1,ϕ2C([τ,0];Rn) and any i,jS,

    m(\Gamma^\varepsilon_{ij}(\phi_1)\Delta\Gamma^\varepsilon_{ij}(\phi_2))\leq\frac{2N}{\varepsilon}\|Q(\phi_1) - Q(\phi_2)\|_{l_1},

    where m denotes the Lebesgue measure on R.

    Proof. According to the definition of {Γεij(ϕ),i,jS},

    \begin{align*} m(\Gamma^\varepsilon_{ij}(\phi_1)\Delta\Gamma^\varepsilon_{ij}(\phi_2)) &\leq\frac{1}{\varepsilon}\Big|\sum\limits_{k=1}^{i-1}q_k(\phi_1) + \sum\limits_{k=1, k\neq i}^{j-1}q_{ik}(\phi_1) - \sum\limits_{k=1}^{i-1}q_k(\phi_2) - \sum\limits_{k=1, k\neq i}^{j-1}q_{ik}(\phi_2)\Big|\\ &\quad+\frac{1}{\varepsilon}\Big|\sum\limits_{k=1}^{i-1}q_k(\phi_1) + \sum\limits_{k=1, k\neq i}^{j}q_{ik}(\phi_1) - \sum\limits_{k=1}^{i-1}q_k(\phi_2) - \sum\limits_{k=1, k\neq i}^{j}q_{ik}(\phi_2)\Big|\\ &\leq\frac{2}{\varepsilon}\sum\limits_{k=1}^{i-1}|q_k(\phi_1) - q_k(\phi_2)| + \frac{2}{\varepsilon}\sum\limits_{k=1, k\neq i}^{j}|q_{ik}(\phi_1) - q_{ik}(\phi_2)|\\ &\leq\frac{2}{\varepsilon}\sum\limits_{k=1}^{i-1}\sum\limits_{l=1, l\neq k}^{N}|q_{kl}(\phi_1) - q_{kl}(\phi_2)| + \frac{2}{\varepsilon}\sum\limits_{k=1, k\neq i}^{j}|q_{ik}(\phi_1) - q_{ik}(\phi_2)|\\ &\leq\frac{2N}{\varepsilon}\|Q(\phi_1) - Q(\phi_2)\|_{l_1}. \end{align*}

    This completes the proof.

    Next, we provide an explicit construction of the Poisson random measure as in [25, p.42] or [26, p.26]. Under condition (A5), we set

    H_\varepsilon = N(N-1)M/\varepsilon

    as an upper bound on the total length of {Γεij(ϕ),i,jS} for fixed ε>0. Let ξεl,l=1,2, be a sequence of random variables on [0,Hε] with

    \begin{equation} \mathbb{P}(\xi^\varepsilon_l\in dx) = \frac{m(dx)}{H_\varepsilon} \end{equation} (2.5)

    and τεk,k=1,2, be nonnegative random variables such that

    \begin{equation} \mathbb{P}(\tau^\varepsilon_k>t) = \exp(-H_\varepsilon t). \end{equation} (2.6)

    Suppose that {ξεl,τεk}l,k1 are all mutually independent. Set

    \zeta^\varepsilon_1 = \tau^\varepsilon_1, \ \dots, \ \ \zeta^\varepsilon_k = \tau^\varepsilon_1 + \tau^\varepsilon_2 + \cdots + \tau^\varepsilon_k, \ \ \ k\in\mathbb{N}.

    Put

    D^\varepsilon_{p^\varepsilon} = \bigcup\limits_{k\in\mathbb{N}}\{\zeta^\varepsilon_k\}

    and

    p^\varepsilon(\zeta^\varepsilon_k) = \xi^\varepsilon_k, \ \ \ k\in\mathbb{N}.

    Correspondingly, introduce a counting measure as follows:

    N^\varepsilon_{p^\varepsilon}((0,t]\times A) = \#\{s\in D^\varepsilon_{p^\varepsilon} : 0<s\leq t, \ p^\varepsilon(s)\in A\}, \ \ t>0, \ A\in\mathcal{B}([0,\infty)),

    where \# denotes the number of elements of the set in braces. Then, \{p^\varepsilon(t)\}_{t\geq0} is a Poisson point process with jump height \xi^\varepsilon_k at the jump time \zeta^\varepsilon_k, and its corresponding Poisson random measure is N^\varepsilon_{p^\varepsilon}(dt,dz) with intensity measure dt\times m(dz), which is independent of \{W(t)\}_{t\geq0}. Define

    V^\varepsilon : C([-\tau,0];\mathbb{R}^n)\times\mathbb{Z}_+\times\mathbb{R}\rightarrow\mathbb{R}

    by

    V^\varepsilon(\phi, i, z) = \sum\limits_{j\in S, j\neq i}(j-i)\mathbb{I}_{\{\Gamma^\varepsilon_{ij}(\phi)\}}(z)

    and the system of equations

    \begin{equation} \begin{cases} dX^\varepsilon(t) = b(X^\varepsilon_t, \alpha^\varepsilon(t))dt + \sigma(X^\varepsilon_t, \alpha^\varepsilon(t))dW(t),\\ d\alpha^\varepsilon(t) = \int_{\mathbb{R}}V^\varepsilon(X^\varepsilon_t, \alpha^\varepsilon(t), z)N^\varepsilon_{p^\varepsilon}(dt, dz). \end{cases} \end{equation} (2.7)
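    A rough numerical reading of the jump part of (2.7) is sketched below: candidate jump times arrive as a Poisson clock of rate H_\varepsilon, each carries a mark that is uniform on [0,H_\varepsilon], and the mark triggers the jump of size V^\varepsilon(\phi,i,z) exactly when it falls in one of the intervals \Gamma^\varepsilon_{ij}(\phi). For simplicity the segment argument is frozen at a fixed \phi (in the coupled system it would be the running segment X^\varepsilon_t), and all numerical values are hypothetical.

```python
# The jump part of (2.7) read as a thinning scheme: a Poisson clock of rate H_eps
# delivers marks that are uniform on [0, H_eps], and a mark produces the jump
# V^eps(phi, i, z) only when it lands in one of the intervals Gamma^eps_{ij}(phi).
# The segment argument is frozen at a fixed phi here and all numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
eps, T = 0.1, 2.0
Qphi = np.array([[-3.0, 3.0], [1.0, -1.0]])    # Q(phi) for a fixed phi, N = 2
N = Qphi.shape[0]
M = np.max(Qphi[~np.eye(N, dtype=bool)])       # sup of the off-diagonal rates
H_eps = N * (N - 1) * M / eps

def V(Qmat, i, z):
    """V^eps(phi, i, z) = sum_{j != i} (j - i) 1_{Gamma_ij(phi)}(z)."""
    offset = 0.0
    for k in range(N):
        for j in range(N):
            if j == k:
                continue
            length = Qmat[k, j] / eps
            if k == i and offset <= z < offset + length:
                return j - i
            offset += length
    return 0

t, alpha = 0.0, 0
while True:
    t += rng.exponential(1.0 / H_eps)          # next point of the Poisson clock
    if t > T:
        break
    z = rng.uniform(0.0, H_eps)                # its mark; most marks cause no jump
    alpha += V(Qphi, alpha, z)
print("alpha^eps(T) =", alpha)
```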

    If the system of equations (2.7) has a solution, then the solution (X^\varepsilon(t),\alpha^\varepsilon(t)) to (2.7) satisfies Eqs (2.1) and (2.2). In fact, we only need to demonstrate that the solution satisfies (2.2). Due to the independent increments property of the Poisson random measure, for any A\in\mathcal{B}([0,\infty)) and t, \delta>0,

    \mathbb{P}(N^\varepsilon_{p^\varepsilon}((t,t+\delta]\times A)\geq2) = 1 - e^{-\delta m(A)} - \delta m(A)e^{-\delta m(A)} = o(\delta)

    and from

    d\alpha^\varepsilon(t) = \int_{\mathbb{R}}V^\varepsilon(X^\varepsilon_t, \alpha^\varepsilon(t), z)N^\varepsilon_{p^\varepsilon}(dt, dz),

    we have

    \begin{equation} \alpha^\varepsilon(t+\delta) - \alpha^\varepsilon(t) = \sum\limits_{t\leq s\leq t+\delta, s\in D^\varepsilon_{p^\varepsilon}}\sum\limits_{j\in S}(j - \alpha^\varepsilon(s))\mathbb{I}_{\Gamma^\varepsilon_{\alpha^\varepsilon(s)j}(X^\varepsilon_s)}(p^\varepsilon(s)). \end{equation} (2.8)

    In particular, if αε(t)=i,

    \begin{equation} \alpha^\varepsilon(\zeta^{\varepsilon,t}_1) = i + \sum\limits_{j\in S}(j-i)\mathbb{I}_{\Gamma^\varepsilon_{ij}(X^\varepsilon_{\zeta^{\varepsilon,t}_1})}(p^\varepsilon(\zeta^{\varepsilon,t}_1)) = i + \sum\limits_{j\in S}(j-i)\mathbb{I}_{\Gamma^\varepsilon_{ij}(X^\varepsilon_{\zeta^{\varepsilon,t}_1})}(\xi^{\varepsilon,t}_1), \end{equation} (2.9)

    where \zeta^{\varepsilon,t}_1 and \xi^{\varepsilon,t}_1 denote the first jump time and jump height of p^\varepsilon(t) after time t. For j\neq i and \phi\in C([-\tau,0];\mathbb{R}^n),

    \begin{align*} \mathbb{P}(\alpha^\varepsilon(t+\delta) = j\mid\alpha^\varepsilon(t) = i, X^\varepsilon_t = \phi) &= \mathbb{P}(\alpha^\varepsilon(t+\delta) = j, N^\varepsilon_{p^\varepsilon}((t,t+\delta]\times[0,H_\varepsilon]) = 1\mid\alpha^\varepsilon(t) = i, X^\varepsilon_t = \phi)\\ &\quad+\mathbb{P}(\alpha^\varepsilon(t+\delta) = j, N^\varepsilon_{p^\varepsilon}((t,t+\delta]\times[0,H_\varepsilon])\geq2\mid\alpha^\varepsilon(t) = i, X^\varepsilon_t = \phi)\\ &= \mathbb{P}(p^\varepsilon(\zeta^{\varepsilon,t}_1)\in\Gamma^\varepsilon_{ij}(\phi), N^\varepsilon_{p^\varepsilon}((t,t+\delta]\times[0,H_\varepsilon]) = 1\mid\alpha^\varepsilon(t) = i, X^\varepsilon_t = \phi) + o(\delta)\\ &= \frac{1}{\varepsilon}e^{-\frac{q_{ij}(\phi)}{\varepsilon}\delta}q_{ij}(\phi)\delta + o(\delta)\\ &= \frac{1}{\varepsilon}q_{ij}(\phi)\delta + o(\delta). \end{align*}

    For j=i and ϕC([τ,0];Rn),

    \begin{align*} \mathbb{P}(\alpha^\varepsilon(t+\delta) = i\mid\alpha^\varepsilon(t) = i, X^\varepsilon_t = \phi) &= \mathbb{P}(\alpha^\varepsilon(t+\delta) = i, N^\varepsilon_{p^\varepsilon}((t,t+\delta]\times[0,H_\varepsilon]) = 0\mid\alpha^\varepsilon(t) = i, X^\varepsilon_t = \phi)\\ &\quad+\mathbb{P}(\alpha^\varepsilon(t+\delta) = i, N^\varepsilon_{p^\varepsilon}((t,t+\delta]\times[0,H_\varepsilon]) = 1\mid\alpha^\varepsilon(t) = i, X^\varepsilon_t = \phi)\\ &\quad+\mathbb{P}(\alpha^\varepsilon(t+\delta) = i, N^\varepsilon_{p^\varepsilon}((t,t+\delta]\times[0,H_\varepsilon])\geq2\mid\alpha^\varepsilon(t) = i, X^\varepsilon_t = \phi)\\ &= e^{\frac{q_{ii}(\phi)}{\varepsilon}\delta} + o(\delta)\\ &= 1 + \frac{1}{\varepsilon}q_{ii}(\phi)\delta + o(\delta). \end{align*}

    Using (2.7), the following theorem gives the existence and uniqueness of solution and the moment estimation which is independent of the small parameter ε.

    Theorem 2.2. Suppose that (A1)(A6) hold. Then, for ε>0 and any initial value

    X^\varepsilon_0 = \xi\in C([-\tau,0];\mathbb{R}^n)

    and

    \alpha^\varepsilon(0) = i_0\in S,

    there exists a unique global strong solution (Xε(t),αε(t)) of (2.1) and (2.2). Moreover, for every T>0, there exists a positive constant KT such that

    \begin{equation} \sup\limits_{0<\varepsilon<1}\mathbb{E}\Big(\sup\limits_{-\tau\leq t\leq T}|X^\varepsilon(t)|^4\Big)\leq K_T. \end{equation} (2.10)

    We prepare a lemma to prove this theorem (see [27, Theorem 2.2]).

    Lemma 2.3. Assume that (A2) holds. Then, for any i\in S, there exists a unique global solution X(t) to the following equation:

    dX(t) = b(X_t, i)dt + \sigma(X_t, i)dW(t)

    with initial time t0 and initial value

    X_{t_0} = \phi\in C([-\tau,0];\mathbb{R}^n).

    Moreover, the initial time t0 can be a random variable provided that it is a stopping time.

    Remark 2.4. For fixed state iS, under the global Lipschitz condition, the Picard iterative sequence can approximate a unique solution X(t) ([27, Theorem 2.2]). When the initial time is a stopping time, it does not affect the techniques used in the proof, such as martingales isometry ([28, Remark 3.10]). This proof is omitted here.

    Proof of Theorem 2.2. To construct the solution to (2.1) and (2.2) with initial value (ξ,i0) for fixed ε>0, we use the interlacing procedure similar to [11, Theorem 3.1] or [28, Theorem 3.13]. Let ˜X0,ε(t),t0 be the solution to

    d\tilde{X}^{0,\varepsilon}(t) = b(\tilde{X}^{0,\varepsilon}_t, i_0)dt + \sigma(\tilde{X}^{0,\varepsilon}_t, i_0)dW(t)

    with initial value

    \tilde{X}^{0,\varepsilon}(t) = \xi(t), \ \ \ t\in[-\tau,0].

    Here, for the sake of uniformity of notation, we superscript \tilde{X}^{0,\varepsilon}(t) with \varepsilon, although it does not in fact depend on \varepsilon. Let

    \sigma^\varepsilon_1 = \inf\Big\{t>0 : \int_0^t\int_{\mathbb{R}}V^\varepsilon(\tilde{X}^{0,\varepsilon}_s, i_0, z)N^\varepsilon_{p^\varepsilon}(ds, dz)\neq0\Big\}

    and

    i_1 = i_0 + \int_0^{\sigma^\varepsilon_1}\int_{\mathbb{R}}V^\varepsilon(\tilde{X}^{0,\varepsilon}_s, i_0, z)N^\varepsilon_{p^\varepsilon}(ds, dz).

    To proceed, let ˜X1,ε(t),tσε1 be the solution to

    d\tilde{X}^{1,\varepsilon}(t) = b(\tilde{X}^{1,\varepsilon}_t, i_1)dt + \sigma(\tilde{X}^{1,\varepsilon}_t, i_1)dW(t)

    with initial value

    \tilde{X}^{1,\varepsilon}(t) = \tilde{X}^{0,\varepsilon}(t), \ \ \ t\in[\sigma^\varepsilon_1-\tau, \sigma^\varepsilon_1].

    Define

    \sigma^\varepsilon_2 = \inf\Big\{t>\sigma^\varepsilon_1 : \int_0^t\int_{\mathbb{R}}V^\varepsilon(\tilde{X}^{1,\varepsilon}_s, i_1, z)N^\varepsilon_{p^\varepsilon}(ds, dz)\neq0\Big\}

    and

    i_2 = i_1 + \int_{\sigma^\varepsilon_1}^{\sigma^\varepsilon_2}\int_{\mathbb{R}}V^\varepsilon(\tilde{X}^{1,\varepsilon}_s, i_1, z)N^\varepsilon_{p^\varepsilon}(ds, dz).

    For the convenience of notation, set

    σε0=0.

    When we have already defined \tilde{X}^{m-2,\varepsilon}(t) on [\sigma^\varepsilon_{m-2}-\tau, \sigma^\varepsilon_{m-1}], m\geq2, let \tilde{X}^{m-1,\varepsilon}(t), t\geq\sigma^\varepsilon_{m-1}, be the solution to

    d\tilde{X}^{m-1,\varepsilon}(t) = b(\tilde{X}^{m-1,\varepsilon}_t, i_{m-1})dt + \sigma(\tilde{X}^{m-1,\varepsilon}_t, i_{m-1})dW(t)

    with initial value

    \tilde{X}^{m-1,\varepsilon}(t) = \tilde{X}^{m-2,\varepsilon}(t), \ \ \ t\in[\sigma^\varepsilon_{m-1}-\tau, \sigma^\varepsilon_{m-1}].

    Define

    \sigma^\varepsilon_m = \inf\Big\{t>\sigma^\varepsilon_{m-1} : \int_0^t\int_{\mathbb{R}}V^\varepsilon(\tilde{X}^{m-1,\varepsilon}_s, i_{m-1}, z)N^\varepsilon_{p^\varepsilon}(ds, dz)\neq0\Big\}

    and

    \begin{equation} i_m = i_{m-1} + \int_{\sigma^\varepsilon_{m-1}}^{\sigma^\varepsilon_m}\int_{\mathbb{R}}V^\varepsilon(\tilde{X}^{m-1,\varepsilon}_s, i_{m-1}, z)N^\varepsilon_{p^\varepsilon}(ds, dz). \end{equation} (2.11)

    Clearly, continuing this procedure, we can construct a process

    X^\varepsilon(t) = \tilde{X}^{m,\varepsilon}(t), \ \ \ \alpha^\varepsilon(t) = i_m,

    when

    \sigma^\varepsilon_m\leq t\leq\sigma^\varepsilon_{m+1}, \ \ \ m\geq0,

    which satisfies that

    X^\varepsilon(t) = \xi(t), \ \ \ t\in[-\tau,0]

    and that

    \begin{cases} X^\varepsilon(t\wedge\sigma^\varepsilon_m) = \xi(0) + \int_0^{t\wedge\sigma^\varepsilon_m}b(X^\varepsilon_s, \alpha^\varepsilon(s))ds + \int_0^{t\wedge\sigma^\varepsilon_m}\sigma(X^\varepsilon_s, \alpha^\varepsilon(s))dW(s),\\ \alpha^\varepsilon(t\wedge\sigma^\varepsilon_m) = i_0 + \int_0^{t\wedge\sigma^\varepsilon_m}\int_{\mathbb{R}}V^\varepsilon(X^\varepsilon_s, \alpha^\varepsilon(s), z)N^\varepsilon_{p^\varepsilon}(ds, dz). \end{cases}

    Define

    \sigma^\varepsilon_\infty = \lim\limits_{m\rightarrow\infty}\sigma^\varepsilon_m.

    To conclude that (X^\varepsilon(t),\alpha^\varepsilon(t)) is the unique global solution, it is only necessary to show that

    \sigma^\varepsilon_\infty = \infty.

    For any T>0, one has

    \begin{align*} \mathbb{P}(\sigma^\varepsilon_m\leq T) &= \mathbb{P}\Big(\int_0^{\sigma^\varepsilon_m\wedge T}\int_{\mathbb{R}}\mathbb{I}_{\{z\in[0, \sum_{j=1}^{N}q_j(X^\varepsilon_s)/\varepsilon]\}}N^\varepsilon_{p^\varepsilon}(ds, dz) = m\Big)\\ &\leq\mathbb{P}\Big(\int_0^{T}\int_{\mathbb{R}}\mathbb{I}_{\{z\in[0, H_\varepsilon]\}}N^\varepsilon_{p^\varepsilon}(ds, dz)\geq m\Big) = \sum\limits_{k=m}^{\infty}e^{-H_\varepsilon T}\frac{(H_\varepsilon T)^k}{k!}, \end{align*}

    which implies that

    \mathbb{P}(\sigma^\varepsilon_m\leq T)\rightarrow0,

    as m\rightarrow\infty. It follows that

    \sigma^\varepsilon_\infty = \infty.

    The uniqueness of \tilde{X}^{m,\varepsilon}(t) and the uniqueness of i_m defined by (2.11) on [\sigma^\varepsilon_m, \sigma^\varepsilon_{m+1}] yield the uniqueness of (X^\varepsilon(t),\alpha^\varepsilon(t)). Finally, for the resulting solution, we can show that for every t\in[0,T]

    \begin{align*} \mathbb{E}\Big(\sup\limits_{0\leq s\leq t}|X^\varepsilon(s)|^4\Big) &\leq K\Big[\|\xi\|^4 + \mathbb{E}\Big(\sup\limits_{0\leq s\leq t}\Big|\int_0^s b(X^\varepsilon_r, \alpha^\varepsilon(r))dr\Big|^4\Big) + \mathbb{E}\Big(\sup\limits_{0\leq s\leq t}\Big|\int_0^s\sigma(X^\varepsilon_r, \alpha^\varepsilon(r))dW(r)\Big|^4\Big)\Big]\\ &\leq K\Big[\|\xi\|^4 + \mathbb{E}\Big(\int_0^t|b(X^\varepsilon_r, \alpha^\varepsilon(r))|^4dr\Big) + \mathbb{E}\Big(\int_0^t|\sigma(X^\varepsilon_r, \alpha^\varepsilon(r))|^4dr\Big)\Big]\\ &\leq K\|\xi\|^4 + K\int_0^t\Big[1 + \mathbb{E}\Big(\sup\limits_{-\tau\leq u\leq r}|X^\varepsilon(u)|^4\Big)\Big]dr. \end{align*}

    It is easy to observe that

    \mathbb{E}\Big(\sup\limits_{-\tau\leq s\leq t}|X^\varepsilon(s)|^4\Big)\leq\|\xi\|^4 + \mathbb{E}\Big(\sup\limits_{0\leq s\leq t}|X^\varepsilon(s)|^4\Big).

    This, together with Gronwall inequality, yields that

    \mathbb{E}\Big(\sup\limits_{-\tau\leq s\leq t}|X^\varepsilon(s)|^4\Big)\leq K_T.

    Letting t=T concludes the proof.
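    The interlacing procedure used in the proof can also be read as a simulation recipe: between candidate points of the rate-H_\varepsilon Poisson clock the delay SDE is advanced with the switching state frozen, and at each candidate time a uniform mark decides, through the intervals \Gamma^\varepsilon_{ij} evaluated at the current segment, whether the state actually jumps. The sketch below combines the two earlier sketches in this way; the coefficients are the same hypothetical toy choices as before, and the Euler discretization is only a convenience, not part of the construction.

```python
# The interlacing construction read as a simulation recipe: between points of the
# rate-H_eps Poisson clock the delay SDE runs with the state frozen, and at each
# point a uniform mark decides, through the intervals Gamma^eps_{ij} evaluated at
# the current segment, whether the state jumps. Coefficients are hypothetical toys.
import numpy as np

rng = np.random.default_rng(3)
tau, T, dt, eps, N = 1.0, 3.0, 1e-3, 0.05, 2
lag = int(tau / dt)

def b(seg, i):     return (-1.0 if i == 0 else -0.2) * seg[-1] + 0.5 * seg[0]
def sigma(seg, i): return 0.3 + 0.2 * i
def Q(seg):
    q01, q10 = 1.0 + seg[-1] ** 2, 2.0 + abs(seg[0])
    return np.array([[-q01, q01], [q10, -q10]])

M = 4.0                                    # crude bound on the rates for this toy Q
H_eps = N * (N - 1) * M / eps

def V(Qmat, i, z):                         # the same interval map as in (2.7)
    offset = 0.0
    for k in range(N):
        for j in range(N):
            if j == k:
                continue
            length = Qmat[k, j] / eps
            if k == i and offset <= z < offset + length:
                return j - i
            offset += length
    return 0

X = np.zeros(lag + 1)                      # initial segment xi = 0 on [-tau, 0]
alpha, t = 0, 0.0
next_point = rng.exponential(1.0 / H_eps)
while t < T:
    seg = X[-(lag + 1):]
    # one Euler step of the SDE with the switching state frozen at alpha
    X = np.append(X, X[-1] + b(seg, alpha) * dt
                  + sigma(seg, alpha) * np.sqrt(dt) * rng.standard_normal())
    t += dt
    while next_point <= t:                 # apply clock points falling in this step
        alpha += V(Q(X[-(lag + 1):]), alpha, rng.uniform(0.0, H_eps))
        next_point += rng.exponential(1.0 / H_eps)

print("X^eps(T) =", X[-1], "  alpha^eps(T) =", alpha)
```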

    Applying (2.7), consider the solution (Xε(t),α1,ε(t)) and (Yε(t),α2,ε(t)) respectively to the following systems of equations:

    \begin{equation} \begin{cases} dX^\varepsilon(t) = b_1(X^\varepsilon_t, \alpha^{1,\varepsilon}(t))dt + \sigma_1(X^\varepsilon_t, \alpha^{1,\varepsilon}(t))dW(t),\\ d\alpha^{1,\varepsilon}(t) = \int_{\mathbb{R}}V^\varepsilon(X^\varepsilon_t, \alpha^{1,\varepsilon}(t), z)N^\varepsilon_{p^\varepsilon}(dt, dz),\\ X_0 = \phi_1, \ \ \alpha^{1,\varepsilon}(0) = i_0\in S, \end{cases} \end{equation} (3.1)

    and

    \begin{equation} \begin{cases} dY^\varepsilon(t) = b_2(Y^\varepsilon_t, \alpha^{2,\varepsilon}(t))dt + \sigma_2(Y^\varepsilon_t, \alpha^{2,\varepsilon}(t))dW(t),\\ d\alpha^{2,\varepsilon}(t) = \int_{\mathbb{R}}V^\varepsilon(Y^\varepsilon_t, \alpha^{2,\varepsilon}(t), z)N^\varepsilon_{p^\varepsilon}(dt, dz),\\ Y_0 = \phi_2, \ \ \alpha^{2,\varepsilon}(0) = i_0\in S, \end{cases} \end{equation} (3.2)

    where we assume that ϕ1,ϕ2C([τ,0];Rn) are nonrandom and W(t) is a standard Brownian motion.

    Lemma 3.1. Suppose that b_i, \sigma_i, i = 1, 2, satisfy the condition (A2) in place of b and \sigma, respectively. If

    \alpha^{1,\varepsilon}(s) = \alpha^{2,\varepsilon}(s),

    then for any s, t\in[0,T], s<t, the solutions to (3.1) and (3.2) satisfy

    \begin{equation} \frac{1}{t-s}\int_s^t\mathbb{E}\big[\mathbb{I}_{\{\alpha^{1,\varepsilon}(r)\neq\alpha^{2,\varepsilon}(r)\}}\big|\mathcal{F}^\varepsilon_s\big]dr\leq2N(N-1)\frac{1}{\varepsilon}\int_s^t\mathbb{E}\big(\|Q(X^\varepsilon_r) - Q(Y^\varepsilon_r)\|_{l_1}\big|\mathcal{F}^\varepsilon_s\big)dr, \end{equation} (3.3)

    where

    \mathcal{F}^\varepsilon_t = \sigma\{W(s), N^\varepsilon_{p^\varepsilon}((0,s]\times[0,H_\varepsilon]) : s\leq t\}.

    Proof. Obviously,

    1 - e^{-xH_\varepsilon} - e^{-xH_\varepsilon}(xH_\varepsilon) = e^{-xH_\varepsilon}\Big[\frac{(xH_\varepsilon)^2}{2} + o(x^2)\Big]

    for any x>0. Choose a δ>0 so that when x(0,δ],

    1 - e^{-xH_\varepsilon} - e^{-xH_\varepsilon}(xH_\varepsilon)\leq(xH_\varepsilon)^2

    holds. Divide [s,t] by δ. Let tk=s+kδ, k=0,1,2,,ˉK, where we denote

    \bar{K} = \Big[\frac{t-s}{\delta}\Big],

    the integer part of \frac{t-s}{\delta}, and

    t_{\bar{K}+1} = t.

    For the interval [t0,t1],

    \begin{align} \mathbb{P}(\alpha^{1,\varepsilon}(t_1)\neq\alpha^{2,\varepsilon}(t_1)\mid\mathcal{F}^\varepsilon_s) &= \mathbb{P}(\alpha^{1,\varepsilon}(t_1)\neq\alpha^{2,\varepsilon}(t_1), N^\varepsilon_{p^\varepsilon}((t_0,t_1]\times[0,H_\varepsilon]) = 1\mid\mathcal{F}^\varepsilon_{t_0})\\ &\quad+\mathbb{P}(\alpha^{1,\varepsilon}(t_1)\neq\alpha^{2,\varepsilon}(t_1), N^\varepsilon_{p^\varepsilon}((t_0,t_1]\times[0,H_\varepsilon])\geq2\mid\mathcal{F}^\varepsilon_{t_0}). \end{align} (3.4)

    According to the definition of the Poisson random measure and its property of independent increment,

    \begin{align} \mathbb{P}(\alpha^{1,\varepsilon}(t_1)\neq\alpha^{2,\varepsilon}(t_1), N^\varepsilon_{p^\varepsilon}((t_0,t_1]\times[0,H_\varepsilon])\geq2\mid\mathcal{F}^\varepsilon_{t_0}) &\leq\mathbb{P}(N^\varepsilon_{p^\varepsilon}((t_0,t_1]\times[0,H_\varepsilon])\geq2)\\ &= 1 - e^{-H_\varepsilon\delta} - e^{-H_\varepsilon\delta}(H_\varepsilon\delta)\leq(H_\varepsilon\delta)^2. \end{align} (3.5)

    Below, we estimate the first term on the right side of Eq (3.4). Recall that ζε,sk and ξε,sk denote the kth jump time and jump height after time s, respectively. From (2.8) and (2.9),

    P(α1,ε(t1)α2,ε(t1),Nεpε((t0,t1]×[0,Hε])=1|Fεt0)=P(α1,ε(t1)α2,ε(t1),ζε,t01(t0,t1],ζε,t02>t1|Fεt0)=P(α1,ε(t0)+lS(lα1,ε(t0))IΓεα1,ε(t0)l(Xεζε,t01)(ξε,t01)α2,ε(t0)+lS(lα2,ε(t0))IΓεα2,ε(t0)l(Yεζε,t01)(ξε,t01),ζε,t01(t0,t1],ζε,t02>t1|Fεt0)=P(ξε,t01lα1,ε(s){Γεα1,ε(t0)l(Xεζε,t01)ΔΓεα2,ε(t0)l(Yεζε,t01)},ζε,t01(t0,t1],ζε,t02>t1|Fεt0). (3.6)

    Note that on [t0,ζε,t01], the solutions Xε(u) and Yε(u) of (3.1) and (3.2) are respectively determined by

    dXε(u)=b1(Xεu,α1,ε(t0))du+σ1(Xεu,α1,ε(t0))dW(u),dYε(u)=b2(Yεu,α2,ε(t0))du+σ2(Yεu,α2,ε(t0))dW(u),

    where the initial values are, respectively, Xεt0 and Yεt0. Therefore, due to the mutual independence of Nεpε(dt,dz) and (W(t))t0, it follows that {Xεu}u[t0,ζε,t01], {Yεu}u[t0,ζε,t01], and ζε,t01,ζε,t02 are mutual conditional independent with respect to Fεt0. This, together with (2.5), (2.6), and (3.6) yields that

    P(α1,ε(t1)α2,ε(t1),Nεpε((t0,t1]×[0,Hε])=1|Fεt0)=t1t0Ω1Hεm(lα1,ε(t0){Γεα1,ε(t0)l(Xεv)ΔΓεα1,ε(t0)l(Yεv)})P(dω|Fεt0)×eHε(ru)P(ζε,t01du)t1t0E(m(lα1,ε(t0){Γεα1,ε(t0)l(Xεv)ΔΓεα1,ε(t0)l(Yεv)})|Fεt0)du. (3.7)

    Substituting (3.5) and (3.7) into (3.4) yields that

    P(α1,ε(t1)α2,ε(t1)|Fεt0)(Hεδ)2+2N(N1)t1t0E(||Q(Xεu)Q(Yεu)||l1|Fεt0)du. (3.8)

    To proceed, we estimate

    P(α1,ε(t2)α2,ε(t2)|Fεt0)=P(α1,ε(t2)α2,ε(t2),α1,ε(t1)α2,ε(t1)|Fεt0)+P(α1,ε(t2)α2,ε(t2),α1,ε(t1)=α2,ε(t1)|Fεt0). (3.9)

    On one hand, (3.8) gives the estimation of the first term on the right side of the above equation. On the other hand, clearly,

    P(α1,ε(t2)α2,ε(t2),α1,ε(t1)=α2,ε(t1)|Fεt0)P(α1,ε(t2)α2,ε(t2),α1,ε(t1)=α2,ε(t1),Nεpε((t1,t2]×[0,Hε])=1|Fεt0)+(Hεδ)2. (3.10)

    Similar to (3.6) and (3.7),

    P(α1,ε(t2)α2,ε(t2),α1,ε(t1)=α2,ε(t1),Nεpε((t1,t2]×[0,Hε])=1|Fεt0)=E(Iα1,ε(t1)=α2,ε(t1)P(α1,ε(t2)α2,ε(t2),Nεpε((t1,t2]×[0,Hε])=1|Fεt1)|Fεt0). (3.11)

    Restricted to the set {α1,ε(t1)=α2,ε(t1)}, it can be deduced that

    P(α1,ε(t2)α2,ε(t2),Nεpε((t1,t2]×[0,Hε])=1|Fεt1)=P(α1,ε(t2)α2,ε(t2),ζε,t11(t1,t2],ζε,t12>2δ|Fεt1)P(ξε,t11lα1,ε(t1){Γεα1,ε(t1)l(Xεζε,t11)ΔΓεα2,ε(t1)l(Yεζε,t11)},ζε,t11(t1,t2],ζε,t12>t2|Fεt1)=t2t1Ω1Hεm(lα1,ε(t1){Γεα1,ε(t1)l(Xεu))ΔΓεα2,ε(t1)l(Yεu)})P(dω|Fεt1)eHε(t2u)P(ζε,t11du)2N(N1)1εt2t1E(||Q(Xεu)Q(Yεu)||l1|Fεt1)du. (3.12)

    Substituting (3.8), (3.10), and (3.12) into (3.9) yields that

    P(α1,ε(t2)α2,ε(t2)|Fεt0)2(Hεδ)2+2N(N1)1εt2t0E(||Q(Xεu)Q(Yεu)||l1|Fεt0)du.

    Deducing inductively, we obtain

    P(α1,ε(tk)α2,ε(tk)|Fεt0)k(Hεδ)2+KN,εtkt0E(||Q(Xεu)Q(Yεu)||l1|Fεt0)du, (3.13)

    where

    K_{N,\varepsilon} = 2N(N-1)\frac{1}{\varepsilon}.

    Finally, applying (3.13) gives that

    tsP(α1,ε(r)α2,ε(r)|Fεs)=ˉKk=0tk+1tkP(α1,ε(r)α2,ε(r)|Fεs)=ˉKk=0tk+1tkP(α1,ε(r)α2,ε(r),α1,ε(tk)=α2,ε(tk)|Fεs)+ˉKk=0tk+1tkP(α1,ε(r)α2,ε(r),α1,ε(tk)α2,ε(tk)|Fεs)ˉKk=0tk+1tkP(Nεpε(tk,tk+1)×[0,Hε]1)dr+ˉKk=0tk+1tkP(α1,ε(tk)α2,ε(tk)|Fεs)drˉKk=0tk+1tk[(1eHεδ)+k(Hεδ)2+KN,εtkt0E(||Q(Xεu)Q(Yεu)||l1|Fεs)du]dr=(ˉK+1)δ(1eHεδ)+(Hε)2δ3ˉK(ˉK+1)2+˜KtsE(||Q(Xεu)Q(Yεu)||l1|Fεs)du,

    where

    \tilde{K} = (\bar{K}+1)\delta K_{N,\varepsilon}.

    Letting \delta\rightarrow0, together with the fact that (\bar{K}+1)\delta\rightarrow(t-s) as \delta\rightarrow0, gives that (3.3) holds.

    The proof is completed.

    In this section, for any probability measure

    \mu = (\mu_i)_{i\in S} \ \ \ \text{and} \ \ \ \nu = (\nu_i)_{i\in S},

    the total variation distance between μ and ν is defined by

    \|\mu - \nu\|_{var} = \sup\limits_{|f|\leq1}|\mu(f) - \nu(f)|,

    where

    \mu(f) = \sum\limits_{i\in S}\mu_if(i) \ \ \ \text{and} \ \ \ \nu(f) = \sum\limits_{i\in S}\nu_if(i).

    To proceed, we will use the following coupling method; for details, see [29, Chapter 5]. For any \phi\in C([-\tau,0];\mathbb{R}^n), let (\alpha^\phi_1(t),\alpha^\phi_2(t)) be a coupling Markov process on the phase space S\times S with marginals distributed as \alpha^\phi(t) and \alpha^\phi_l(0) = i_l (l = 1, 2), where the process \alpha^\phi(\cdot) with the parameter \phi, when \phi is fixed, is referred to as a frozen Markov chain and can be handled using standard results for Markov processes. Denote the classical coupling operator

    \begin{align} \Omega^\phi f(k_1,k_2) &= \mathbb{I}_{\Delta^c}(k_1,k_2)\Big(\sum\limits_{l_1\in S}q_{k_1l_1}(\phi)(f(l_1,k_2) - f(k_1,k_2)) + \sum\limits_{l_2\in S}q_{k_2l_2}(\phi)(f(k_1,l_2) - f(k_1,k_2))\Big)\\ &\quad+\mathbb{I}_{\Delta}(k_1,k_2)\sum\limits_{l_1\in S}q_{k_1l_1}(\phi)(f(l_1,l_1) - f(k_1,k_2)), \end{align} (4.1)

    where f is a bounded function on S×S and

    \Delta := \{(k_1,k_2)\in S\times S : k_1 = k_2\}.

    This classical coupling means that the marginals evolve independently until they meet. After they meet, they will move together at rate Q(ϕ). Define the coupling time

    T = \inf\{t\geq0 : \alpha^\phi_1(t) = \alpha^\phi_2(t)\}.

    Then, by the well-known coupling inequality (see [29, p.195]), we have

    \begin{equation} \|P^\phi(t,i_1,\cdot) - P^\phi(t,i_2,\cdot)\|_{var}\leq2\mathbb{E}^{i_1,i_2}_\phi\big(\mathbb{I}_{\alpha^\phi_1(t)\neq\alpha^\phi_2(t)}\big) = 2\mathbb{P}^{i_1,i_2}_\phi(T>t), \end{equation} (4.2)

    where \mathbb{P}^{i_1,i_2}_\phi and \mathbb{E}^{i_1,i_2}_\phi denote the probability and expectation for the coupling process (\alpha^\phi_1(t),\alpha^\phi_2(t)) starting from (i_1,i_2). Furthermore, let \pi^\phi be the invariant probability measure associated with the Markov chain \alpha^\phi(t) for fixed \phi\in C([-\tau,0];\mathbb{R}^n). When \pi^\phi is assumed to exist, then from the fact

    \pi^\phi = \pi^\phi P^\phi_t

    and (4.2), it can be obtained that:

    \begin{align} \|P^\phi(t,i_1,\cdot) - \pi^\phi\|_{var} &= \Big\|P^\phi(t,i_1,\cdot) - \sum\limits_{i_2\in S}P^\phi(t,i_2,\cdot)\pi^\phi(i_2)\Big\|_{var}\\ &\leq\sum\limits_{i_2\in S}\pi^\phi(i_2)\|P^\phi(t,i_1,\cdot) - P^\phi(t,i_2,\cdot)\|_{var}\\ &\leq2\sum\limits_{i_2\in S}\pi^\phi(i_2)\mathbb{P}^{i_1,i_2}_\phi(T>t). \end{align} (4.3)

    From the inequality above, it can be seen that a coupling time gives us some information about the convergence rate. To obtain the strong ergodicity uniformly in ϕC([τ,0];Rn), we just estimate Pi1,i2ϕ(T>t).
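    The classical coupling can be sampled directly, which gives a numerical feeling for the tail \mathbb{P}^{i_1,i_2}_\phi(T>t) appearing in (4.2) and (4.3). In the sketch below the frozen chain uses a hypothetical fixed rate matrix Q(\phi); the two marginals jump independently until they first meet, and the empirical tail of the meeting time T is reported (after the meeting time the coupled chains would simply move together, which is irrelevant for estimating T).

```python
# Sampling the classical coupling of two frozen chains with a hypothetical fixed
# Q(phi): the marginals jump independently until they first meet at the coupling
# time T; the empirical tail of T illustrates the right-hand side of (4.2)-(4.3).
import numpy as np

rng = np.random.default_rng(2)
Qphi = np.array([[-2.0, 1.0, 1.0],
                 [0.5, -1.5, 1.0],
                 [1.0, 2.0, -3.0]])        # frozen rate matrix, N = 3

def next_jump(i):
    """Exponential holding time with rate q_i = -q_ii, then a destination ~ q_ij/q_i."""
    qi = -Qphi[i, i]
    probs = np.clip(Qphi[i], 0.0, None)
    return rng.exponential(1.0 / qi), rng.choice(len(probs), p=probs / probs.sum())

def coupling_time(i1, i2, horizon=50.0):
    t, a1, a2 = 0.0, i1, i2
    h1, j1 = next_jump(a1)
    h2, j2 = next_jump(a2)
    while a1 != a2 and t < horizon:        # independent evolution before the meeting
        if h1 <= h2:
            t, a1, h2 = t + h1, j1, h2 - h1
            h1, j1 = next_jump(a1)
        else:
            t, a2, h1 = t + h2, j2, h1 - h2
            h2, j2 = next_jump(a2)
    return t

samples = np.array([coupling_time(0, 2) for _ in range(5000)])
for s in (0.5, 1.0, 2.0):
    print(f"P(T > {s}) ~ {np.mean(samples > s):.3f}")
```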

    Theorem 4.1. Under assumptions (A3)–(A5), P^\phi_t is strongly ergodic uniformly in \phi\in C([-\tau,0];\mathbb{R}^n), that is, there exist constants L_3, \lambda>0 such that

    \begin{equation} \sup\limits_{i\in S}\|P^\phi_t(i,\cdot) - \pi^\phi\|_{var}\leq L_3e^{-\lambda t}. \end{equation} (4.4)

    Proof. For fixed ϕC([τ,0];Rn), from classic conclusions (see [5, Lemma A.2]), it can be concluded that

    \sup\limits_{i\in S}\|P^\phi_t(i,\cdot) - \pi^\phi\|_{var}\leq K\exp(-\hat{\kappa}(\phi)t),

    where the convergence rate \hat{\kappa}(\phi) depends on \phi. To proceed, we show that \hat{\kappa}(\phi) has a uniform lower bound. Set

    \begin{align*} \hat{\kappa}_0(\phi) &= \min\Big\{\min\Big\{(1 - e^{-q_k(\phi)})\frac{q_{kl}(\phi)}{q_k(\phi)} : l\in S\setminus\{k\}\Big\}, \ e^{-q_k(\phi)} : k\in S\Big\},\\ \kappa_1 &= \inf\limits_{\phi\in C([-\tau,0];\mathbb{R}^n)}\hat{\kappa}_0(\phi), \qquad \kappa_2 = \sup\limits_{\phi\in C([-\tau,0];\mathbb{R}^n)}\hat{\kappa}_0(\phi). \end{align*}

    Under assumptions (A4) and (A5), we can obtain

    \begin{equation} \kappa_1\geq\min\Big\{\min\Big\{(1 - e^{-\beta_{kk}})\frac{\beta_{kl}}{M} : l\in S\setminus\{k\}\Big\}, \ e^{-\beta_{kk}} : k\in S\Big\}>0 \end{equation} (4.5)

    and

    \begin{equation} \kappa_2\leq\sup\limits_{\phi\in C([-\tau,0];\mathbb{R}^n)}\sum\limits_{l\in S\setminus\{k\}}(1 - e^{-q_k(\phi)})\frac{q_{kl}(\phi)}{q_k(\phi)}\leq\sup\limits_{\phi\in C([-\tau,0];\mathbb{R}^n)}(1 - e^{-q_k(\phi)})\leq1 - e^{-M}. \end{equation} (4.6)

    By the classical coupling (αϕ1(t),αϕ2(t)) constructed by (4.1), together with the definition of ˆκ0(ϕ) one has for i1i2,

    Pi1,i2ϕ(T1)lk1,k2(1eqk1(ϕ))qk1l(ϕ)qk1(ϕ)(1eqk2(ϕ))qk2l(ϕ)qk2(ϕ)+(1eqk1(ϕ))qk1k2(ϕ)qk1(ϕ)eqk2(ϕ)+(1eqk2(ϕ))qk2k1(ϕ)qk1(ϕ)eqk1(ϕ)ˆκ0(ϕ)lk1,k2(1eqk1(ϕ))qk1l(ϕ)qk1(ϕ)+ˆκ0(ϕ)(1eqk1(ϕ))qk1k2(ϕ)qk1(ϕ)+ˆκ0(ϕ)eqk1(ϕ)ˆκ0(ϕ).

    Note that

    \mathbb{P}^{i_1,i_2}_\phi(T\leq1) = 1\geq\hat{\kappa}_0(\phi)

    as i_1 = i_2, which implies

    \mathbb{P}^{i_1,i_2}_\phi(T\leq1)\geq\hat{\kappa}_0(\phi)

    for any i_1, i_2\in S. This, together with the Markov property, yields that

    \mathbb{P}^{i_1,i_2}_\phi(T>t)\leq(1 - \hat{\kappa}_0(\phi))^{[t]}

    for any i_1, i_2\in S by induction, where [t] denotes the integer part of t. In fact, suppose that

    \mathbb{P}^{i_1,i_2}_\phi(T>[t]-1)\leq(1 - \hat{\kappa}_0(\phi))^{[t]-1}

    for t\geq1. Then

    \begin{align} \mathbb{P}^{i_1,i_2}_\phi(T>t)&\leq\mathbb{P}^{i_1,i_2}_\phi(T>[t]) = \mathbb{E}^{i_1,i_2}_\phi\big(\mathbb{I}_{(T>1)}\mathbb{E}^{\alpha^\phi_1(1),\alpha^\phi_2(1)}_\phi\mathbb{I}_{(T>[t]-1)}\big)\\ &\leq(1 - \hat{\kappa}_0(\phi))^{[t]-1}\mathbb{P}^{i_1,i_2}_\phi(T>1)\leq(1 - \hat{\kappa}_0(\phi))^{[t]}. \end{align} (4.7)

    Substituting (4.7) into (4.3) yields that

    \begin{align*} \|P^\phi(t,i_1,\cdot) - \pi^\phi\|_{var}&\leq2(1 - \hat{\kappa}_0(\phi))^{[t]} = 2e^{-\{t\}\ln(1-\hat{\kappa}_0(\phi))}e^{t\ln(1-\hat{\kappa}_0(\phi))}\\ &\leq2e^{-\{t\}\ln(1-\kappa_2)}e^{t\ln(1-\kappa_1)}\leq2e^{M}e^{t\ln(1-\kappa_1)}, \end{align*}

    where \{t\} denotes the fractional part of t.

    This concludes the proof.

    Theorem 4.2. Under assumptions (A3)–(A6), the mapping \phi\mapsto\pi^\phi from C([-\tau,0];\mathbb{R}^n) to \mathcal{P}(S), endowed with the total variation norm, is Lipschitz continuous, i.e., there exists a constant L_4 such that

    \begin{equation} \|\pi^{\phi_1} - \pi^{\phi_2}\|_{var}\leq L_4\|\phi_1 - \phi_2\| \end{equation} (4.8)

    for any ϕ1 and ϕ2C([τ,0],Rn).

    Proof. For \phi_1 and \phi_2\in C([-\tau,0];\mathbb{R}^n), by the integration by parts formula for continuous-time Markov chains (see [29, Theorem 13.40]),

    \begin{equation} P^{\phi_1}_tf(i) - P^{\phi_2}_tf(i) = \int_0^tP^{\phi_1}_{t-s}(Q(\phi_1) - Q(\phi_2))P^{\phi_2}_sf(i)ds, \ \ t>0, \ f\in\mathcal{B}(S). \end{equation} (4.9)

    For any |f|\leq1 and any 0\leq s<t,

    \begin{align} \sup\limits_{i\in S}|P^{\phi_1}_{t-s}(Q(\phi_1) - Q(\phi_2))P^{\phi_2}_sf(i)|&\leq\sup\limits_{i\in S}|(Q(\phi_1) - Q(\phi_2))P^{\phi_2}_sf(i)|\\ &= \sup\limits_{i\in S}|(Q(\phi_1) - Q(\phi_2))(P^{\phi_2}_s - \pi^{\phi_2})f(i)|. \end{align} (4.10)

    It is easy to observe that for any |f|\leq1,

    \begin{align*} \sup\limits_{i\in S}|(Q(\phi_1) - Q(\phi_2))f(i)|&\leq\sup\limits_{i\in S}\Big[|q_{ii}(\phi_1) - q_{ii}(\phi_2)| + \sum\limits_{j\in S, j\neq i}|q_{ij}(\phi_1) - q_{ij}(\phi_2)|\Big]\\ &\leq2\|Q(\phi_1) - Q(\phi_2)\|_{l_1}. \end{align*}

    This, together with (4.10), the condition (A6), and the fact that for any |f|\leq1,

    \sup\limits_{i\in S}|(P^{\phi_2}_s - \pi^{\phi_2})f(i)|\leq\sup\limits_{i\in S}\|P^{\phi_2}_s(i,\cdot) - \pi^{\phi_2}\|_{var}

    implies that

    \sup\limits_{i\in S}|P^{\phi_1}_{t-s}(Q(\phi_1) - Q(\phi_2))P^{\phi_2}_sf(i)|\leq K\|\phi_1 - \phi_2\|\sup\limits_{i\in S}\|P^{\phi_2}_s(i,\cdot) - \pi^{\phi_2}\|_{var}.

    Combining the estimation above with (4.4), we get from (4.9) that for i\in S

    \begin{equation} |P^{\phi_1}_tf(i) - P^{\phi_2}_tf(i)|\leq KL_3\|\phi_1 - \phi_2\|\int_0^te^{-\lambda s}ds\leq K\|\phi_1 - \phi_2\|(1 - e^{-\lambda t}). \end{equation} (4.11)

    Consequently, (4.4) and (4.11) show that

    \begin{align*} |\pi^{\phi_1}f - \pi^{\phi_2}f| &= \Big|\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\pi^{\phi_2}(i)\pi^{\phi_1}(j)P^{\phi_1}_tf(j) - \sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\pi^{\phi_1}(j)\pi^{\phi_2}(i)P^{\phi_2}_tf(i)\Big|\\ &\leq\sum\limits_{i,j\in S}\pi^{\phi_1}(j)\pi^{\phi_2}(i)|P^{\phi_1}_tf(j) - P^{\phi_2}_tf(i)|\\ &\leq\sum\limits_{i,j\in S}\pi^{\phi_1}(j)\pi^{\phi_2}(i)|P^{\phi_1}_tf(j) - P^{\phi_2}_tf(j)| + \sum\limits_{i,j\in S}\pi^{\phi_1}(j)\pi^{\phi_2}(i)|P^{\phi_2}_tf(j) - \pi^{\phi_2}f|\\ &\quad+\sum\limits_{i,j\in S}\pi^{\phi_1}(j)\pi^{\phi_2}(i)|P^{\phi_2}_tf(i) - \pi^{\phi_2}f|\\ &\leq K\|\phi_1 - \phi_2\|(1 - e^{-\lambda t}) + L_3e^{-\lambda t}. \end{align*}

    Finally, letting t\rightarrow\infty and taking the supremum over |f|\leq1, it follows that

    \|\pi^{\phi_1} - \pi^{\phi_2}\|_{var}\leq K\|\phi_1 - \phi_2\|.

    This completes the proof.

    By using the invariant measure πϕ, let us define

    \bar{b}(\phi) = \sum\limits_{i\in S}b(\phi,i)\pi^\phi(i) \ \ \ \text{and} \ \ \ \bar{\Sigma}(\phi) = \sum\limits_{i\in S}\sigma(\phi,i)\sigma'(\phi,i)\pi^\phi(i).

    Obviously, \bar{b}(\cdot) satisfies the linear growth condition. According to the definitions of \bar{b}(\cdot) and \bar{\Sigma}(\cdot), we can deduce that for \phi, \phi_1, and \phi_2\in C([-\tau,0];\mathbb{R}^n)

    \begin{equation} |\bar{b}(\phi_1) - \bar{b}(\phi_2)|\leq K(1 + \|\phi_1\|^2 + \|\phi_2\|^2)\|\phi_1 - \phi_2\|, \end{equation} (4.12)
    \begin{equation} |\bar{\Sigma}(\phi)|\leq\sum\limits_{i\in S}|\sigma(\phi,i)|^2\pi^\phi(i)\leq K(1 + \|\phi\|^2) \end{equation} (4.13)

    and

    \begin{equation} |\bar{\Sigma}(\phi_1) - \bar{\Sigma}(\phi_2)|\leq K(1 + \|\phi_1\|^2 + \|\phi_2\|^2)\|\phi_1 - \phi_2\|. \end{equation} (4.14)
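    For a fixed segment \phi, the objects \pi^\phi, \bar{b}(\phi) and \bar{\Sigma}(\phi) defined above can be computed directly: \pi^\phi solves \pi Q(\phi) = 0 with \sum_i\pi_i = 1, and the averaged coefficients are the corresponding weighted sums. The sketch below uses hypothetical one-dimensional coefficients; in the scalar case \bar{\sigma}(\phi) = \sqrt{\bar{\Sigma}(\phi)} is one admissible square root, which is what the assumption (A7) below requires of \bar{\sigma}.

```python
# Computing pi^phi, bar b(phi), bar Sigma(phi) for one fixed segment phi, using
# hypothetical scalar coefficients: pi^phi solves pi Q(phi) = 0 with sum(pi) = 1,
# and bar sigma(phi) = sqrt(bar Sigma(phi)) is one admissible square root in the
# scalar case.
import numpy as np

def invariant_measure(Qphi):
    """Solve pi Q(phi) = 0, sum_i pi_i = 1 (Q(phi) conservative and irreducible)."""
    N = Qphi.shape[0]
    A = np.vstack([Qphi.T, np.ones(N)])    # append the normalization to pi Q = 0
    rhs = np.zeros(N + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

def b(x_now, x_lag, i):                     # hypothetical regime-dependent drift
    return (-1.0 - 0.5 * i) * x_now + 0.3 * x_lag

def sig(x_now, i):                          # hypothetical regime-dependent diffusion
    return 0.3 + 0.2 * i

def Q(x_now, x_lag):
    q01, q10 = 1.0 + x_now ** 2, 2.0 + abs(x_lag)
    return np.array([[-q01, q01], [q10, -q10]])

x_now, x_lag = 0.4, -0.1                    # the segment enters only through these values
pi = invariant_measure(Q(x_now, x_lag))
b_bar = sum(pi[i] * b(x_now, x_lag, i) for i in range(2))
Sigma_bar = sum(pi[i] * sig(x_now, i) ** 2 for i in range(2))
print("pi^phi =", pi, " bar b(phi) =", b_bar, " bar sigma(phi) =", np.sqrt(Sigma_bar))
```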

    Let us introduce the following assumption:

    (A7) The following equation,

    \begin{equation} dX(t) = \bar{b}(X_t)dt + \bar{\sigma}(X_t)d\tilde{w}(t) \end{equation} (4.15)

    has a solution that is unique in the weak sense (i.e., uniqueness in the sense of the distribution) on [0, T] for the same initial data

    X(t) = \xi\in C([-\tau, 0], \mathbb{R}^n)

    as Eq (2.1), where \tilde{w} is a standard Brownian motion and

    \bar{\sigma}(\cdot)\bar{\sigma}'(\cdot) = \bar{\Sigma}(\cdot).

    This section will show that X^ \varepsilon weakly converges to X determined by (4.15). To prove this claim, tightness of X^ \varepsilon is needed.

    Theorem 5.1. Under assumptions (A1)–(A5), \{X^ \varepsilon(\cdot)\}_{ \varepsilon\in(0, 1)} is tight on C([0, T]; \mathbb{R}^n) .

    To obtain tightness, we need the following sufficient condition for tightness (see [30, p.64]).

    Lemma 5.2. If \{X^ \varepsilon\}_{ \varepsilon\in(0, 1)}\in C([0, T]; \mathbb{R}^n) satisfy that, for some positive constants \alpha, \beta, \nu ,

    \begin{equation} \sup\limits_{ \varepsilon\in(0, 1)}\mathbb{E}|X^ \varepsilon(0)|^\nu < \infty, \end{equation} (5.1a)
    \begin{equation} \sup\limits_{ \varepsilon\in(0, 1)}\mathbb{E}|X^ \varepsilon(t)-X^ \varepsilon(s)|^\alpha\leq K|t-s|^{1+\beta}; 0\leq s < t\leq T , \end{equation} (5.1b)

    then the probability measures induced by X^ \varepsilon form a tight sequence.

    Proof of Theorem 5.1. Because

    X^ \varepsilon(0) = \xi(0)

    is independent of \varepsilon , it is sufficient to check (5.1b). Hence, for any p\geq2 , \delta > 0 , and 0\leq s < t\leq T , it follows from (2.1) that

    \begin{align*} \mathbb{E}\Big[\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)|^p\Big] &\leq K\Big\{\mathbb{E}\Big[\sup\limits_{s\leq t\leq s+\delta}\Big|\int_{s}^{t}b(X^ \varepsilon_r, \alpha^ \varepsilon(r))dr\Big|^p\Big]+\mathbb{E}\Big[\sup\limits_{s\leq t\leq s+\delta}\Big|\int_{s}^{t}\sigma(X^ \varepsilon_r, \alpha^ \varepsilon(r))dW(r)\Big|^p\Big]\Big\}\nonumber\\ &\leq K\delta^{p-1}\int_{s}^{t}\mathbb{E}|b(X^ \varepsilon_r, \alpha^ \varepsilon(r))|^pdr+K\delta^{\frac{p-2}{2}}\int_{s}^{t}\mathbb{E}|\sigma(X^ \varepsilon_r, \alpha^ \varepsilon(r))|^pdr, \end{align*}

    which, combined with assumption (A2) and (2.10), implies that

    \begin{align} \mathbb{E}\Big[\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)|^p\Big]&\leq K\delta^{\frac{p}{2}}. \end{align} (5.2)

    Then letting

    p = 4 > 2\; \; \; \text{and}\; \; \; \delta = t-s

    gives that (5.1b) holds.

    According to tightness of \{X^ \varepsilon(\cdot)\} , by the Prohorov theorem (see [31, p.59]), there exists X(\cdot) defined on C([-\tau, T]; \mathbb{R}^n) and the subsequence of \varepsilon (without loss of generality, we still denote the superscript of convergent subsequence as \varepsilon ) such that

    X^ \varepsilon(t)\Rightarrow X(t),

    as \varepsilon\rightarrow0 . To proceed, we shall need the following version of the Arzelá-Ascoli theorem (see [30, p.63]) and the moment estimation of the segment process:

    Lemma 5.3. \{X^ \varepsilon\}_{ \varepsilon\in(0, 1)}\in C([-\tau, T]; \mathbb{R}^n) is tight if and only if

    \begin{equation} \lim\limits_{\mu\uparrow\infty}\sup \limits_{ \varepsilon\in(0, 1)}\mathbb{P}(|X^ \varepsilon(0)| > \mu) = 0, \end{equation} (5.3a)
    \begin{equation} \lim\limits_{\delta\downarrow0}\sup\limits_{ \varepsilon\in(0, 1)}\mathbb{P}\Big(\Lambda_{\delta, \varepsilon} > \gamma_1\Big) = 0\ \ {\mbox{for any}}\ \gamma_1 > 0, \end{equation} (5.3b)

    where we define

    \Lambda_{\delta, \varepsilon} = \sup\limits_{\stackrel{s, t\in[-\tau, T]}{|s-t| < \delta}}|X^ \varepsilon(t)-X^ \varepsilon(s)|.

    Lemma 5.4. For p > 2 , 0 < \delta < 1 , and \gamma_2 < \frac{p}{2}-1 , we have

    \begin{equation} \mathbb{E}\Big[\sup\limits_{\substack{s, t\in[0, T] \\ |s-t| < \delta}}\sup\limits_{\theta\in[-\tau, 0]}|X^ \varepsilon_t(\theta)-X^ \varepsilon_s(\theta)|\Big]\leq K\delta^{\frac{1}{2}-\frac{1+\gamma_2}{p}}.\nonumber \end{equation}

    Proof. Let

    N_1 = [\frac{T+\tau}{\delta}], \ \ \ t_m = -\tau+m\delta,

    m = 0, 1, \dots, N_1 , and t_{N_1+1} = T . Denote

    \begin{equation} \Xi = \max\limits_{1\leq i\leq N_1+1}\sup\limits_{t_{i-1}\leq s\leq t_i}|X^ \varepsilon(s)-X^ \varepsilon(t_{i-1})|.\nonumber \end{equation}

    Due to |s-t| < \delta , s and t either fall into the same interval

    I_i: = [t_{i-1}, t_i]

    or into different adjacent intervals I_i and

    I_{i+1}: = [t_i, t_{i+1}].

    If s and t fall into the same interval I_i , then

    \begin{equation} |X^ \varepsilon(s)-X^ \varepsilon(t)|\leq|X^ \varepsilon(s)-X^ \varepsilon(t_{i-1})|+|X^ \varepsilon(t)-X^ \varepsilon(t_{i-1})|\leq 2\Xi\nonumber. \end{equation}

    If s and t fall into different adjacent intervals I_i and I_{i+1} , then

    \begin{equation} |X^ \varepsilon(s)-X^ \varepsilon(t)|\leq|X^ \varepsilon(s)-X^ \varepsilon(t_{i-1})|+|X^ \varepsilon(t_i)-X^ \varepsilon(t_{i-1})|+|X^ \varepsilon(t)-X^ \varepsilon(t_i)|\leq 3\Xi.\nonumber \end{equation}

    From this, it can be concluded that

    \begin{equation} \sup\limits_{\substack{s, t\in[0, T] \\ |s-t| < \delta}}|X^ \varepsilon(t)-X^ \varepsilon(s)|\leq 3\max\limits_{1\leq i\leq N_1+1}\sup\limits_{t_{i-1}\leq s\leq t_i}|X^ \varepsilon(s)-X^ \varepsilon(t_{i-1})|, \end{equation} (5.4)

    which implies that for \beta > 0 ,

    \begin{align} \mathbb{P}\Big(\sup\limits_{\stackrel{s, t\in[0, T]}{|s-t| < \delta}}\sup\limits_{\theta\in[-\tau, 0]}|X^ \varepsilon_t(\theta)-X^ \varepsilon_s(\theta)| > \beta\Big) &\leq\mathbb{P}\Big(\sup\limits_{\stackrel{s, t\in[-\tau, T]}{|s-t| < \delta}}|X^ \varepsilon(t)-X^ \varepsilon(s)| > \beta\Big)\\ &\leq\mathbb{P}\Big(\max\limits_{1\leq i\leq N_1+1}\sup\limits_{t_{i-1}\leq s\leq t_{i}}|X^ \varepsilon(s)-X^ \varepsilon(t_{i-1})| > \frac{\beta}{3}\Big)\\ &\leq\sum\limits_{i = 1}^{N_1+1}\mathbb{P}\Big(\sup\limits_{t_{i-1}\leq s\leq t_{i}}|X^ \varepsilon(s)-X^ \varepsilon(t_{i-1})| > \frac{\beta}{3}\Big)\\ &\leq(N_1+1)\sup\limits_{-\tau\leq s\leq T}\mathbb{P}\Big(\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)| > \frac{\beta}{3}\Big)\\ &\leq\frac{K}{\delta}\sup\limits_{-\tau\leq s\leq T}\mathbb{P}\Big(\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)| > \frac{\beta}{3}\Big). \end{align} (5.5)

    By using the identity in [32, Corollary 2] and (5.5), one can derive that

    \begin{align*} \mathbb{E}\Big[\sup\limits_{\stackrel{s, t\in[0, T]}{|s-t| < \delta}}\sup\limits_{\theta\in[-\tau, 0]}|X^ \varepsilon_t(\theta)-X^ \varepsilon_s(\theta)|^p\Big] & = p\int_{0}^{\infty}\beta^{p-1}\mathbb{P}\Big(\sup\limits_{\stackrel{s, t\in[0, T]}{|s-t| < \delta}}\sup\limits_{\theta\in[-\tau, 0]}|X^ \varepsilon_t(\theta)-X^ \varepsilon_s(\theta)|^p > \beta\Big)d\beta\\ &\leq Kp\int_{0}^{\infty}\beta^{p-1}\frac{1}{\delta}\sup\limits_{-\tau\leq s\leq T}\mathbb{P}\Big(\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)| > \frac{\beta}{3}\Big)d\beta\\ & = Kp\int_{0}^{\delta}\beta^{p-1}\frac{1}{\delta}\sup\limits_{-\tau\leq s\leq T}\mathbb{P}\Big(\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)| > \frac{\beta}{3}\Big)d\beta\\ &\quad+Kp\int_{\delta}^{1}\beta^{p-1}\frac{1}{\delta}\sup\limits_{-\tau\leq s\leq T}\mathbb{P}\Big(\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)| > \frac{\beta}{3}\Big)d\beta\\ &\quad+Kp\int_{1}^{\infty}\beta^{p-1}\frac{1}{\delta}\sup\limits_{-\tau\leq s\leq T}\mathbb{P}\Big(\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)| > \frac{\beta}{3}\Big)d\beta\\ & = :\Upsilon_{1}+\Upsilon_{2}+\Upsilon_{3}. \end{align*}

    It is easy to observe that

    \begin{equation} \Upsilon_{1}\leq K\delta^{-1}\int_{0}^{\delta}p\beta^{p-1}d\beta = K\delta^{p-1}.\nonumber \end{equation}

    Applying the Chebyshev inequality, assumption (A1) and (5.2) give that for q\geq2

    \begin{align} & \mathbb{P}\Big(\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)| > \frac{\beta}{3}\Big)\leq\frac{K\mathbb{E}\Big[\sup\limits_{s\leq t\leq s+\delta}|X^ \varepsilon(t)-X^ \varepsilon(s)|^q\Big]}{\beta^q}\leq K\frac{\delta^{\frac{q}{2}}}{\beta^q}. \end{align} (5.6)

    In the above inequality, letting q = p yields that

    \begin{align*} & \Upsilon_2\leq Kp\int_{\delta}^{1}\beta^{p-1}\frac{\delta^{\frac{p}{2}-1}}{\beta^p}d\beta = K\delta^{\frac{p}{2}-1}\ln\frac{1}{\delta}. \end{align*}

    Similarly, letting q = 2p gives

    \begin{align*} & \Upsilon_3\leq Kp\int_{1}^{\infty}\beta^{p-1}\frac{\delta^{p-1}}{\beta^{2p}}d\beta = K\delta^{p-1}. \end{align*}

    Note that

    \gamma_2 < \frac{p}{2}-1

    and

    0 < \delta < 1, \ \ \ \lim\limits_{\delta\rightarrow0}\delta^{\gamma_2}\ln\frac{1}{\delta} = 0.

    From this, it can be concluded that

    \Upsilon_2\leq K\delta^{\frac{p}{2}-1-\gamma_2},

    which implies that

    \mathbb{E}\Big[\sup\limits_{\stackrel{s, t\in[0, T]}{|s-t| < \delta}}\sup\limits_{\theta\in[-\tau, 0]}|X^ \varepsilon_t(\theta)-X^ \varepsilon_s(\theta)|^p\Big]\leq K\delta^{\frac{p}{2}-1-\gamma_2}.

    This, together with the Lyapunov inequality, yields that

    \mathbb{E}\Big[\sup\limits_{\stackrel{s, t\in[0, T]}{|s-t| < \delta}}\sup\limits_{\theta\in[-\tau, 0]}|X^ \varepsilon_t(\theta)-X^ \varepsilon_s(\theta)|\Big]\leq K\delta^{\frac{1}{2}-\frac{1+\gamma_2}{p}}.

    This completes the proof.

    Now, let us state the main result of this article.

    Theorem 5.5. Under assumptions (A1)(A7), the limit of any weakly convergent subsequence of the process \{X^ \varepsilon(\cdot)\}_{ \varepsilon\in(0, 1)} satisfies the Eq (4.15) with the same value

    X^ \varepsilon_0 = \xi\in C([-\tau, 0];\mathbb{R}^n).

    Proof. For any f\in C_0^\infty(\mathbb{R}^n, \mathbb{R}) , applying the Itô formula to f(X^ \varepsilon(t)) for Eq (2.1) yields that

    \begin{align} M^ \varepsilon_f(t)&: = f(X^ \varepsilon(t))-f(X^ \varepsilon(0))-\int_{0}^{t}\mathbb{L}^ \varepsilon(X^ \varepsilon_r, \alpha^ \varepsilon(r))f(X^ \varepsilon(r))dr\\ & = \int_{0}^{t}f_x(X^ \varepsilon(r))\sigma(X^ \varepsilon_r, \alpha^ \varepsilon(r))dW(r) \end{align} (5.7)

    is a martingale, where

    \begin{align*} \mathbb{L}^ \varepsilon(X^ \varepsilon_r, \alpha^ \varepsilon(r))f(X^ \varepsilon(r))& = f_x(X^ \varepsilon(r))b(X^ \varepsilon_r, \alpha^ \varepsilon(r))+\frac{1}{2}\sum\limits_{i, j = 1}^{n}\Sigma_{ij}(X^ \varepsilon_r, \alpha^ \varepsilon(r))f_{x_ix_j}(X^ \varepsilon(r)) \end{align*}

    and

    \begin{align*} & \Sigma_{ij}(X^ \varepsilon_r, \alpha^ \varepsilon(r)) = \sigma_i(X^ \varepsilon_r, \alpha^ \varepsilon(r))\sigma_j(X^ \varepsilon_r, \alpha^ \varepsilon(r)). \end{align*}

    This is equivalent to

    \begin{align} &\mathbb{E}\Big[(h(X^ \varepsilon(s_i)), i\leq k)\Big(f(X^ \varepsilon(t))-f(X^ \varepsilon(s))-\int_{s}^{t}\mathbb{L}^ \varepsilon(X^ \varepsilon_r, \alpha^ \varepsilon(r))f(X^ \varepsilon(r))dr\Big)\Big] = 0, \end{align} (5.8)

    for arbitrary k, s, and t with s_1 < s_2 < \ldots < s_k < s < t, and any bounded and continuous function h(\cdot) . To characterize the limit process \{X(t)\}_{t\geq0} , it suffices to show that letting \varepsilon\rightarrow0 on both sides of (5.8),

    \begin{align} 0& = \lim\limits_{ \varepsilon\rightarrow0}\mathbb{E}\Big[(h(X^ \varepsilon(s_i)), i\leq k)\Big(f(X^ \varepsilon(t))-f(X^ \varepsilon(s))-\int_{s}^{t}\mathbb{L}^ \varepsilon(X^ \varepsilon_r, \alpha^ \varepsilon(r))f(X^ \varepsilon(r))dr\Big)\Big]\\ & = \mathbb{E}\Big[(h(X(s_i)), i\leq k)\Big(f(X(t))-f(X(s))-\int_{s}^{t}\mathcal{L}(X_r)f(X(r))dr\Big)\Big], \end{align} (5.9)

    where

    \mathcal{L}(X_r)f(X(r)) = f_x(X(r))\bar{b}(X_r)+\frac{1}{2}\sum\limits_{i, j = 1}^{n}\bar{\sigma}_i(X_r)\bar{\sigma}_j(X_r)f_{x_ix_j}(X(r)).

    By the Skorohod representation theorem ([32, p.354]) and Theorem 5.1, we may assume for s\in[0, T] , X^ \varepsilon(s)\rightarrow X(s) in the sense of w.p.1 as \varepsilon\rightarrow0 . This, together with the Lebesgue dominated convergence theorem, yields that

    \begin{align} &\mathbb{E}[h(X^ \varepsilon(s_i), i\leq k)(f(X^ \varepsilon(t))-f(X^ \varepsilon(s)))]\rightarrow \mathbb{E}[h(X(s_i), i\leq k)(f(X(t))-f(X(s)))] \end{align} (5.10)

    for all 0\leq s < t . Furthermore, by the Vitali convergence theorem (refer to [33] and the related literature for further details) and the moment estimate (2.10), we can obtain

    \begin{align} &\mathbb{E}\Big[\sup\limits_{0\leq s\leq T}|X^ \varepsilon(s)|^4\Big]\longrightarrow\mathbb{E}\Big[\sup\limits_{0\leq s\leq T}|X(s)|^4\Big]\leq K. \end{align} (5.11)

    Next, we only need to show

    \begin{align*} \lim\limits_{ \varepsilon\rightarrow0}\mathbb{E}\Big[h(X^ \varepsilon(s_i), i\leq k)\int_{s}^{t}\mathbb{L}^ \varepsilon(X^ \varepsilon_r, \alpha^ \varepsilon(r))f(X^ \varepsilon(r))dr\Big] = \mathbb{E}\Big[h(X(s_i), i\leq k)\int_{s}^{t}\mathcal{L}(X_r)f(X(r))dr\Big]. \end{align*}

    According to the definition of \mathbb{L}^ \varepsilon and \mathcal{L} , we shall only consider

    I_1: = \mathbb{E}\Big[\int_{s}^{t}|f_x(X^ \varepsilon(r))b(X^ \varepsilon_r, \alpha^ \varepsilon(r))-f_x(X(r))\bar{b}(X_r)|dr\Big]

    and

    \begin{align*} I_2&: = \sum\limits_{k, l = 1}^{n}\mathbb{E}\Big[\int_{s}^{t}|f_{x_kx_l}(X^ \varepsilon(r))\sigma_k(X^ \varepsilon_r, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_r, \alpha^ \varepsilon(r))-f_{x_kx_l}(X(r))\bar{\sigma}_k(X_r)\bar{\sigma}_l(X_r)|dr\Big], \end{align*}

    where we define

    I_2 = :\sum\limits_{k, l = 1}^{n}I^{k, l}_2.

    For \delta > 0 , set

    N = [\frac{t-s}{\delta}], \ \ \ s_m = s+m\delta

    for m = 0, 1, \dots, N and

    s_{N+1} = t.

    Hence,

    \begin{align*} I_1&\leq\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(r))b(X^ \varepsilon_r, \alpha^ \varepsilon(r))-f_x(X^ \varepsilon(s_m))b(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))|dr\Big]\\ &\quad+\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(s_m))b(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))-f_x(X^ \varepsilon(s_m))\bar{b}(X^ \varepsilon_{s_m})|dr\Big]\\ &\quad+\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(s_m))\bar{b}(X^ \varepsilon_{s_m})-f_x(X(r))\bar{b}(X_r)|dr\Big]\\ & = :I_{11}+I_{12}+I_{13}. \end{align*}

    Recall that f\in C_0^\infty(\mathbb{R}^n, \mathbb{R}) . This and assumption (A2) give

    \begin{align*} I_{11}&\leq\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(r))b(X^ \varepsilon_r, \alpha^ \varepsilon(r))-f_x(X^ \varepsilon(s_m))b(X^ \varepsilon_r, \alpha^ \varepsilon(r))|dr\Big]\\ &\quad+\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(s_m))b(X^ \varepsilon_r, \alpha^ \varepsilon(r))-f_x(X^ \varepsilon(s_m))b(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))|dr\Big]\\ &\leq K\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}(|X^ \varepsilon(r)-X^ \varepsilon(s_m)|(1+||X^ \varepsilon_r||_{\infty})+||X^ \varepsilon_r-X^ \varepsilon_{s_m}||_{\infty})\Big]. \end{align*}

    Consequently, by virtue of (2.10), (5.2), and the Hölder inequality, we arrive at

    \begin{align*} I_{11}&\leq K\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}((\mathbb{E}|X^ \varepsilon(r)-X^ \varepsilon(s_m)|^2)^{\frac{1}{2}}+\mathbb{E}||X^ \varepsilon_r-X^ \varepsilon_{s_m}||_{\infty})dr. \end{align*}

    According to the tightness of \{X^ \varepsilon(\cdot)\} and (5.3b), we obtain that \Lambda_{\delta, \varepsilon}\stackrel{\mathbb{P}}{\longrightarrow}0 uniformly with respect to \varepsilon , as \delta\rightarrow0 . This, together with the Lebesgue dominated convergence theorem, yields that

    \begin{align} &I_{11}\leq K(N+1)\delta(\delta^{\frac{1}{2}}+\mathbb{E}\Lambda_{\delta, \varepsilon})\rightarrow0, \mbox{ as } \delta\rightarrow0. \end{align} (5.12)

    Similar to I_{11} , together with the definition of \bar{b}(\cdot) and (4.12), we have

    \begin{align*} I_{13}&\leq \mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(s_m))\bar{b}(X^ \varepsilon_{s_m})-f_x(X^ \varepsilon(s_m))\bar{b}(X_r)|dr\Big]\\ &\quad+\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(s_m))\bar{b}(X_r)-f_x(X(r))\bar{b}(X_r)|dr\Big]\\ &\leq K\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}[(\mathbb{E}||X^ \varepsilon_{s_m}-X_r||_{\infty}^2)^{\frac{1}{2}}+(\mathbb{E}|X^ \varepsilon(s_m)-X(r)|^2)^{\frac{1}{2}}]dr. \end{align*}

    Note that

    |X^ \varepsilon(s_m)-X(r)|\leq||X^ \varepsilon_{s_m}-X_r||_{\infty}

    and

    \begin{align} ||X^ \varepsilon_{s_m}-X_r||_{\infty}&\leq\sup\limits_{\stackrel{s, t\in[-\tau, T]}{|s-t| < \delta}}|X^ \varepsilon(s)-X(t)|\\ &\leq\sup\limits_{\stackrel{s, t\in[-\tau, T]}{|s-t| < \delta}}|X^ \varepsilon(s)-X^ \varepsilon(t)|+\sup\limits_{\stackrel{s, t\in[-\tau, T]}{|s-t| < \delta}}|X^ \varepsilon(t)-X(t)|\\ &\leq\Lambda_{\delta, \varepsilon}+\sup\limits_{t\in[-\tau, T]}|X^ \varepsilon(t)-X(t)|. \end{align} (5.13)

    Define

    \Gamma_{\delta, \varepsilon} = \Lambda_{\delta, \varepsilon}+\sup\limits_{t\in[-\tau, T]}|X^ \varepsilon(t)-X(t)|.

    The Lebesgue dominated convergence theorem and (5.11) give

    \begin{align} & I_{13}\leq K\delta(N+1)(\mathbb{E}\Gamma_{\delta, \varepsilon}^2)^{\frac{1}{2}}\rightarrow0, \mbox{ as }\delta, \varepsilon\rightarrow0. \end{align} (5.14)

    Now, we construct an auxiliary Markov chain \{\tilde{\alpha}^m(r)\}_{r\geq s_m} in S satisfying the transition rate (\frac{1}{ \varepsilon}q_{ij}(X^ \varepsilon_{s_m}))_{i, j\in S} and

    \tilde{\alpha}^m(s_m) = \alpha^ \varepsilon(s_m).

    By virtue of Lemma 3.1, together with assumption (A6), we have

    \begin{align} & \int_{s_{m}}^{s_{m+1}}\mathbb{E}\Big[\mathbb{I}_{\{\alpha^{ \varepsilon}(r)\neq\tilde{\alpha}^m(r)\}}|\mathcal{F}^ \varepsilon_{s_m}\Big]dr\leq2N(N-1)\frac{\delta}{ \varepsilon}\int_{s_m}^{s_{m+1}}\mathbb{E}(||X^ \varepsilon_r-X^ \varepsilon_{s_m}||_{\infty}|\mathcal{F}^ \varepsilon_{s_m})dr. \end{align} (5.15)

    Then,

    \begin{align*} I_{12} & \leq K\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(s_m))b(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))-f_x(X^ \varepsilon(s_m))b(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))|dr\Big]\\ &\quad+K\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|f_x(X^ \varepsilon(s_m))b(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))-f_x(X^ \varepsilon(s_m))\bar{b}(X^ \varepsilon_{s_m})|dr\Big]\\ & = :I_{12, 1}+I_{12, 2}. \end{align*}

    Applying (5.15), the Hölder inequality, assumption (A2), and Theorem 2.2, we obtain that

    \begin{align} I_{12, 1}&\leq K\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}|b(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))-b(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))|\mathbb{I}_{\{\alpha^{ \varepsilon}(r)\neq\tilde{\alpha}^m(r)\}}dr\Big]\\ &\leq K\sum\limits_{m = 0}^{N}\Big(\int_{s_m}^{s_{m+1}}\mathbb{E}|b(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))-b(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))|^2\Big)^{\frac{1}{2}}\Big(\mathbb{E}\int_{s_{m}}^{s_{m+1}}\mathbb{E}[\mathbb{I}_{\{\alpha^{ \varepsilon}(r)\neq\tilde{\alpha}^m(r)\}}|\mathcal{F}^ \varepsilon_{s_m}]dr\Big)^{\frac{1}{2}}\\ &\leq K\sum\limits_{m = 0}^{N}\delta^{\frac{1}{2}}\Big(\frac{\delta}{ \varepsilon}\int_{s_m}^{s_{m+1}}\mathbb{E}||X^ \varepsilon_r-X^ \varepsilon_{s_m}||_{\infty}dr\Big)^{\frac{1}{2}}. \end{align} (5.16)

    Next, in Lemma 5.4, letting

    p = 4

    and

    \gamma_2 = \frac{1}{3}

    gives

    \begin{equation} \mathbb{E}||X^ \varepsilon_r-X^ \varepsilon_{s_m}||_{\infty}\leq\mathbb{E}\Big[\sup\limits_{\stackrel{s, t\in[0, T]}{|s-t| < \delta}}\sup\limits_{\theta\in[-\tau, 0]}|X^ \varepsilon_t(\theta)-X^ \varepsilon_s(\theta)|\Big]\leq K\delta^{\frac{1}{6}}. \end{equation} (5.17)

    Substituting (5.17) into (5.16) yields that

    \begin{align} & I_{12, 1}\leq K(N+1)\delta\frac{\delta^{\frac{7}{12}}}{ \varepsilon^{\frac{1}{2}}}. \end{align} (5.18)

    According to (4.4), one can derive that

    \begin{align} I_{12, 2} &\leq K\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}\mathbb{E}\Big(|b(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))-\bar{b}(X^ \varepsilon_{s_m})|\Big|\mathcal{F}^ \varepsilon_{s_m}\Big)dr\Big]\\ &\leq K\mathbb{E}\Big[\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}(1+||X^ \varepsilon_{s_m}||_{\infty})\mathbb{E}\Big(\Big|\Big|P^{X^ \varepsilon_{s_m}}_{\frac{r-s_m}{ \varepsilon}}(\alpha^ \varepsilon(s_m), \cdot)-\pi^{X^ \varepsilon_{s_m}}\Big|\Big|_{var}\Big|\mathcal{F}^ \varepsilon_{s_m}\Big)dr\Big]\\ & = K\mathbb{E}\Big[\sum\limits_{m = 0}^{N} \varepsilon\int_{0}^{\frac{\delta}{ \varepsilon}}(1+||X^ \varepsilon_{s_m}||_{\infty})\mathbb{E}(||P^{X^ \varepsilon_{s_m}}_{r}(\alpha^ \varepsilon(s_m), \cdot)-\pi^{X^ \varepsilon_{s_m}}||_{var}|\mathcal{F}^ \varepsilon_{s_m})dr\Big]\\ &\leq K \varepsilon(N+1)\frac{L_3}{\lambda}(1-e^{-\frac{\delta}{ \varepsilon}\lambda}). \end{align} (5.19)

    For the same interval division [s_m, s_{m+1}], m = 1, 2, \dots, N,

    \begin{align} I^{k, l}_2&\leq\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\sigma_k(X^ \varepsilon_r, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_r, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(r))\\ &\quad-\sigma_k(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(s_m))|dr\Big)\\ &\quad+\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\sigma_k(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(s_m))\\ &\quad -\bar{\sigma}_k(X^ \varepsilon_{s_m})\bar{\sigma}_l(X^ \varepsilon_{s_m})f_{x_kx_l}(X^ \varepsilon(s_m))|dr\Big)\\ &\quad+\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\bar{\sigma}_k(X^ \varepsilon_{s_m})\bar{\sigma}_l(X^ \varepsilon_{s_m})f_{x_kx_l}(X^ \varepsilon(s_m))-\bar{\sigma}_k(X_r)\bar{\sigma}_l(X_r)f_{x_kx_l}(X(r))|dr\Big)\\ & = :I^{k, l}_{2, 1}+I^{k, l}_{2, 2}+I^{k, l}_{2, 3}. \end{align} (5.20)

    By the Hölder inequality, assumption (A2), Theorem 2.2, and the fact that f\in C_0^\infty(\mathbb{R}^n, \mathbb{R}) , combined with the result that \Lambda_{\delta, \varepsilon}\stackrel{\mathbb{P}}{\longrightarrow}0 uniformly with respect to \varepsilon as \delta\rightarrow0 , we have

    \begin{align} I^{k, l}_{2, 1}&\leq\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\sigma_k(X^ \varepsilon_r, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_r, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(r))\\ &\quad-\sigma_k(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_r, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(r))|dr\Big)\\ &\quad+\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\sigma_k(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_r, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(r))\\ &\quad-\sigma_k(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(r))|dr\Big)\\ &\quad+\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\sigma_k(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(r))\\ &\quad-\sigma_k(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))f_{x_kx_l}(X^ \varepsilon(s_m))|dr\Big)\\ &\leq K(N+1)\delta(\mathbb{E}\Lambda_{\delta, \varepsilon}^2)^\frac{1}{2}\rightarrow0, \end{align} (5.21)

    as \delta\rightarrow0, where the convergence is derived from the Lebesgue dominated convergence theorem.

    According to (4.13) and (4.14), it follows that

    \begin{align*} I^{k, l}_{2, 3}&\leq\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\bar{\Sigma}_{k, l}(X^ \varepsilon_{s_m})f_{x_kx_l}(X^ \varepsilon(s_m))-\bar{\Sigma}_{k, l}(X^ \varepsilon_{s_m})f_{x_kx_l}(X(r))|dr\Big)\\ &\quad+\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\bar{\Sigma}_{k, l}(X^ \varepsilon_{s_m})f_{x_kx_l}(X(r))-\bar{\Sigma}_{k, l}(X_r)f_{x_kx_l}(X(r))|dr\Big)\\ &\leq K\sum\limits_{m = 0}^{N}\int_{s_m}^{s_{m+1}}(1+\mathbb{E}||X^ \varepsilon_{s_m}||_{\infty}^4+\mathbb{E}||X_r||_{\infty}^4)^{\frac{1}{2}}(\mathbb{E}||X^ \varepsilon_{s_m}-X_r||_{\infty}^2)^{\frac{1}{2}}dr, \end{align*}

    which, combined with (5.13), gives

    \begin{align} & I^{k, l}_{2, 3}\leq K\delta(N+1)(\mathbb{E}\Gamma_{\delta, \varepsilon}^2)^{\frac{1}{2}}\rightarrow0, \mbox{ as }\ \delta, \varepsilon\rightarrow0. \end{align} (5.22)

    As for I^{k, l}_{2, 2} , recalling the definition of \tilde{\alpha}^m(t) , we have

\begin{align*} I^{k, l}_{2, 2}&\leq\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\sigma_k(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))\sigma_l(X^ \varepsilon_{s_m}, \alpha^ \varepsilon(r))-\sigma_k(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))\sigma_l(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))|dr\Big)\\ &\quad+\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}|\sigma_k(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))\sigma_l(X^ \varepsilon_{s_m}, \tilde{\alpha}^m(r))-\bar{\sigma}_k(X^ \varepsilon_{s_m})\bar{\sigma}_l(X^ \varepsilon_{s_m})|dr\Big)\\ & = :I^{k, l}_{2, 21}+I^{k, l}_{2, 22}. \end{align*}

    Similarly to (5.16), using (5.15) yields

    \begin{align} I^{k, l}_{2, 21}&\leq\sum\limits_{m = 0}^{N}K\Big(\int_{s_m}^{s_{m+1}}\mathbb{E}(1+||X^ \varepsilon_{s_m}||_{\infty}^2)^2\Big)^{\frac{1}{2}}\Big(\mathbb{E}\int_{s_{m}}^{s_{m+1}}\mathbb{E}[\mathbb{I}_{\{\alpha^{ \varepsilon}(r)\neq\tilde{\alpha}^m(r)\}}|\mathcal{F}^ \varepsilon_{s_m}]dr\Big)^{\frac{1}{2}}\\ &\leq K(N+1)\delta\frac{\delta^{\frac{7}{12}}}{ \varepsilon^{\frac{1}{2}}}. \end{align} (5.23)

    Furthermore, as a result of (4.4), we obtain that

    \begin{align} I^{k, l}_{2, 22}&\leq\sum\limits_{m = 0}^{N}\mathbb{E}\Big(\int_{s_m}^{s_{m+1}}(1+||X^ \varepsilon_{s_m}||_{\infty})^2\mathbb{E}\Big(\Big|\Big|P^{X^ \varepsilon_{s_m}}_{\frac{r-s_m}{ \varepsilon}}(\alpha^ \varepsilon(s_m), \cdot)-\pi^{X^ \varepsilon_{s_m}}\Big|\Big|_{var}\Big|\mathcal{F}^ \varepsilon_{s_m}\Big)dr\Big)\\ & = K\mathbb{E}\Big[\sum\limits_{m = 0}^{N} \varepsilon\int_{0}^{\frac{\delta}{ \varepsilon}}(1+||X^ \varepsilon_{s_m}||_{\infty})^2\mathbb{E}(||P^{X^ \varepsilon_{s_m}}_{r}(\alpha^ \varepsilon(s_m), \cdot)-\pi^{X^ \varepsilon_{s_m}}||_{var}|\mathcal{F}^ \varepsilon_{s_m})dr\Big]\\ &\leq K \varepsilon(N+1)\frac{L_3}{\lambda}(1-e^{-\frac{\delta}{ \varepsilon}\lambda}). \end{align} (5.24)

    Finally, according to (5.12), (5.14), (5.18), (5.19), and (5.21)–(5.24), we can get

    \begin{align*} I_1+I_2&\leq K(N+1)\delta[(\mathbb{E}\Lambda_{\delta, \varepsilon}^2)^\frac{1}{2}+(\mathbb{E}\Gamma_{\delta, \varepsilon}^2)^{\frac{1}{2}}+\delta^{\frac{7}{12}} \varepsilon^{-\frac{1}{2}}]+K \varepsilon(N+1)\frac{L_3}{\lambda}(1-e^{-\frac{\delta}{ \varepsilon}\lambda}). \end{align*}

    To obtain the desired conclusion, let \delta = \varepsilon^{\frac{11}{12}} in the inequality above.
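    As a quick check of this choice (a sketch, using that (N+1)\delta\leq T+\delta\leq KT ), with \delta = \varepsilon^{\frac{11}{12}} we have

    \begin{align*} (N+1)\delta\frac{\delta^{\frac{7}{12}}}{ \varepsilon^{\frac{1}{2}}}\leq KT \varepsilon^{\frac{77}{144}-\frac{72}{144}} = KT \varepsilon^{\frac{5}{144}}\rightarrow0 \quad\mbox{ and }\quad K \varepsilon(N+1)\frac{L_3}{\lambda}(1-e^{-\frac{\delta}{ \varepsilon}\lambda})\leq KT\frac{ \varepsilon}{\delta} = KT \varepsilon^{\frac{1}{12}}\rightarrow0, \end{align*}

    while the terms involving (\mathbb{E}\Lambda_{\delta, \varepsilon}^2)^{\frac{1}{2}} and (\mathbb{E}\Gamma_{\delta, \varepsilon}^2)^{\frac{1}{2}} vanish because \delta\rightarrow0 as \varepsilon\rightarrow0 .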

    The two examples provided in this section cannot be covered by classical results in the literature because of the past-dependent switching. We verify, one by one, that they satisfy the assumptions proposed in this paper, so that the averaged equations follow from Theorem 5.5. To proceed, consider a special two-state switching process with generator Q on the state space

    S = \{1, 2\}.

    Set

    Q = \left(\begin{array}{cc} -b_1&b_1 \\ a_1& -a_1 \end{array}\right).

    It is easy to obtain that the stationary distribution is

    \nu: = (\frac{a_1}{a_1+b_1}, \frac{b_1}{a_1+b_1}).
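    Indeed, a direct computation confirms that \nu is invariant for Q :

    \begin{align*} \nu Q = \Big(\frac{a_1}{a_1+b_1}, \frac{b_1}{a_1+b_1}\Big)\left(\begin{array}{cc} -b_1&b_1 \\ a_1& -a_1 \end{array}\right) = \Big(\frac{-a_1b_1+a_1b_1}{a_1+b_1}, \frac{a_1b_1-a_1b_1}{a_1+b_1}\Big) = (0, 0), \end{align*}

    and \nu_1+\nu_2 = 1 .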

    Example 6.1. Consider the following one-dimensional two-timescale stochastic integro-differential equation:

    \begin{equation} dX^ \varepsilon(t) = A(\alpha^ \varepsilon(t))f\Big(\int_{-\tau}^{0}X^ \varepsilon(t+\theta)d\theta\Big)dt+B(\alpha^ \varepsilon(t))g\Big(\int_{-\tau}^{0}X^ \varepsilon(t+\theta)d\theta\Big)dB_1(t), \end{equation} (6.1)

    where B_1(t) is a standard Brownian motion,

    X^ \varepsilon_0 = \xi\in C([-\tau, 0];\mathbb{R}^n)

    is nonrandom and Lipschitz continuous,

    \begin{align*} \alpha^ \varepsilon(t)& = \alpha(t/ \varepsilon), \\ \alpha^ \varepsilon(0)& = 1, \end{align*}

    and \alpha(t) is a pure jump process taking values in \{1, 2\} with generator

    \tilde{Q}(\phi) = \left(\begin{array}{ll} -\Big(a+b\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)\Big) &a+b\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big) \\ c+d\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big) &-\Big(c+d\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)\Big) \end{array}\right),

    for \phi\in C([-\tau, 0];\mathbb{R}^n) and a, b, c, d > 0 .

    Clearly, for

    \phi, \psi\in C([-\tau, 0];\mathbb{R}^n),
    \begin{align*} ||\tilde{Q}(\phi)-\tilde{Q}(\psi)||_{l_1}&\leq\max\{b, d\}\Big|\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)-\cos^2\Big(\int_{-\tau}^{0}\psi(\theta)d\theta\Big)\Big|\\ &\leq2\tau\max\{b, d\}||\phi-\psi||_{\infty}, \end{align*}

    which implies that \tilde{Q}(\phi) is Lipschitz continuous. According to the beginning of this section, the stationary distribution corresponding to \tilde{Q}(\phi) is

    \begin{align*} \tilde{\nu}(\phi)& = (\tilde{\nu}_1(\phi), \tilde{\nu}_2(\phi))\\ &: = \Bigg(\frac{c+d\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)}{a+c+(b+d)\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)}, \frac{a+b\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)}{a+c+(b+d)\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)}\Bigg). \end{align*}
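    Two elementary estimates underlie this example (a brief sketch). First, for u, v\in\mathbb{R} ,

    \begin{align*} |\cos^2u-\cos^2v| = |\cos u-\cos v||\cos u+\cos v|\leq2|u-v| \quad\mbox{ and }\quad \Big|\int_{-\tau}^{0}(\phi(\theta)-\psi(\theta))d\theta\Big|\leq\tau||\phi-\psi||_{\infty}, \end{align*}

    which together give the Lipschitz bound for \tilde{Q} above. Second, \tilde{\nu}(\phi) follows from the two-state formula for \nu by substituting a_1 = c+d\cos^2\big(\int_{-\tau}^{0}\phi(\theta)d\theta\big) and b_1 = a+b\cos^2\big(\int_{-\tau}^{0}\phi(\theta)d\theta\big) .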

    Furthermore, let

    f:\mathbb{R}\longmapsto\mathbb{R}\ \ \mbox{ and } \ \ g:\mathbb{R}\longmapsto\mathbb{R}

    be Borel measurable, and let A(1) , A(2) , B(1) , and B(2) be constants with

    B(1) = -B(2).

    Assume that there exists a positive constant K such that for any x, y\in\mathbb{R} ,

    |f(x)-f(y)|\vee|g(x)-g(y)|\leq K|x-y|

    and

    |f(x)|\leq K.

    By virtue of (4.15), we obtain

    \begin{equation} dX(t) = F(X_t)dt+G(X_t)dB_2(t), \end{equation} (6.2)

    where B_2(t) is a standard Brownian motion. Meanwhile, for \phi\in C([-\tau, 0];\mathbb{R}^n) ,

    \begin{equation} F(\phi) = (A(1)\tilde{\nu}_1(\phi)+A(2)\tilde{\nu}_2(\phi))f\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)\nonumber \end{equation}

    and

    \begin{equation} G(\phi) = |B(1)|\Big|g\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)\Big|.\nonumber \end{equation}
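    The form of G reflects the averaging of the squared diffusion coefficient. Assuming, as in (4.15), that the averaged diffusion satisfies \bar{\sigma}(\phi)^2 = \sum_{i = 1}^{2}\sigma(\phi, i)^2\tilde{\nu}_i(\phi) (a sketch of the computation), the condition B(1) = -B(2) and the identity \tilde{\nu}_1(\phi)+\tilde{\nu}_2(\phi) = 1 give

    \begin{align*} \bar{\sigma}(\phi)^2 = B(1)^2g\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)^2\tilde{\nu}_1(\phi)+B(2)^2g\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)^2\tilde{\nu}_2(\phi) = B(1)^2g\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)^2, \end{align*}

    so that \bar{\sigma}(\phi) = |B(1)|\Big|g\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)\Big| = G(\phi) .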

    It is easy to obtain that, for \psi, \phi\in C([-\tau, 0];\mathbb{R}^n) ,

\begin{equation} |G(\phi)-G(\psi)|\leq K|B(1)|\Big|\int_{-\tau}^{0}(\phi(\theta)-\psi(\theta))d\theta\Big|\leq K||\phi-\psi||_{\infty}. \end{equation} (6.3)

    By calculation, it can be concluded that for \psi, \phi\in C([-\tau, 0];\mathbb{R}^n) ,

    \begin{align*} |\tilde{\nu}_1(\phi)-\tilde{\nu}_1(\psi)|&\leq\frac{c(b+d)+d(a+c)}{(a+c)^2}\Big|\cos^2\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)-\cos^2\Big(\int_{-\tau}^{0}\psi(\theta)d\theta\Big)\Big| \\ &\leq\tau\frac{c(b+d)+d(a+c)}{(a+c)^2}||\phi-\psi||_{\infty} \end{align*}

    and

    |\tilde{\nu}_1(\phi)|\leq\frac{c+d}{a+c}.

    Similarly,

    \begin{align*} |\tilde{\nu}_2(\phi)-\tilde{\nu}_2(\psi)|\leq\tau\frac{a(b+d)+b(a+c)}{(a+c)^2}||\phi-\psi||_{\infty} \end{align*}

    and

    \begin{align*} |\tilde{\nu}_2(\phi)|\leq\frac{a+b}{a+c}. \end{align*}
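    Both difference estimates above follow from an elementary quotient bound. Writing u = \cos^2\big(\int_{-\tau}^{0}\phi(\theta)d\theta\big) and v = \cos^2\big(\int_{-\tau}^{0}\psi(\theta)d\theta\big) , and noting that both denominators are at least a+c , a sketch of the computation for \tilde{\nu}_1 reads

    \begin{align*} \Big|\frac{c+du}{a+c+(b+d)u}-\frac{c+dv}{a+c+(b+d)v}\Big| = \frac{|d(a+c)(u-v)-c(b+d)(u-v)|}{(a+c+(b+d)u)(a+c+(b+d)v)}\leq\frac{c(b+d)+d(a+c)}{(a+c)^2}|u-v|, \end{align*}

    and similarly for \tilde{\nu}_2 ; the stated bounds then follow since |u-v|\leq\big|\int_{-\tau}^{0}(\phi(\theta)-\psi(\theta))d\theta\big|\leq\tau||\phi-\psi||_{\infty} .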

    Due to the boundedness and the Lipschitz property of f(\cdot) , for \psi, \phi\in C([-\tau, 0];\mathbb{R}^n) ,

\begin{align} |F(\phi)-F(\psi)|&\leq \Big|f\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)\Big|[|A(1)||\tilde{\nu}_1(\phi)-\tilde{\nu}_1(\psi)|+ |A(2)||\tilde{\nu}_2(\phi)-\tilde{\nu}_2(\psi)|]\\ &\quad+|A(1)\tilde{\nu}_1(\psi)+A(2)\tilde{\nu}_2(\psi)|\Big|f\Big(\int_{-\tau}^{0}\phi(\theta)d\theta\Big)-f\Big(\int_{-\tau}^{0}\psi(\theta)d\theta\Big)\Big|\\ &\leq K||\phi-\psi||_{\infty}. \end{align} (6.4)

    Inequalities (6.3) and (6.4) yield the existence and uniqueness of the solution to (6.2). Finally, according to Theorem 5.5, the limit of any weakly convergent subsequence of the solution to (6.1) satisfies (6.2).

    Example 6.2. Consider a one-dimensional two-timescale stochastic delay differential equation:

    \begin{align} dX^ \varepsilon(t) = &(A(\alpha^ \varepsilon(t))+B(\alpha^ \varepsilon(t))h(X^ \varepsilon(t-\tau)))dt\\ &+(C(\alpha^ \varepsilon(t))+D(\alpha^ \varepsilon(t))r(X^ \varepsilon(t-\tau)))dB_3(t), \end{align} (6.5)

    where B_3(t) is a standard Brownian motion,

    X^ \varepsilon_0 = \xi\in C([-\tau, 0];\mathbb{R}^n)

    is nonrandom and Lipschitz continuous,

    \alpha^ \varepsilon(t) = \alpha(t/ \varepsilon), \ \ \ \alpha^ \varepsilon(0) = 1

    and \alpha(t) is a pure jump process taking values in \{1, 2\} with generator

    \hat{Q}(\phi) = \left(\begin{array}{ll} -(a+b\cdot e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|}) &a+b\cdot e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|} \\ c+d\cdot e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|} &-(c+d\cdot e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|}) \end{array}\right),

    for \phi\in C([-\tau, 0];\mathbb{R}^n) and a, b, c, d > 0 .

    Clearly, \hat{Q}(\cdot) satisfies assumptions (A3) and (A4). Next, let us verify that (6.5) satisfies assumption (A5). For \phi, \psi\in C([-\tau, 0];\mathbb{R}^n) ,

\begin{align*} ||\hat{Q}(\phi)-\hat{Q}(\psi)||_{l_1}&\leq\max\{b, d\}\cdot\Big|e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|}-e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\psi(\theta)|}\Big|\\ &\leq\max\{b, d\}\cdot\Big|\sup\limits_{-\tau\leq\theta\leq0}|\phi(\theta)|-\sup\limits_{-\tau\leq\theta\leq0}|\psi(\theta)|\Big|\\ &\leq\max\{b, d\}\cdot\sup\limits_{-\tau\leq\theta\leq0}|\phi(\theta)-\psi(\theta)|. \end{align*}
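    The second inequality is a consequence of the mean value theorem, namely |e^{-x}-e^{-y}|\leq|x-y| for x, y\geq0 , while the third follows from the reverse triangle inequality for the supremum norm (a brief sketch):

    \begin{align*} \Big|\sup\limits_{-\tau\leq\theta\leq0}|\phi(\theta)|-\sup\limits_{-\tau\leq\theta\leq0}|\psi(\theta)|\Big|\leq\sup\limits_{-\tau\leq\theta\leq0}\big||\phi(\theta)|-|\psi(\theta)|\big|\leq\sup\limits_{-\tau\leq\theta\leq0}|\phi(\theta)-\psi(\theta)| = ||\phi-\psi||_{\infty}. \end{align*}

    In particular, \hat{Q}(\cdot) is Lipschitz continuous with respect to ||\cdot||_{\infty} .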

    According to the beginning of this section, the stationary distribution corresponding to \hat{Q}(\phi) is

    \begin{align*} \hat{\nu}(\phi)& = (\hat{\nu}_1(\phi), \hat{\nu}_2(\phi))\\ &: = \Bigg(\frac{c+d\cdot e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|}}{a+c+(b+d)\cdot e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|}}, \frac{a+b\cdot e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|}}{a+c+(b+d)\cdot e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|}}\Bigg). \end{align*}

    By calculation, it follows that for \psi, \phi\in C([-\tau, 0];\mathbb{R}) ,

    \begin{align*} |\hat{\nu}_1(\phi)-\hat{\nu}_1(\psi)|&\leq\frac{c(b+d)+d(a+c)}{(a+c)^2}\Big|e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\phi(\theta)|}-e^{-\sup\nolimits_{-\tau\leq\theta\leq0}|\psi(\theta)|}\Big| \\ &\leq\frac{c(b+d)+d(a+c)}{(a+c)^2}||\phi-\psi||_{\infty} \end{align*}

    and

    |\hat{\nu}_1(\phi)|\leq\frac{c+d}{a+c}.

    Similarly,

    \begin{align*} |\hat{\nu}_2(\phi)-\hat{\nu}_2(\psi)|\leq\frac{a(b+d)+b(a+c)}{(a+c)^2}||\phi-\psi||_{\infty} \end{align*}

    and

    |\hat{\nu}_2(\phi)|\leq\frac{a+b}{a+c}.

    Furthermore, let

    h:\mathbb{R}\longmapsto\mathbb{R} \ \ \ \mbox{ and }\ \ \ r:\mathbb{R}\longmapsto\mathbb{R}

    be Borel measurable, and let A(i) , B(i) , C(i) , D(i) , i = 1, 2 , be constants with

    C(1) = -C(2), \ \ D(1) = -D(2).

    Assume that there exists a positive constant K such that for any \phi and \psi\in C([-\tau, 0];\mathbb{R}) ,

    |h(\phi(-\tau))-h(\psi(-\tau))|\vee|r(\phi(-\tau))-r(\psi(-\tau))|\leq K||\phi-\psi||_{\infty}

    and

    |h(\phi(-\tau))|\leq K.

    According to (4.15), one can derive that

    \begin{equation} dX(t) = H(X_t)dt+R(X_t)dB_4(t), \end{equation} (6.6)

    where B_4(t) is a standard Brownian motion and for any \phi\in C([-\tau, 0];\mathbb{R}^n) ,

\begin{equation} H(\phi) = [A(1)+B(1)h(\phi(-\tau))]\hat{\nu}_1(\phi)+[A(2)+B(2)h(\phi(-\tau))]\hat{\nu}_2(\phi)\nonumber \end{equation}

    and

\begin{equation} R(\phi) = |C(1)+D(1)r(\phi(-\tau))|\nonumber. \end{equation}
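    The absolute value in R again reflects the averaging of the squared diffusion coefficient. Assuming, as in Example 6.1, that \bar{\sigma}(\phi)^2 = \sum_{i = 1}^{2}\sigma(\phi, i)^2\hat{\nu}_i(\phi) (a sketch), the conditions C(1) = -C(2) and D(1) = -D(2) imply \sigma(\phi, 1)^2 = \sigma(\phi, 2)^2 = (C(1)+D(1)r(\phi(-\tau)))^2 , so that

    \begin{align*} \bar{\sigma}(\phi)^2 = (C(1)+D(1)r(\phi(-\tau)))^2(\hat{\nu}_1(\phi)+\hat{\nu}_2(\phi)) = (C(1)+D(1)r(\phi(-\tau)))^2, \end{align*}

    and hence \bar{\sigma}(\phi) = |C(1)+D(1)r(\phi(-\tau))| = R(\phi) .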

    As in Example 6.1, according to the definitions of \hat{\nu}(\cdot) , r(\cdot) , and h(\cdot) , we have, for \psi, \phi\in C([-\tau, 0];\mathbb{R}^n) ,

    |H(\phi)-H(\psi)|\vee|R(\phi)-R(\psi)|\leq K||\phi-\psi||_{\infty},

    which yields the existence and uniqueness of the solution to (6.6). Finally, according to Theorem 5.5, the limit of any weakly convergent subsequence of the solution to (6.5) satisfies (6.6).

    This article overcomes the difficulties arising from past-dependent switching and from the delay terms in the continuous dynamics. Under a Lipschitz condition formulated with respect to the uniform norm, it establishes, for the first time, the averaging principle for stochastic functional differential equations with past-dependent switching. Inspired by reference [34], future work will focus on establishing the averaging principle for this system under non-Lipschitz conditions. Numerical simulation and stability analysis for this model will also be explored in future research.

    Minyu Wu: conceptualization, methodology, investigation, writing–original draft, writing–review and editing; Xizhong Yang: conceptualization, methodology and validation; Feiran Yuan: conceptualization, validation, writing–review and editing; Xuyi Qiu: supervision, writing–review and editing. All authors have read and agreed to the published version of the manuscript.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    We would like to express our sincere gratitude to Professor Shao Jinghai and Professor Wu Fuke for their guidance and care during the initial stage of writing this paper.

    The authors declare no conflicts of interest.



    [1] J. L. W. V. Jensen, Sur les fonctions convexes et les inégalits entre les valeurs moyennes, Acta Math., 30 (1906), 175–193. doi: 10.1007/BF02418571
    [2] S. S. Dragomir, Bounds for the normalized Jensen functional, Bull. Aust. Math. Soc., 74 (2006), 471–478. doi: 10.1017/S000497270004051X
    [3] L. Horvath, K. A. Khan, J. Pecaric, Cyclic refinements of the discrete and integral form of Jensen's inequality with applications, Analysis, 36 (2016), 253–262.
    [4] L. Horvath, D. Pecaric, J. Pecaric, Estimations of f-and Renyi divergences by using a cyclic refinement of the Jensen's inequality, Bull. Malays. Math. Sci. Soc., 42 (2019), 933–946. doi: 10.1007/s40840-017-0526-4
    [5] J. Jaksetic, D. Pecaric, J. Pecaric, Some properties of Zipf-Mandelbrot law and Hurwitz-function, Math. Inequal. Appl., 21 (2018), 575–584.
    [6] J. Jaksetic, J. Pecaric, Exponential convexity method, J. Convex Anal., 20 (2013), 181–197.
    [7] C. Niculescu, L. E. Persson, Convex functions and their applications, New York: Springer-Verlag, 2006.
    [8] J. E. Pecaric, F. Proschan, Y. L. Tong, Convex functions, partial orderings and statistical applications, New York: Academic Press, 1992.
    [9] S. Varošanec, On h-convexity, J. Math. Anal. Appl., 326 (2007), 303–311.
    [10] İ. İşcan, Hermite-Hadamard type inequalities for harmonically convex functions, Hacet. J. Math. Stat., 43 (2014), 935–942.
    [11] İ. İşcan, Hermite-Hadamard type inequalities for harmonically (\alpha, m)-convex functions, 2015. Available from: https://arXiv.org/pdf/1307.5402v3.
    [12] I. A. Baloch, I. Işcan, S. S. Dragomir, Fejér's type inequalities for harmonically (s, m)-convex functions, Int. J. Anal. Appl., 12 (2016), 188–197.
    [13] I. A. Baloch, I. Işcan, Some Ostrowski type inequalities for harmonically (s, m)-convex functions in Second Sense, Int. J. Anal., 2015 (2015), 672675.
    [14] I. A. Baloch, I. Işcan, Some Hermite-Hadamard type integral inequalities for Harmonically (p, (s, m))-convex functions, J. Inequal. Spec. Funct., 8 (2017), 65–84.
    [15] M. U. Awan, N. Akhtar, S. Iftikhar, M. A. Noor, Y. M. Chu, New Hermite-Hadamard type inequalities for n-polynomial harmonically convex functions, J. Inequal. Appl., 2020 (2020), 125. doi: 10.1186/s13660-020-02393-x
    [16] I. A. Baloch, Y. M. Chu, Petrović-type inequalities for harmonic h-convex functions, J. Funct. Space., 2020 (2020), 3075390.
    [17] T. Abdeljawad, S. Rashid, Z. Hammouch, Y. M. Chu, Some new local fractional inequalities associated with generalized (s, m)-convex functions and applications, Adv. Differ. Equ., 2020 (2020), 406. doi: 10.1186/s13662-020-02865-w
    [18] I. A. Baloch, I. Işcan, Integral inequalities for differentiable harmonically (s, m)-preinvex functions, Open J. Math. Anal., 1 (2017), 25–33.
    [19] I. A. Baloch, S. S. Dragomir, New inequalities based on harmonic log-convex functions, Open J. Math. Anal., 3 (2019), 103–105. doi: 10.30538/psrp-oma2019.0043
    [20] X. Z. Yang, G. Farid, W. Nazeer, M. Yussouf, Y. M. Chu, C. F. Dong, Fractional generalized Hadamard and Fejér-Hadamard inequalities for m-convex function, AIMS Mathematics., 5 (2020), 6325–6340. doi: 10.3934/math.2020407
    [21] S. H. Wu, I. A. Baloch, I. Iscan, On Harmonically (p, h, m)-preinvex functions, J. Funct. Space., 2017 (2017), 2148529.
    [22] I. A. Baloch, New ostrowski type inequalities for functions whose derivatives are p-preinvex, J. New Theory, 16 (2017), 68–79.
    [23] I. A. Baloch, M. Bohner, M. D. L. Sen, Petrovic-type inequalities for harmonic convex functions on coordinates, J. Inequal. Spec. Funct., 11 (2020), 16–23.
    [24] S. Y. Guo, Y. M. Chu, G. Farid, S. Mehmood, W. Nazeer, Fractional Hadamard and Fejér-Hadamard inequalities associated with exponentially (s, m)-convex functions, J. Funct. Space., 2020 (2020), 2410385.
    [25] M. U. Awan, M. A. Noor, M. V. Mihai, K. I. Noor, A. G. Khan, Some new bounds for Simpson's rule involving special functions via harmonic h-convexity, J. Nonlinear Sci. Appl., 10 (2017), 1755–1766. doi: 10.22436/jnsa.010.04.37
    [26] M. R. Delavar, S. S. Dragomir, M. De La Sen, A note on characterization of h-convex functions via Hermite-Hadamard type inequality, Probl. Anal. Issues Anal., 8 (2019), 28–36.
    [27] I. A. Baloch, B. R. Ali, On new inequalities of Hermite-Hadamard type for functions whose fourth derivative absolute values are quasi-convex with applications, J. New Theory, 10 (2016), 76–85.
    [28] M. B. Khan, P. O. Mohammed, M. A. Noor, Y. S. Hamed, New Hermite-Hadamard inequalities in fuzzy-interval fractional calculus and related inequalities, Symmetry, 13 (2021), 673. doi: 10.3390/sym13040673
    [29] M. A. Alqudah, A. Kashuri, P. O. Mohammed, T. Abdeljawad, M. Raees, M. Anwar, et al., Hermite Hadamard integral inequalities on coordinated convex functions in quantum calculus, Adv. Differ. Equ., 2021 (2021), 264. doi: 10.1186/s13662-021-03420-x
    [30] F. Al-Azemi, O. Calin, Asian options with harmonic average, Appl. Math. Inf. Sci., 9 (2015), 1–9.
    [31] I. A. Baloch, M. De La Sen, İ. İşcan, Characterizations of classes of harmonic convex functions and applications, Int. J. Anal. Appl., 17 (2019), 722–733.
    [32] F. X. Chen, S. H. Wu, Fejér and Hermite-Hadamard type inequalities for harmonically convex functions, J. Appl. Math., 2014 (2014), 386806.
    [33] I. A. Baloch, A. H. Mughal, Y. M. Chu, A. U. Haq, M. De la Sen, A variant of Jensen-type inequality and related results for harmonic convex functions, AIMS Mathematics, 5 (2020), 6404–6418. doi: 10.3934/math.2020412
    [34] I. A. Baloch, A. A. Mughal, Y. M. Chu, A. U. Haq, M. D. L. Sen, Improvement and generalization of some results related to the class of harmonically convex functions and applications, J. Math. Comput. Sci., 22 (2021), 282–294.
    [35] S. S. Dragomir, Inequalities of Jensen type for HA-convex functions, Analele Universitatii Oradea Fasc. Matematica, 27 (2020), 103–124.
    [36] A. A. Mughal, H. Almusawa, A. U. Haq, I. A. Baloch, Properties and bound of functionals related to Jensen-type inequalities via harmonic convex functions, In press.