Research article

A multi-scale cyclic-shift window Transformer object tracker based on fast Fourier transform

• In recent years, Transformer-based object trackers have demonstrated exceptional performance in object tracking. However, traditional methods often employ single-scale pixel-level attention mechanisms to compute the correlation between templates and search regions, disrupting the object's integrity and positional information. To address these issues, we introduce a cyclic-shift mechanism to expand the diversity of sample positions and replace the traditional single-scale pixel-level attention mechanism with a multi-scale window-level attention mechanism. This approach not only preserves the object's integrity but also enriches the diversity of samples. Nevertheless, the introduced cyclic-shift operation places a heavy burden on storage and computation. To this end, we treat the attention computation between shifted and static windows in the spatial domain as a convolution. By leveraging the convolution theorem, we transform the attention computation over cyclic-shift samples from the spatial domain into element-wise multiplication in the frequency domain. This approach enhances computational efficiency and reduces data storage requirements. We conducted extensive experiments on the proposed module. The results demonstrate that the proposed module outperforms multiple existing tracking algorithms. Moreover, ablation studies show that the method effectively reduces the storage and computational burden without compromising performance.

    Citation: Huanyu Wu, Yingpin Chen, Changhui Wu, Ronghuan Zhang, Kaiwei Chen. A multi-scale cyclic-shift window Transformer object tracker based on fast Fourier transform[J]. Electronic Research Archive, 2025, 33(6): 3638-3672. doi: 10.3934/era.2025162
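To illustrate the frequency-domain mechanism described in the abstract, the following minimal NumPy sketch (our illustration under simplified 1-D assumptions, not the authors' implementation; all names and sizes are arbitrary) verifies that the correlation scores between a template and every cyclic shift of a search window can be obtained with FFTs via the convolution theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                              # toy 1-D window length
template = rng.standard_normal(n)  # template features
window = rng.standard_normal(n)    # search-window features

# Naive O(n^2): correlation of the template with each cyclic shift.
naive = np.array([np.dot(np.roll(window, -s), template) for s in range(n)])

# Convolution theorem, O(n log n): one element-wise product in the
# frequency domain yields all n shift scores at once.
scores = np.fft.ifft(np.fft.fft(window) * np.conj(np.fft.fft(template))).real

assert np.allclose(naive, scores)  # both methods agree
```

The 2-D multi-channel case used in window-level attention follows the same identity, with np.fft.fft2 in place of np.fft.fft.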




    In this paper, we consider the Schrödinger operators

$$\mathcal{L}=-\Delta+V(x),\quad x\in\mathbb{R}^n,\ n\ge3,$$

where $\Delta=\sum_{i=1}^{n}\frac{\partial^2}{\partial x_i^2}$ and $V(x)$ is a nonnegative potential belonging to the reverse Hölder class $RH_q$ for some $q\ge\frac{n}{2}$. Assume that $f$ is a nonnegative function on $\mathbb{R}^n$ that is locally $L^q$-integrable; we say that $f$ belongs to $RH_q$ $(1<q\le\infty)$ if there exists a positive constant $C$ such that the reverse Hölder inequality

$$\left(\frac{1}{|B(x,r)|}\int_{B(x,r)}|f(y)|^q\,dy\right)^{\frac{1}{q}}\le\frac{C}{|B(x,r)|}\int_{B(x,r)}|f(y)|\,dy$$

holds for every $x\in\mathbb{R}^n$, where $B(x,r)$ denotes the ball centered at $x$ with radius $r<\infty$ [1]. For example, every nonnegative polynomial $V$ belongs to $RH_\infty$; in particular, $|x|^2\in RH_\infty$.
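For instance (a standard computation we include for illustration), both sides of the reverse Hölder inequality for $V(y)=|y|^2$ are comparable to $|x|^2+r^2$: for every ball $B(x,r)$,

$$\left(\frac{1}{|B(x,r)|}\int_{B(x,r)}|y|^{2q}\,dy\right)^{\frac{1}{q}}\le(|x|+r)^2\le C\left(|x|^2+r^2\right)\le\frac{C'}{|B(x,r)|}\int_{B(x,r)}|y|^2\,dy,$$

with constants depending only on $n$ and $q$, which yields $|x|^2\in RH_q$ for every $q<\infty$.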

Let the potential $V\in RH_q$ with $q\ge\frac{n}{2}$. The critical radius function $\rho(x)$ is defined as

$$\rho(x)=\sup_{r>0}\left\{r:\frac{1}{r^{n-2}}\int_{B(x,r)}V(y)\,dy\le1\right\},\quad x\in\mathbb{R}^n.\tag{1.1}$$

We also write $\rho(x)=\frac{1}{m_V(x)}$, $x\in\mathbb{R}^n$. Clearly, $0<m_V(x)<\infty$ when $V\not\equiv0$, and $m_V(x)=1$ when $V=1$. For the harmonic oscillator operator (Hermite operator) $H=-\Delta+|x|^2$, we have $m_V(x)\sim1+|x|$.
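To illustrate definition (1.1), take the Hermite potential $V(y)=|y|^2$: a direct computation gives

$$\frac{1}{r^{n-2}}\int_{B(x,r)}|y|^2\,dy\sim r^2\left(|x|^2+r^2\right),$$

so the constraint $\frac{1}{r^{n-2}}\int_{B(x,r)}V\le1$ forces $r\lesssim1$ when $|x|\le1$ and $r\lesssim|x|^{-1}$ when $|x|\ge1$; hence $\rho(x)\sim\frac{1}{1+|x|}$, which is exactly the relation $m_V(x)\sim1+|x|$ stated above.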

Thanks to the heat diffusion semigroup $e^{-t\mathcal{L}}$, for sufficiently good functions $f$, the negative powers $\mathcal{L}^{-\frac{\alpha}{2}}$ $(\alpha>0)$ related to the Schrödinger operator $\mathcal{L}$ can be written as

$$\mathcal{I}_\alpha f(x)=\mathcal{L}^{-\frac{\alpha}{2}}f(x)=\int_0^{\infty}e^{-t\mathcal{L}}f(x)\,t^{\frac{\alpha}{2}-1}dt,\quad 0<\alpha<n.\tag{1.2}$$

Applying Lemma 3.3 in [2], for sufficiently good functions $f$ it holds that

$$\mathcal{I}_\alpha f(x)=\int_{\mathbb{R}^n}K_\alpha(x,y)f(y)\,dy,\quad 0<\alpha<n,$$

and the kernel $K_\alpha(x,y)$ satisfies the following inequality:

$$K_\alpha(x,y)\le\frac{C_k}{\left(1+|x-y|\left(m_V(x)+m_V(y)\right)\right)^{k}}\cdot\frac{1}{|x-y|^{n-\alpha}}.\tag{1.3}$$

Moreover, since the first factor in (1.3) is at most 1, we have $K_\alpha(x,y)\le\frac{C}{|x-y|^{n-\alpha}}$ for $0<\alpha<n$.

Shen [1] obtained $L^p$ estimates for Schrödinger-type operators when the potential $V\in RH_q$ with $q\ge\frac{n}{2}$. For Schrödinger operators $\mathcal{L}=-\Delta+V$ with $V\in RH_q$ for some $q\ge\frac{n}{2}$, Harboure et al. [3] established necessary and sufficient conditions for the operators $\mathcal{L}^{-\frac{\alpha}{2}}$ $(\alpha>0)$ to be bounded from weighted strong and weak $L^p$ spaces into suitable weighted $BMO_{\mathcal{L}}(w)$ and Lipschitz spaces when $p\ge\frac{n}{\alpha}$. Bongioanni, Harboure, and Salinas proved that the fractional integral operator $\mathcal{L}^{-\alpha/2}$ is bounded from $L^{p,\infty}(w)$ into $BMO^{\beta}_{\mathcal{L}}(w)$ under suitable conditions on the weight $w$ [4]. For more background and recent progress, we refer to [5,6,7] and the references therein.

Ramseyer, Salinas, and Viviani [8] studied the fractional integral operator and obtained its boundedness from strong and weak $L^{p(\cdot)}$ spaces into suitable Lipschitz spaces under some conditions on $p(\cdot)$. In this article, our main interest lies in the behavior of the fractional integral operators $\mathcal{L}^{-\frac{\alpha}{2}}$ $(\alpha>0)$, related to $\mathcal{L}=-\Delta+V$ with $V\in RH_q$ for some $q\ge\frac{n}{2}$, on variable exponent spaces.

We now introduce some basic properties of variable exponent Lebesgue spaces, which are used frequently later on.

Let $p(\cdot):\Omega\to[1,\infty)$ be a measurable function. For a measurable function $f$ on $\Omega$, the variable exponent Lebesgue space $L^{p(\cdot)}(\Omega)$ is defined by

$$L^{p(\cdot)}(\Omega)=\left\{f:\int_\Omega\left|\frac{f(x)}{s}\right|^{p(x)}dx<\infty\right\},$$

where $s$ is a positive constant. Then $L^{p(\cdot)}(\Omega)$ is a Banach space equipped with the following norm:

$$\|f\|_{L^{p(\cdot)}(\Omega)}:=\inf\left\{s>0:\int_\Omega\left|\frac{f(x)}{s}\right|^{p(x)}dx\le1\right\}.$$
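For example, when $p(\cdot)\equiv p$ is a constant exponent, the infimum can be computed explicitly for $f=\chi_B$:

$$\int_{\Omega}\left|\frac{\chi_B(x)}{s}\right|^{p}dx=\frac{|B|}{s^{p}}\le1\iff s\ge|B|^{\frac{1}{p}},\qquad\text{so}\quad\|\chi_B\|_{L^{p}}=|B|^{\frac{1}{p}},$$

recovering the classical $L^p$ norm; Lemma 2.3 below extends this computation to variable exponents.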

    We denote

$$p_-:=\mathop{\mathrm{ess\,inf}}_{x\in\Omega}p(x)\quad\text{and}\quad p_+:=\mathop{\mathrm{ess\,sup}}_{x\in\Omega}p(x).$$

Let $\mathcal{P}(\mathbb{R}^n)$ denote the set of all measurable functions $p$ on $\mathbb{R}^n$ taking values in $[1,\infty)$ such that $1<p_-(\mathbb{R}^n)\le p(\cdot)\le p_+(\mathbb{R}^n)<\infty$.

Assume that $p$ is a real-valued measurable function on $\mathbb{R}^n$. We say that $p$ is locally log-Hölder continuous if there exists a constant $C$ such that

$$|p(x)-p(y)|\le\frac{C}{\log(e+1/|x-y|)},\quad x,y\in\mathbb{R}^n,$$

    and we say p is log-Hölder continuous at infinity if there exists a positive constant C such that

$$|p(x)-p(\infty)|\le\frac{C}{\log(e+|x|)},\quad x\in\mathbb{R}^n,$$

where $p(\infty):=\lim_{|x|\to\infty}p(x)\in\mathbb{R}$.

The notation $\mathcal{P}^{\log}(\mathbb{R}^n)$ denotes the set of all measurable functions $p\in\mathcal{P}(\mathbb{R}^n)$ that are locally log-Hölder continuous and log-Hölder continuous at infinity. Moreover, if $p(\cdot)\in\mathcal{P}^{\log}(\mathbb{R}^n)$, then the conjugate exponent $p'(\cdot)$, defined by $\frac{1}{p(x)}+\frac{1}{p'(x)}=1$, also belongs to $\mathcal{P}^{\log}(\mathbb{R}^n)$.

Definition 1.1. [8] Assume that $p(\cdot)$ is an exponent function on $\mathbb{R}^n$. We say that a measurable function $f$ belongs to $L^{p(\cdot),\infty}(\mathbb{R}^n)$ if there exists a constant $C$ such that for every $t>0$,

$$\int_{\mathbb{R}^n}t^{p(x)}\chi_{\{|f|>t\}}(x)\,dx\le C.$$

It is easy to check that $L^{p(\cdot),\infty}(\mathbb{R}^n)$ is a quasi-normed space equipped with the following quasi-norm:

$$\|f\|_{p(\cdot),\infty}=\inf\left\{s>0:\sup_{t>0}\int_{\mathbb{R}^n}\left(\frac{t}{s}\right)^{p(x)}\chi_{\{|f|>t\}}(x)\,dx\le1\right\}.$$
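When $p(\cdot)\equiv p$ is constant, this quasi-norm reduces to the classical weak-$L^p$ quasi-norm, since

$$\sup_{t>0}\int_{\mathbb{R}^n}\left(\frac{t}{s}\right)^{p}\chi_{\{|f|>t\}}(x)\,dx=\sup_{t>0}\frac{t^{p}}{s^{p}}\left|\{|f|>t\}\right|\le1\iff s\ge\sup_{t>0}t\left|\{|f|>t\}\right|^{\frac{1}{p}}.$$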

Next, we define the Lipschitz-type spaces $\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}$ associated with the nonnegative potential $V$.

Definition 1.2. Let $p(\cdot)$ be an exponent function with $1<p_-\le p_+<\infty$ and $0<\alpha<n$. We say that a locally integrable function $f$ belongs to $\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}(\mathbb{R}^n)$ if there exist constants $C_1$ and $C_2$ such that for every ball $B\subset\mathbb{R}^n$,

$$\frac{1}{|B|^{\frac{\alpha}{n}}\|\chi_B\|_{p'(\cdot)}}\int_B|f(x)-m_Bf|\,dx\le C_1,\tag{1.4}$$

and for every ball $B=B(x,R)$ with $R\ge\rho(x)$,

$$\frac{1}{|B|^{\frac{\alpha}{n}}\|\chi_B\|_{p'(\cdot)}}\int_B|f(x)|\,dx\le C_2,\tag{1.5}$$

where $m_Bf=\frac{1}{|B|}\int_Bf$. The norm on $\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}(\mathbb{R}^n)$ is defined as the maximum of the two infima of admissible constants $C_1$ and $C_2$ in (1.4) and (1.5).

Remark 1.1. $\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}(\mathbb{R}^n)\subset\mathbb{L}_{\alpha,p(\cdot)}(\mathbb{R}^n)$, the space introduced in [8]. In particular, when $p(\cdot)\equiv p$ is constant, $\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}(\mathbb{R}^n)$ is the usual weighted space $BMO^{\beta}_{\mathcal{L}}(w)$ with $w=1$ and $\beta=\alpha-\frac{n}{p}$ [4].

Remark 1.2. It is easy to see that, for any ball $B=B(x,R)$ with $R\ge\rho(x)$, inequality (1.5) implies inequality (1.4), and that the average $m_Bf$ in (1.4) can be replaced by an arbitrary constant $c$ in the following sense:

$$\frac{1}{2}\|f\|_{\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}}\le\sup_{B\subset\mathbb{R}^n}\inf_{c\in\mathbb{R}}\frac{1}{|B|^{\frac{\alpha}{n}}\|\chi_B\|_{p'(\cdot)}}\int_B|f(x)-c|\,dx\le\|f\|_{\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}}.$$
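The left inequality follows from the triangle inequality: for any constant $c$,

$$\int_B|f-m_Bf|\,dx\le\int_B|f-c|\,dx+|B|\,|m_Bf-c|\le2\int_B|f-c|\,dx,$$

since $|m_Bf-c|=\left|\frac{1}{|B|}\int_B(f-c)\right|\le\frac{1}{|B|}\int_B|f-c|\,dx$; the right inequality is immediate by taking $c=m_Bf$.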

In 2013, Ramseyer et al. [8] studied the Lipschitz-type smoothness of the fractional integral operator $I_\alpha$ on variable exponent spaces when $p_+>\frac{n}{\alpha}$. Hence, when $p_+>\frac{n}{\alpha}$, it is an interesting problem whether the fractional integral operators $\mathcal{L}^{-\frac{\alpha}{2}}$ $(\alpha>0)$ related to Schrödinger operators are bounded from the Lebesgue spaces $L^{p(\cdot)}$ into Lipschitz-type spaces with variable exponents. The main aim of this article is to answer this problem.

    We now state our results as the following two theorems.

Theorem 1.3. Let the potential $V\in RH_q$ for some $q\ge n/2$, and let $p(\cdot)\in\mathcal{P}^{\log}(\mathbb{R}^n)$. Assume that $1<p_-\le p_+<\frac{n}{(\alpha-\delta_0)_+}$, where $\delta_0=\min\{1,2-\frac{n}{q}\}$. Then the fractional integral operator $\mathcal{I}_\alpha$ defined in (1.2) is bounded from $L^{p(\cdot)}(\mathbb{R}^n)$ into $\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}(\mathbb{R}^n)$.

Theorem 1.4. Let the potential $V\in RH_q$ with $q\ge n/2$, and let $p(\cdot)\in\mathcal{P}^{\log}(\mathbb{R}^n)$. Assume that $1<p_-\le p_+<\frac{n}{(\alpha-\delta_0)_+}$, where $\delta_0=\min\{1,2-\frac{n}{q}\}$. If there exists a positive number $r_0$ such that $p(x)\equiv p(\infty)$ when $|x|>r_0$, then the fractional integral operator $\mathcal{I}_\alpha$ defined in (1.2) is bounded from $L^{p(\cdot),\infty}(\mathbb{R}^n)$ into $\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}(\mathbb{R}^n)$.

To prove Theorem 1.3, we first need to cover $\mathbb{R}^n$ by a family of balls $B(x_k,\rho(x_k))$ $(k\ge1)$ with bounded overlap, built from the critical radius function $\rho(x)$ defined in (1.1). Using this covering (Lemma 2.6), we establish in Lemma 2.7 necessary and sufficient conditions ensuring $f\in\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}(\mathbb{R}^n)$. Then, applying Corollary 1 and Remark 1.2, we only need to prove that the following two conditions hold:

(i) For every ball $B=B(x_0,r)$ with $r<\rho(x_0)$, there exists a constant $c$ such that

$$\int_B|\mathcal{I}_\alpha f(x)-c|\,dx\le C|B|^{\frac{\alpha}{n}}\|\chi_B\|_{p'(\cdot)}\|f\|_{p(\cdot)};$$

(ii) For every $x_0\in\mathbb{R}^n$,

$$\int_{B(x_0,\rho(x_0))}\mathcal{I}_\alpha(|f|)(x)\,dx\le C|B(x_0,\rho(x_0))|^{\frac{\alpha}{n}}\|\chi_{B(x_0,\rho(x_0))}\|_{p'(\cdot)}\|f\|_{p(\cdot)}.$$

To check conditions (i) and (ii) above, we need precise estimates of the heat kernel $k_t(x,y)$, which controls the kernel $K_\alpha(x,y)$ of the fractional integral operator $\mathcal{I}_\alpha$ through (2.4) (see Lemmas 2.8 and 2.9); we then use them to complete the proof. The proof of Theorem 1.4 proceeds in the same way.

The paper is organized as follows. In Section 2, we give some important lemmas. Section 3 is devoted to proving Theorems 1.3 and 1.4.

Throughout this article, $C$ denotes a positive constant independent of the main parameters, which may differ from one occurrence to another. $B(x,r)=\{y\in\mathbb{R}^n:|x-y|<r\}$, $B_k=B(x_0,2^kR)$, and $\chi_{B_k}$ is the characteristic function of the set $B_k$ for $k\in\mathbb{Z}$. $|S|$ denotes the Lebesgue measure of $S$. $f\sim g$ means $C^{-1}g\le f\le Cg$.

    In this section, we give several useful lemmas that are used frequently later on.

Lemma 2.1. [9] Assume that the exponent function $p(\cdot)\in\mathcal{P}(\mathbb{R}^n)$. If $f\in L^{p(\cdot)}(\mathbb{R}^n)$ and $g\in L^{p'(\cdot)}(\mathbb{R}^n)$, then

$$\int_{\mathbb{R}^n}|f(x)g(x)|\,dx\le r_p\|f\|_{L^{p(\cdot)}(\mathbb{R}^n)}\|g\|_{L^{p'(\cdot)}(\mathbb{R}^n)},$$

where $r_p=1+1/p_--1/p_+$.

Lemma 2.2. [8] Assume that $p(\cdot)\in\mathcal{P}^{\log}(\mathbb{R}^n)$ with $1<p_-\le p_+<\infty$, and that $p(x)\equiv p(\infty)$ when $|x|>r_0>1$. For every ball $B$ and every $f\in L^{p(\cdot),\infty}$, we have

$$\int_B|f(x)|\,dx\le C\|f\|_{L^{p(\cdot),\infty}}\|\chi_B\|_{L^{p'(\cdot)}},$$

where the constant $C$ depends only on $r_0$.

For the following lemma, see Corollary 4.5.9 in [10].

Lemma 2.3. Let $p(\cdot)\in\mathcal{P}^{\log}(\mathbb{R}^n)$. Then for every ball $B\subset\mathbb{R}^n$ we have

$$\|\chi_B\|_{p(\cdot)}\sim|B|^{\frac{1}{p(x)}},\quad\text{if }|B|\le2^n\text{ and }x\in B,$$

and

$$\|\chi_B\|_{p(\cdot)}\sim|B|^{\frac{1}{p(\infty)}},\quad\text{if }|B|\ge1.$$

Lemma 2.4. Assume that $p(\cdot)\in\mathcal{P}^{\log}(\mathbb{R}^n)$. Then for all balls $S:=B(x_0,r_0)\subset B:=B(x_1,r_1)$ we have

$$\frac{\|\chi_S\|_{p'(\cdot)}}{\|\chi_B\|_{p'(\cdot)}}\le C\left(\frac{|S|}{|B|}\right)^{1-\frac{1}{p_-}},\qquad\frac{\|\chi_B\|_{p'(\cdot)}}{\|\chi_S\|_{p'(\cdot)}}\le C\left(\frac{|B|}{|S|}\right)^{1-\frac{1}{p_+}}.\tag{2.1}$$

Proof. We only prove the first inequality in (2.1); the second one follows in the same way. Applying Lemma 2.3 (to the conjugate exponent $p'(\cdot)$, which lies in $\mathcal{P}^{\log}(\mathbb{R}^n)$, and recalling that $(p')_+=(p_-)'$, so $\frac{1}{(p')_+}=1-\frac{1}{p_-}$), we consider three cases:

1) if $|S|<1<|B|$, then $\frac{\|\chi_S\|_{p'(\cdot)}}{\|\chi_B\|_{p'(\cdot)}}\sim\frac{|S|^{\frac{1}{p'(x_S)}}}{|B|^{\frac{1}{p'(\infty)}}}\le\left(\frac{|S|}{|B|}\right)^{\frac{1}{(p')_+}}=\left(\frac{|S|}{|B|}\right)^{1-\frac{1}{p_-}}$;

2) if $1\le|S|<|B|$, then $\frac{\|\chi_S\|_{p'(\cdot)}}{\|\chi_B\|_{p'(\cdot)}}\sim\frac{|S|^{\frac{1}{p'(\infty)}}}{|B|^{\frac{1}{p'(\infty)}}}\le\left(\frac{|S|}{|B|}\right)^{\frac{1}{(p')_+}}=\left(\frac{|S|}{|B|}\right)^{1-\frac{1}{p_-}}$;

3) if $|S|<|B|<1$, then $\frac{\|\chi_S\|_{p'(\cdot)}}{\|\chi_B\|_{p'(\cdot)}}\sim\frac{|S|^{\frac{1}{p'(x_S)}}}{|B|^{\frac{1}{p'(x_B)}}}=\left(\frac{|S|}{|B|}\right)^{\frac{1}{p'(x_S)}}|B|^{\frac{1}{p'(x_S)}-\frac{1}{p'(x_B)}}\le C\left(\frac{|S|}{|B|}\right)^{\frac{1}{(p')_+}}=C\left(\frac{|S|}{|B|}\right)^{1-\frac{1}{p_-}}$, where $x_S\in S$ and $x_B\in B$.

Indeed, since $|x_B-x_S|\le2r_1$, using the local log-Hölder continuity of $p(\cdot)$ (hence of $\frac{1}{p'(\cdot)}$) we have

$$\left|\frac{1}{p'(x_S)}-\frac{1}{p'(x_B)}\right|\log\frac{1}{r_1}\le\frac{C\log\frac{1}{r_1}}{\log\left(e+\frac{1}{|x_S-x_B|}\right)}\le\frac{C\log\frac{1}{r_1}}{\log\left(e+\frac{1}{2r_1}\right)}\le C,$$

so that $|B|^{\frac{1}{p'(x_S)}-\frac{1}{p'(x_B)}}\le C$.

    We end the proof of this lemma.

    Remark 2.1. Thanks to the second inequality in (2.1), it is easy to prove that

$$\|\chi_{2B}\|_{p'(\cdot)}\le C\|\chi_B\|_{p'(\cdot)}.$$
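Indeed (our one-line verification), applying the second inequality in (2.1) with $S=B$ inside the ball $2B$ gives

$$\|\chi_{2B}\|_{p'(\cdot)}\le C\left(\frac{|2B|}{|B|}\right)^{1-\frac{1}{p_+}}\|\chi_B\|_{p'(\cdot)}=C\,2^{n\left(1-\frac{1}{p_+}\right)}\|\chi_B\|_{p'(\cdot)}.$$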

Lemma 2.5. [1] Suppose that the potential $V\in RH_q$ with $q\ge n/2$. Then there exist positive constants $C$ and $k_0$ such that

1) $\rho(x)\sim\rho(y)$ when $|x-y|\le C\rho(x)$;

2) $C^{-1}\rho(x)\left(1+\frac{|x-y|}{\rho(x)}\right)^{-k_0}\le\rho(y)\le C\rho(x)\left(1+\frac{|x-y|}{\rho(x)}\right)^{\frac{k_0}{k_0+1}}$.

Lemma 2.6. [11] There exists a sequence of points $\{x_k\}_{k=1}^\infty$ in $\mathbb{R}^n$ such that the balls $B_k:=B(x_k,\rho(x_k))$ satisfy:

1) $\mathbb{R}^n=\bigcup_kB_k$;

2) there exists $N\ge1$ such that for every $k\ge1$, $\operatorname{card}\{j:4B_j\cap4B_k\ne\emptyset\}\le N$.

Lemma 2.7. Assume that $p(\cdot)\in\mathcal{P}(\mathbb{R}^n)$ and $0<\alpha<n$. Let the sequence $\{x_k\}_{k=1}^\infty$ satisfy the properties of Lemma 2.6. Then a function $f\in\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}(\mathbb{R}^n)$ if and only if $f$ satisfies (1.4) for every ball and

$$\frac{1}{|B(x_k,\rho(x_k))|^{\frac{\alpha}{n}}\|\chi_{B(x_k,\rho(x_k))}\|_{p'(\cdot)}}\int_{B(x_k,\rho(x_k))}|f(x)|\,dx\le C\quad\text{for all }k\ge1.\tag{2.2}$$

Proof. Let $B:=B(x,R)$ denote a ball with center $x$ and radius $R\ge\rho(x)$. Note that $f$ satisfies (1.4); thanks to Lemma 2.6, the set $G=\{k:B\cap B_k\ne\emptyset\}$ is finite.

Applying Lemma 2.5, if $z\in B_k\cap B$, we get

$$\rho(x_k)\le C\rho(z)\left(1+\frac{|x_k-z|}{\rho(x_k)}\right)^{k_0}\le C2^{k_0}\rho(z)\le C2^{k_0}\rho(x)\left(1+\frac{|x-z|}{\rho(x)}\right)^{\frac{k_0}{k_0+1}}\le C2^{k_0}\rho(x)\left(1+\frac{R}{\rho(x)}\right)\le C2^{k_0}R.$$

Thus, for every $k\in G$, we have $B_k\subset CB$ for some fixed dilation constant $C$.

Thanks to (2.2) and Lemmas 2.4 and 2.6, it holds that

$$\int_B|f(x)|\,dx\le\sum_{k\in G}\int_{B\cap B_k}|f(x)|\,dx\le\sum_{k\in G}\int_{B_k}|f(x)|\,dx\le C\sum_{k\in G}|B_k|^{\frac{\alpha}{n}}\|\chi_{B_k}\|_{p'(\cdot)}\le C|B|^{\frac{\alpha}{n}}\|\chi_B\|_{p'(\cdot)}.$$

    The proof of this lemma is completed.

Corollary 1. Assume that $p(\cdot)\in\mathcal{P}(\mathbb{R}^n)$ and $0<\alpha<n$. Then a measurable function $f\in\mathrm{Lip}_{\mathcal{L}}^{\alpha,p(\cdot)}$ if and only if $f$ satisfies (1.4) for every ball $B(x,R)$ with radius $R<\rho(x)$, and for every $x\in\mathbb{R}^n$,

$$\frac{1}{|B(x,\rho(x))|^{\frac{\alpha}{n}}\|\chi_{B(x,\rho(x))}\|_{p'(\cdot)}}\int_{B(x,\rho(x))}|f(y)|\,dy\le C.\tag{2.3}$$

Let $k_t(x,y)$ denote the kernel of the heat semigroup $e^{-t\mathcal{L}}$ associated with $\mathcal{L}$, and let $K_\alpha(x,y)$ be the kernel of the fractional integral operator $\mathcal{I}_\alpha$; then it holds that

$$K_\alpha(x,y)=\int_0^{\infty}k_t(x,y)\,t^{\frac{\alpha}{2}-1}dt.\tag{2.4}$$
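Indeed, (2.4) follows from (1.2) by expressing the semigroup through its kernel and interchanging the order of integration (Fubini's theorem, for nonnegative integrands):

$$\mathcal{I}_\alpha f(x)=\int_0^{\infty}\int_{\mathbb{R}^n}k_t(x,y)f(y)\,dy\,t^{\frac{\alpha}{2}-1}dt=\int_{\mathbb{R}^n}\left(\int_0^{\infty}k_t(x,y)\,t^{\frac{\alpha}{2}-1}dt\right)f(y)\,dy.$$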

Some estimates of $k_t$ are presented below.

Lemma 2.8. [12] For every $N>0$ there exists a constant $C$ such that

$$k_t(x,y)\le Ct^{-n/2}e^{-\frac{|x-y|^2}{Ct}}\left(1+\frac{\sqrt{t}}{\rho(x)}+\frac{\sqrt{t}}{\rho(y)}\right)^{-N},\quad x,y\in\mathbb{R}^n.$$

Lemma 2.9. [13] Let $0<\delta<\min\{1,2-\frac{n}{q}\}$. If $|x-x_0|<\sqrt{t}$, then for every $N>0$ the heat kernel $k_t(x,y)$ in (2.4) satisfies

$$|k_t(x,y)-k_t(x_0,y)|\le C\left(\frac{|x-x_0|}{\sqrt{t}}\right)^{\delta}t^{-n/2}e^{-\frac{|x-y|^2}{Ct}}\left(1+\frac{\sqrt{t}}{\rho(x)}+\frac{\sqrt{t}}{\rho(y)}\right)^{-N}$$

for all $x,y,x_0\in\mathbb{R}^n$.

This section is devoted to the proof of Theorems 1.3 and 1.4. To prove Theorem 1.3, thanks to Corollary 1 and Remark 1.2, we only need to prove that the following two conditions hold:

(i) For every ball $B=B(x_0,r)$ with $r<\rho(x_0)$, there exists a constant $c$ such that

$$\int_B|\mathcal{I}_\alpha f(x)-c|\,dx\le C|B|^{\frac{\alpha}{n}}\|\chi_B\|_{p'(\cdot)}\|f\|_{p(\cdot)};$$

(ii) For every $x_0\in\mathbb{R}^n$,

$$\int_{B(x_0,\rho(x_0))}\mathcal{I}_\alpha(|f|)(x)\,dx\le C|B(x_0,\rho(x_0))|^{\frac{\alpha}{n}}\|\chi_{B(x_0,\rho(x_0))}\|_{p'(\cdot)}\|f\|_{p(\cdot)}.$$

We now begin to check that these conditions hold. First, we prove (ii).

Assume that $B=B(x_0,R)$ with $R=\rho(x_0)$. We write $f=f_1+f_2$, where $f_1=f\chi_{2B}$ and $f_2=f\chi_{\mathbb{R}^n\setminus 2B}$. Hence, by inequality (1.3), we have

$$\int_B\mathcal{I}_\alpha(|f_1|)(x)\,dx=\int_B\mathcal{I}_\alpha(|f\chi_{2B}|)(x)\,dx\le C\int_B\int_{2B}\frac{|f(y)|}{|x-y|^{n-\alpha}}\,dy\,dx.$$

Applying Tonelli's theorem, Lemma 2.1, and Remark 2.1, we get the following estimate:

$$\int_B\mathcal{I}_\alpha(|f_1|)(x)\,dx\le C\int_{2B}|f(y)|\int_B\frac{dx}{|x-y|^{n-\alpha}}\,dy\le CR^{\alpha}\int_{2B}|f(y)|\,dy\le C|B|^{\frac{\alpha}{n}}\|\chi_B\|_{p'(\cdot)}\|f\|_{p(\cdot)}.\tag{3.1}$$

To deal with $f_2$, let $x\in B$ and split $\mathcal{I}_\alpha f_2$ as follows:

$$\mathcal{I}_\alpha f_2(x)=\int_0^{R^2}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt+\int_{R^2}^{\infty}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt:=I_1+I_2.$$

For $I_1$, if $x\in B$ and $y\in\mathbb{R}^n\setminus 2B$, we note that $|x_0-y|\le|x_0-x|+|x-y|\le C|x-y|$. By Lemma 2.8, it holds that

$$I_1=\left|\int_0^{R^2}\int_{\mathbb{R}^n\setminus 2B}k_t(x,y)f(y)\,dy\,t^{\frac{\alpha}{2}-1}dt\right|\le C\int_0^{R^2}\int_{\mathbb{R}^n\setminus 2B}t^{-\frac{n}{2}}e^{-\frac{|x-y|^2}{Ct}}|f(y)|\,dy\,t^{\frac{\alpha}{2}-1}dt$$

$$\le C\int_0^{R^2}t^{\frac{\alpha-n}{2}-1}\int_{\mathbb{R}^n\setminus 2B}\left(\frac{t}{|x-y|^2}\right)^{\frac{M}{2}}|f(y)|\,dy\,dt\le C\int_0^{R^2}t^{\frac{M-n+\alpha}{2}-1}dt\int_{\mathbb{R}^n\setminus 2B}\frac{|f(y)|}{|x_0-y|^M}\,dy,$$

where we used $e^{-u}\le C_Mu^{-M/2}$ for $u>0$, and the constant $C$ depends only on $M$.

    Applying Lemma 2.1 to the last integral, we get

$$\int_{\mathbb{R}^n\setminus 2B}\frac{|f(y)|}{|x_0-y|^M}\,dy=\sum_{i=1}^{\infty}\int_{2^{i+1}B\setminus 2^iB}\frac{|f(y)|}{|x_0-y|^M}\,dy\le\sum_{i=1}^{\infty}(2^iR)^{-M}\int_{2^{i+1}B}|f(y)|\,dy\le C\sum_{i=1}^{\infty}(2^iR)^{-M}\|\chi_{2^{i+1}B}\|_{p'(\cdot)}\|f\|_{p(\cdot)}.$$

    By using Lemma 2.4, we arrive at the inequality

$$\int_{\mathbb{R}^n\setminus 2B}\frac{|f(y)|}{|x_0-y|^M}\,dy\le C\sum_{i=1}^{\infty}R^{-M}(2^i)^{\,n-\frac{n}{p_+}-M}\|\chi_B\|_{p'(\cdot)}\|f\|_{p(\cdot)}\le CR^{-M}\|f\|_{p(\cdot)}\|\chi_B\|_{p'(\cdot)}.\tag{3.2}$$

Here, the series above converges when $M>n-\frac{n}{p_+}$. Hence, for such $M$,

$$|I_1|=\left|\int_0^{R^2}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt\right|\le CR^{-M}\|f\|_{p(\cdot)}\|\chi_B\|_{p'(\cdot)}\int_0^{R^2}t^{\frac{M-n+\alpha}{2}-1}dt\le C|B|^{\frac{\alpha}{n}-1}\|f\|_{p(\cdot)}\|\chi_B\|_{p'(\cdot)}.$$

For $I_2$, thanks to Lemma 2.8, we may choose $M$ as above and $N\ge M$; then it holds that

$$|I_2|=\left|\int_{R^2}^{\infty}\int_{\mathbb{R}^n\setminus 2B}k_t(x,y)f(y)\,dy\,t^{\frac{\alpha}{2}-1}dt\right|\le C\int_{R^2}^{\infty}\int_{\mathbb{R}^n\setminus 2B}t^{\frac{\alpha-n-N-2}{2}}\rho(x)^{N}e^{-\frac{|x-y|^2}{Ct}}|f(y)|\,dy\,dt$$

$$\le C\rho(x)^{N}\int_{R^2}^{\infty}t^{\frac{\alpha-n-N-2}{2}}\int_{\mathbb{R}^n\setminus 2B}\left(\frac{t}{|x-y|^2}\right)^{\frac{M}{2}}|f(y)|\,dy\,dt.$$

Since $x\in B$, Lemma 2.5 gives $\rho(x)\sim\rho(x_0)=R$. Hence we have

$$|I_2|\le CR^{N}\int_{R^2}^{\infty}t^{\frac{M+\alpha-n-N}{2}-1}dt\int_{\mathbb{R}^n\setminus 2B}\frac{|f(y)|}{|x_0-y|^M}\,dy.$$

Since $M+\alpha-n-N<0$, the integral in $t$ converges, and by applying inequality (3.2) we have

$$|I_2|\le C|B|^{\frac{\alpha}{n}-1}\|f\|_{p(\cdot)}\|\chi_B\|_{p'(\cdot)},$$

thus, integrating the bounds for $I_1$ and $I_2$ over $B$ and combining them with (3.1), we have proved (ii).

We now prove that condition (i) holds. Let $B=B(x_0,r)$ with $r<\rho(x_0)$. We set $f=f_1+f_2$ with $f_1=f\chi_{2B}$ and $f_2=f\chi_{\mathbb{R}^n\setminus 2B}$. We write

$$c_r=\int_{r^2}^{\infty}e^{-t\mathcal{L}}f_2(x_0)\,t^{\frac{\alpha}{2}-1}dt.\tag{3.3}$$

Thanks to (3.1), it holds that

$$\int_B|\mathcal{I}_\alpha f(x)-c_r|\,dx\le\int_B\mathcal{I}_\alpha(|f_1|)(x)\,dx+\int_B|\mathcal{I}_\alpha(f_2)(x)-c_r|\,dx\le C|B|^{\frac{\alpha}{n}}\|\chi_B\|_{p'(\cdot)}\|f\|_{p(\cdot)}+\int_B|\mathcal{I}_\alpha(f_2)(x)-c_r|\,dx.$$

Let $x\in B$ and split $\mathcal{I}_\alpha f_2(x)$ as follows:

$$\mathcal{I}_\alpha f_2(x)=\int_0^{r^2}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt+\int_{r^2}^{\infty}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt:=I_3+I_4.$$

For $I_3$, by the same argument as for $I_1$, it holds that

$$|I_3|=\left|\int_0^{r^2}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt\right|\le C|B|^{\frac{\alpha}{n}-1}\|f\|_{p(\cdot)}\|\chi_B\|_{p'(\cdot)}.$$

For $I_4$, by Lemma 2.9 and (3.3), it follows that

$$\left|\int_{r^2}^{\infty}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt-c_r\right|\le\int_{r^2}^{\infty}\int_{\mathbb{R}^n\setminus 2B}|k_t(x,y)-k_t(x_0,y)||f(y)|\,dy\,t^{\frac{\alpha}{2}-1}dt$$

$$\le C_\delta\int_{r^2}^{\infty}\int_{\mathbb{R}^n\setminus 2B}\left(\frac{|x-x_0|}{\sqrt{t}}\right)^{\delta}t^{-\frac{n}{2}}e^{-\frac{|x-y|^2}{Ct}}|f(y)|\,dy\,t^{\frac{\alpha}{2}-1}dt\le C_\delta r^{\delta}\int_{\mathbb{R}^n\setminus 2B}|f(y)|\int_{r^2}^{\infty}t^{-\frac{n-\alpha+\delta}{2}}e^{-\frac{|x-y|^2}{Ct}}\frac{dt}{t}\,dy.$$

Let $s=\frac{|x-y|^2}{t}$; then we obtain the following estimate:

$$\left|\int_{r^2}^{\infty}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt-c_r\right|\le C_\delta r^{\delta}\int_{\mathbb{R}^n\setminus 2B}\frac{|f(y)|}{|x-y|^{n-\alpha+\delta}}\,dy\int_0^{\infty}s^{\frac{n-\alpha+\delta}{2}}e^{-\frac{s}{C}}\frac{ds}{s}.$$
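In more detail (our computation), $t=\frac{|x-y|^2}{s}$ and $\frac{dt}{t}=-\frac{ds}{s}$, so

$$\int_{r^2}^{\infty}t^{-\frac{n-\alpha+\delta}{2}}e^{-\frac{|x-y|^2}{Ct}}\frac{dt}{t}=\frac{1}{|x-y|^{n-\alpha+\delta}}\int_0^{\frac{|x-y|^2}{r^2}}s^{\frac{n-\alpha+\delta}{2}}e^{-\frac{s}{C}}\frac{ds}{s}\le\frac{1}{|x-y|^{n-\alpha+\delta}}\int_0^{\infty}s^{\frac{n-\alpha+\delta}{2}}e^{-\frac{s}{C}}\frac{ds}{s}.$$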

Notice that the integral in $s$ above is finite; thus we only need to estimate the integral in $y$. Thanks to inequality (3.2), it follows that

$$\left|\int_{r^2}^{\infty}e^{-t\mathcal{L}}f_2(x)\,t^{\frac{\alpha}{2}-1}dt-c_r\right|\le C_\delta r^{\delta}\int_{\mathbb{R}^n\setminus 2B}\frac{|f(y)|}{|x-y|^{n-\alpha+\delta}}\,dy\le C\sum_{i=1}^{\infty}r^{\alpha-n}(2^i)^{\,\alpha-\frac{n}{p_+}-\delta}\|\chi_B\|_{p'(\cdot)}\|f\|_{p(\cdot)}\le C|B|^{\frac{\alpha}{n}-1}\|f\|_{p(\cdot)}\|\chi_B\|_{p'(\cdot)},$$

where the series converges because $\alpha-\delta-\frac{n}{p_+}<0$ for $\delta$ chosen close enough to $\delta_0$ (recall that $p_+<\frac{n}{(\alpha-\delta_0)_+}$).

Integrating over $B$ and combining with (3.1), we conclude that (i) is proved.

Remark 3.1. By the same argument as in the proof of Theorem 1.3, and thanks to Lemma 2.2, we immediately obtain that the conclusion of Theorem 1.4 holds.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    Ping Li is partially supported by NSFC (No. 12371136). The authors would like to thank the anonymous referees for carefully reading the manuscript and providing valuable suggestions, which substantially helped in improving the quality of this paper. We also thank Professor Meng Qu for his useful discussions.

    The authors declare there are no conflicts of interest.



    [1] Y. Li, X. Yuan, H. Wu, L. Zhang, R. Wang, J. Chen, CVT-track: Concentrating on valid tokens for one-stream tracking, IEEE Trans. Circuits Syst. Video Technol., 34 (2024), 321–334. https://doi.org/10.1109/TCSVT.2024.3452231 doi: 10.1109/TCSVT.2024.3452231
    [2] S. Zhang, Y. Chen, ATM-DEN: Image inpainting via attention transfer module and decoder-encoder network, SPIC, 133 (2025), 117268. https://doi.org/10.1016/j.image.2025.117268 doi: 10.1016/j.image.2025.117268
    [3] F. Chen, X. Wang, Y. Zhao, S. Lv, X. Niu, Visual object tracking: A survey, Comput. Vision Image Underst., 222 (2022), 103508. https://doi.org/10.1016/j.cviu.2022.103508 doi: 10.1016/j.cviu.2022.103508
    [4] F. Zhang, S. Ma, Z. Qiu, T. Qi, Learning target-aware background-suppressed correlation filters with dual regression for real-time UAV tracking, Signal Process., 191 (2022), 108352. https://doi.org/10.1016/j.sigpro.2021.108352 doi: 10.1016/j.sigpro.2021.108352
    [5] S. Ma, B. Zhao, Z. Hou, W. Yu, L. Pu, X. Yang, SOCF: A correlation filter for real-time UAV tracking based on spatial disturbance suppression and object saliency-aware, Expert Syst. Appl., 238 (2024), 122131. https://doi.org/10.1016/j.eswa.2023.122131 doi: 10.1016/j.eswa.2023.122131
    [6] J. Lin, J. Peng, J. Chai, Real-time UAV correlation filter based on response-weighted background residual and spatio-temporal regularization, IEEE Geosci. Remote Sens. Lett., 20 (2023), 1–5. https://doi.org/10.1109/LGRS.2023.3272522 doi: 10.1109/LGRS.2023.3272522
    [7] J. Cao, H. Zhang, L. Jin, J. Lv, G. Hou, C. Zhang, A review of object tracking methods: From general field to autonomous vehicles, Neurocomputing, 585 (2024), 127635. https://doi.org/10.1016/j.neucom.2024.127635 doi: 10.1016/j.neucom.2024.127635
    [8] X. Hao, Y. Xia, H. Yang, Z. Zuo, Asynchronous information fusion in intelligent driving systems for target tracking using cameras and radars, IEEE Trans. Ind. Electron., 70 (2023), 2708–2717. https://doi.org/10.1109/TIE.2022.3169717 doi: 10.1109/TIE.2022.3169717
    [9] L. Liang, Z. Chen, L. Dai, S. Wang, Target signature network for small object tracking, Eng. Appl. Artif. Intell., 138 (2024), 109445. https://doi.org/10.1016/j.engappai.2024.109445 doi: 10.1016/j.engappai.2024.109445
    [10] R. Yao, L. Zhang, Y. Zhou, H. Zhu, J. Zhao, Z. Shao, Hyperspectral object tracking with dual-stream prompt, IEEE Trans. Geosci. Remote Sens., 63 (2025), 1–12. https://doi.org/10.1109/TGRS.2024.3516833 doi: 10.1109/TGRS.2024.3516833
    [11] N. K. Rathore, S. Pande, A. Purohit, An efficient visual tracking system based on extreme learning machine in the defence and military sector, Def. Sci. J., 74 (2024), 643–650. https://doi.org/10.14429/dsj.74.19576 doi: 10.14429/dsj.74.19576
    [12] Y. Chen, Y. Tang, Y. Xiao, Q. Yuan, Y. Zhang, F. Liu, et al., Satellite video single object tracking: A systematic review and an oriented object tracking benchmark, ISPRS J. Photogramm. Remote Sens., 210 (2024), 212–240. https://doi.org/10.1016/j.isprsjprs.2024.03.013 doi: 10.1016/j.isprsjprs.2024.03.013
    [13] W. Cai, Q. Liu, Y. Wang, HIPTrack: Visual tracking with historical prompts, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2024), 19258–19267. https://doi.org/10.1109/CVPR52733.2024.01822
    [14] L. Sun, J. Zhang, D. Gao, B. Fan, Z. Fu, Occlusion-aware visual object tracking based on multi-template updating Siamese network, Digit. Signal Process., 148 (2024), 104440. https://doi.org/10.1016/j.dsp.2024.104440 doi: 10.1016/j.dsp.2024.104440
    [15] Y. Chen, L. Wang, eMoE-Tracker: Environmental MoE-based transformer for robust event-guided object tracking, IEEE Robot. Autom. Lett., 10 (2025), 1393–1400. https://doi.org/10.1109/LRA.2024.3518305 doi: 10.1109/LRA.2024.3518305
    [16] Y. Sun, T. Wu, X. Peng, M. Li, D. Liu, Y. Liu, et al., Adaptive representation-aligned modeling for visual tracking, Knowl. Based Syst., 309 (2025), 112847. https://doi.org/10.1016/j.knosys.2024.112847 doi: 10.1016/j.knosys.2024.112847
    [17] J. Wang, S. Yang, Y. Wang, G. Yang, PPTtrack: Pyramid pooling based Transformer backbone for visual tracking, Expert Syst. Appl., 249 (2024), 123716. https://doi.org/10.1016/j.eswa.2024.123716 doi: 10.1016/j.eswa.2024.123716
    [18] C. Wu, J. Shen, K. Chen, Y. Chen, Y. Liao, UAV object tracking algorithm based on spatial saliency-aware correlation filter, Electron. Res. Arch., 33 (2025), 1446–1475. https://doi.org/10.3934/era.2025068 doi: 10.3934/era.2025068
    [19] A. Lukežič, T. Vojíř, L. Čehovin, J. Matas, M. Kristan, Discriminative correlation filter with channel and spatial reliability, Int. J. Comput. Vision, 126 (2018), 671–688. https://doi.org/10.1007/s11263-017-1061-3 doi: 10.1007/s11263-017-1061-3
    [20] T. Xu, Z. Feng, X. Wu, J. Kittler, Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking, IEEE Trans. Image Process., 28 (2019), 5596–5609. https://doi.org/10.1109/TIP.2019.2919201 doi: 10.1109/TIP.2019.2919201
    [21] J. F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015), 583–596. https://doi.org/10.1109/TPAMI.2014.2345390 doi: 10.1109/TPAMI.2014.2345390
    [22] E. O. Brigham, R. E. Morrow, The fast Fourier transform, IEEE Spectrum, 4 (1967), 63–70. https://doi.org/10.1109/MSPEC.1967.5217220 doi: 10.1109/MSPEC.1967.5217220
    [23] H. K. Galoogahi, A. Fagg, S. Lucey, Learning background-aware correlation filters for visual tracking, in IEEE International Conference on Computer Vision (ICCV), (2017), 1144–1152. https://doi.org/10.1109/ICCV.2017.129
    [24] Z. Zhang, H. Peng, J. Fu, B. Li, W. Hu, Ocean: Object-aware anchor-free tracking, in European Conference on Computer Vision (ECCV), (2020), 771–787. https://doi.org/10.1007/978-3-030-58589-1_46
    [25] Y. Zhang, H. Pan, J. Wang, Enabling deformation slack in tracking with temporally even correlation filters, Neural Networks, 181 (2025), 106839. https://doi.org/10.1016/j.neunet.2024.106839 doi: 10.1016/j.neunet.2024.106839
    [26] Y. Chen, H. Wu, Z. Deng, J. Zhang, H. Wang, L. Wang, et al., Deep-feature-based asymmetrical background-aware correlation filter for object tracking, Digit. Signal Process., 148 (2024), 104446. https://doi.org/10.1016/j.dsp.2024.104446 doi: 10.1016/j.dsp.2024.104446
    [27] K. Chen, L. Wang, H. Wu, C. Wu, Y. Liao, Y. Chen, et al., Background-aware correlation filter for object tracking with deep CNN features, Eng. Lett., 32 (2024), 1353–1363. https://doi.org/10.1016/j.dsp.2024.104446 doi: 10.1016/j.dsp.2024.104446
    [28] J. Zhang, Y. He, W. Chen, L. D. Kuang, B. Zheng, CorrFormer: Context-aware tracking with cross-correlation and transformer, Comput. Electr. Eng., 114 (2024), 109075. https://doi.org/10.1016/j.compeleceng.2024.109075 doi: 10.1016/j.compeleceng.2024.109075
    [29] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, P. H. Torr, Fully-convolutional siamese networks for object tracking, in European Conference on Computer Vision (ECCV), (2016), 850–865. https://doi.org/10.1007/978-3-319-48881-3_56
    [30] Q. Guo, W. Feng, C. Zhou, R. Huang, L. Wan, S. Wang, Learning dynamic siamese network for visual object tracking, in IEEE International Conference on Computer Vision (ICCV), (2017), 1781–1789. https://doi.org/10.1109/ICCV.2017.196
    [31] B. Li, J. Yan, W. Wu, Z. Zhu, X. Hu, High performance visual tracking with siamese region proposal network, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2018), 8971–8980.
    [32] B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing, J. Yan, SiamRPN++: Evolution of siamese visual tracking with very deep networks, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 4277–4286.
    [33] L. Zhao, C. Fan, M. Li, Z. Zheng, X. Zhang, Global-local feature-mixed network with template update for visual tracking, Pattern Recognit. Lett., 188 (2025), 111–116. https://doi.org/10.1016/j.patrec.2024.11.034 doi: 10.1016/j.patrec.2024.11.034
    [34] F. Gu, J. Lu, C. Cai, Q. Zhu, Z. Ju, RTSformer: A robust toroidal transformer with spatiotemporal features for visual tracking, IEEE Trans. Human Mach. Syst., 54 (2024), 214–225. https://doi.org/10.1109/THMS.2024.3370582 doi: 10.1109/THMS.2024.3370582
    [35] O. Abdelaziz, M. Shehata, DMTrack: Learning deformable masked visual representations for single object tracking, SIViP, 19 (2025), 61. https://doi.org/10.1007/s11760-024-03713-0 doi: 10.1007/s11760-024-03713-0
    [36] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in the 31st International Conference on Neural Information Processing Systems (NIPS), (2017), 6000–6010.
    [37] O. C. Koyun, R. K. Keser, S. O. Şahin, D. Bulut, M. Yorulmaz, V. Yücesoy, et al., RamanFormer: A Transformer-based quantification approach for raman mixture components, ACS Omega, 9 (2024), 23241–23251. https://doi.org/10.1021/acsomega.3c09247 doi: 10.1021/acsomega.3c09247
    [38] H. Fan, X. Wang, S. Li, H. Ling, Joint feature learning and relation modeling for tracking: A one-stream framework, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 341–357. https://doi.org/10.1007/978-3-031-20047-2_20
    [39] H. Zhang, J. Song, H. Liu, Y. Han, Y. Yang, H. Ma, AwareTrack: Object awareness for visual tracking via templates interaction, Image Vision Comput., 154 (2025), 105363. https://doi.org/10.1016/j.imavis.2024.105363 doi: 10.1016/j.imavis.2024.105363
    [40] Z. Wang, L. Yuan, Y. Ren, S. Zhang, H. Tian, ADSTrack: Adaptive dynamic sampling for visual tracking, Complex Intell. Syst., 11 (2025), 79. https://doi.org/10.1007/s40747-024-01672-0 doi: 10.1007/s40747-024-01672-0
    [41] X. Chen, B. Yan, J. Zhu, D. Wang, X. Yang, H. Lu, Transformer tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 8122–8131. https://doi.org/10.1109/CVPR46437.2021.00803
    [42] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16x16 words: Transformers for image recognition at scale, preprint, arXiv: 2010.11929. https://doi.org/10.48550/arXiv.2010.11929
    [43] B. Yan, H. Peng, J. Fu, D. Wang, H. Lu, Learning spatio-temporal transformer for visual tracking, in IEEE International Conference on Computer Vision (ICCV), (2021), 10428–10437. https://doi.org/10.1109/ICCV48922.2021.01028
    [44] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, et al., Swin Transformer: Hierarchical vision transformer using shifted windows, in IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 10012–10022. https://doi.org/10.1109/ICCV48922.2021.00986
    [45] L. Lin, H. Fan, Z. Zhang, Y. Xu, H. Ling, SwinTrack: A simple and strong baseline for transformer tracking, in Advances in Neural Information Processing Systems (NIPS), 35 (2022), 16743–16754.
    [46] Z. Song, J. Yu, Y. P. P. Chen, W. Yang, Transformer tracking with cyclic shifting window attention, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 8781–8790. https://doi.org/10.1109/CVPR52688.2022.00859
    [47] Y. Chen, K. Chen, Four mathematical modeling forms for correlation filter object tracking algorithms and the fast calculation for the filter, Electron. Res. Arch., 32 (2024), 4684–4714. https://doi.org/10.3934/era.2024213 doi: 10.3934/era.2024213
    [48] H. Fan, L. Lin, F. Yang, P. Chu, G. Deng, S. Yu, et al., LaSOT: A high-quality benchmark for large-scale single object tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 5369–5378. https://doi.org/10.1109/CVPR.2019.00552
    [49] Y. Wu, J. Lim, M. -H. Yang, Online object tracking: A benchmark, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2013), 2411–2418. https://doi.org/10.1109/CVPR.2013.312
    [50] M. Mueller, N. Smith, B. Ghanem, A benchmark and simulator for UAV tracking, in Computer Vision–ECCV 2016, (2016), 445–461. https://doi.org/10.1007/978-3-319-46448-0_27
    [51] Y. Huang, Y. Chen, C. Lin, Q. Hu, J. Song, Visual attention learning and antiocclusion-based correlation filter for visual object tracking, J. Electron. Imaging, 32 (2023), 23. https://doi.org/10.1117/1.JEI.32.1.013023 doi: 10.1117/1.JEI.32.1.013023
    [52] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 770–778.
    [53] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in European Conference on Computer Vision (ECCV), (2020), 213–229. https://doi.org/10.1007/978-3-030-58452-8_13
    [54] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, preprint, arXiv: 1412.6980. https://doi.org/10.48550/arXiv.1412.6980
    [55] Y. Cui, C. Jiang, G. Wu, L. Wang, MixFormer: End-to-end tracking with iterative mixed attention, IEEE Trans. Pattern Anal. Mach. Intell., 46 (2024), 4129–4146. https://doi.org/10.1109/TPAMI.2024.3349519 doi: 10.1109/TPAMI.2024.3349519
    [56] J. Shen, Y. Liu, X. Dong, X. Lu, F. S. Khan, S. Hoi, Distilled siamese networks for visual tracking, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2022), 8896–8909. https://doi.org/10.1109/TPAMI.2021.3127492 doi: 10.1109/TPAMI.2021.3127492
    [57] X. Dong, J. Shen, F. Porikli, J. Luo, L. Shao, Adaptive siamese tracking with a compact latent network, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 8049–8062. https://doi.org/10.1109/TPAMI.2022.3230064 doi: 10.1109/TPAMI.2022.3230064
    [58] Z. Cao, Z. Huang, L. Pan, S. Zhang, Z. Liu, C. Fu, Towards real-world visual tracking with temporal contexts, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 15834–15849. https://doi.org/10.1109/TPAMI.2023.3307174 doi: 10.1109/TPAMI.2023.3307174
    [59] Y. Yang, X. Gu, Attention-based gating network for robust segmentation tracking, IEEE Trans. Circuits Syst. Video Technol., 35 (2025), 245–258. https://doi.org/10.1109/TCSVT.2024.3460400 doi: 10.1109/TCSVT.2024.3460400
    [60] W. Han, X. Dong, Y. Zhang, D. Crandall, C. Z. Xu, J. Shen, Asymmetric Convolution: An efficient and generalized method to fuse feature maps in multiple vision tasks, IEEE Trans. Pattern Anal. Mach. Intell., 46 (2024), 7363–7376. https://doi.org/10.1109/TPAMI.2024.3400873 doi: 10.1109/TPAMI.2024.3400873
    [61] X. Zhu, Y. Wu, D. Xu, Z. Feng, J. Kittler, Robust visual object tracking via adaptive attribute-aware discriminative correlation filters, IEEE Trans. Multimedia, 23 (2021), 2625–2638. https://doi.org/10.1109/TMM.2021.3050073 doi: 10.1109/TMM.2021.3050073
    [62] M. Danelljan, H. Gustav, F. Shahbaz Khan, M. Felsberg, Learning spatially regularized correlation filters for visual tracking, in IEEE International Conference on Computer Vision (ICCV), (2015), 4310–4318. https://doi.org/10.1109/ICCV.2015.490
    [63] J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, P. H. S. Torr, End-to-end representation learning for correlation filter based tracking, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 5000–5008. https://doi.org/10.1109/CVPR.2017.531
    [64] G. Bhat, M. Danelljan, L. V. Gool, R. Timofte, Learning discriminative model prediction for tracking, in IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 6182–6191. https://doi.org/10.1109/ICCV.2019.00628
    [65] N. Wang, W. Zhou, J. Wang, H. Li, Transformer meets tracker: exploiting temporal context for robust visual tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 1571–1580. https://doi.org/10.1109/CVPR46437.2021.00162
    [66] Z. Chen, B. Zhong, G. Li, S. Zhang, R. Ji, Siamese box adaptive network for visual tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 6668–6677. https://doi.org/10.1109/CVPR42600.2020.00670
    [67] Y. Guo, H. Li, L. Zhang, L. Zhang, K. Deng, F. Porikli, SiamCAR: Siamese fully convolutional classification and regression for visual tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 1176–1185. https://doi.org/10.1109/CVPR42600.2020.00630
    [68] D. Xing, N. Evangeliou, A. Tsoukalas, A. Tzes, Siamese transformer pyramid networks for real-time UAV tracking, in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), (2022), 1898–1907. https://doi.org/10.1109/WACV51458.2022.00196
  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
