In recent years, Transformer-based object trackers have demonstrated exceptional performance in object tracking. However, traditional methods often employ single-scale pixel-level attention mechanisms to compute the correlation between templates and search regions, disrupting the object's integrity and positional information. To address these issues, we introduce a cyclic-shift mechanism to expand the diversity of sample positions and replace the traditional single-scale pixel-level attention mechanism with a multi-scale window-level attention mechanism. This approach not only preserves the object's integrity but also enriches the diversity of samples. Nevertheless, the introduced cyclic-shift operation heavily burdens storage and computation. To this end, we treat the attention computation of shifted and static windows in the spatial domain as a convolution. By leveraging the convolution theorem, we transform the attention computation of cyclic-shift samples from the spatial domain into element-wise multiplication in the frequency domain. This approach enhances computational efficiency and reduces data storage requirements. We conducted extensive experiments on the proposed module. The results demonstrate that it outperforms multiple existing tracking algorithms, and ablation studies show that the method effectively reduces the storage and computational burden without compromising performance.
Citation: Huanyu Wu, Yingpin Chen, Changhui Wu, Ronghuan Zhang, Kaiwei Chen. A multi-scale cyclic-shift window Transformer object tracker based on fast Fourier transform[J]. Electronic Research Archive, 2025, 33(6): 3638-3672. doi: 10.3934/era.2025162
1. Introduction
Since the emergence of the COVID-19 pandemic around December 2019, the outbreak has snowballed globally [1,2], and there is no clear sign that the new confirmed cases and deaths are coming to an end. Though vaccines are rolling out to deter the spread of the pandemic, mutations of the virus are already under way [3,4,5,6]. Although the origin of the pandemic is still under debate [7], many researchers are conducting studies from different aspects and perspectives. These could be categorised mainly into three levels: the SARS-CoV-2 genetic level [8], the COVID-19 individual country level [9,10,11] and the continental level [12,13]. In this study, we focus on the latter two levels, for which there are many methods and techniques. For example, linear and non-linear growth models, together with 2-week-kernel-window regression, are exploited in modelling the exponential growth rate of COVID-19 confirmed cases [14], and are also generalised to non-linear modelling of the pandemic [15,16]. Some research works focus on the prediction of COVID-19 spread by estimating the lead-lag effects between different countries via time-warping techniques [17], while others utilise clustering analyses to group countries via epidemiological data such as active cases and active cases per population [18]. In addition, there are studies tackling the relationship between economic variables and COVID-19-related variables [19,20], though both results show no relation between economic freedom and COVID-19 deaths, and no relation between the performance of equity markets and COVID-19 cases and deaths.
In this study, we aim to extract the features of the daily biweekly growth rates of cases and deaths at the national and continental levels. We devise orthonormal bases based on Fourier analysis [21,22], in particular Fourier coefficients, as the potential features. At the national level, we import the global time series data and sample 117 countries over 109 days [23,24]. Then we calculate the Euclidean distance matrices for the inner products between countries and between days. Based on the distance matrices, we then calculate their variabilities to delve into the distribution of the data. At the continental level, we also import the biweekly changes of cases and deaths for five continents as well as the world, with time series data for 447 days. Then we calculate their inner products with respect to the temporal frequencies and find the similarities of the extracted features between continents.
At the national level, the biweekly data bear stronger temporal features than spatial features, i.e., as time goes by, the pandemic evolves more along the time dimension than along the space (or country-wise) dimension. Moreover, there exists a strong concurrency between the features for the biweekly changes of cases and those of deaths, though there is no clear or stable trend for the extracted features. However, at the continental level, one observes a stable trend of features regarding the biweekly changes. In addition, the extracted features between continents are similar to one another, except for Asia, whose features bear no clear similarities with those of other continents.
Our approach is based on orthonormal bases, which serve as the potential features for the biweekly changes of cases and deaths. This method is straightforward and easy to comprehend. Its limitations are that the extracted features are based on the hidden frequencies of the dynamical structure, to which it is hard to assign an interpretable meaning, and that the fetched data are not complete, due to missing entries in the database. Nevertheless, the results provided in this study could help one map out the evolutionary features of COVID-19.
2. Method and implementation
Let $\delta:\mathbb{N}\to\{0,1\}$ be the function such that $\delta(n)=0$ (also written $\delta_n=0$) if $n\in 2\mathbb{N}$, and $\delta(n)=1$ if $n\in 2\mathbb{N}+1$. Given a set of point data $D=\{\vec{v}\}\subseteq\mathbb{R}^N$, we would like to decompose each $\vec{v}$ into frequency-based vectors by Fourier analysis. The features of the COVID-19 case and death growth rates are specified by the orthonormal frequency vectors $B_N=\{\vec{f}_i=(f_{ij})_{j=1}^{N}\}_{i=1}^{N}$, based on Fourier analysis, in particular Fourier series [22], where
● $f_{1j}=\sqrt{\tfrac{1}{N}}$ for all $1\le j\le N$;
● For any $2\le i\le N-1+\delta_N$,
$$f_{ij}=\sqrt{\frac{2}{N}}\cdot\cos\left[\frac{\pi}{2}\cdot\delta_i-\frac{(i-\delta_i)\,\pi}{N}\cdot j\right];\tag{2.1}$$
● If $N\in 2\mathbb{N}$, then $f_{Nj}=\sqrt{\tfrac{1}{N}}\cdot\cos(j\pi)$ for all $1\le j\le N$.
Now we have constructed an orthonormal basis $F_N=\{\vec{f}_1,\vec{f}_2,\cdots,\vec{f}_N\}$ of features for $\mathbb{R}^N$, and each $\vec{v}=\sum_{i=1}^{N}\langle\vec{v},\vec{f}_i\rangle\cdot\vec{f}_i$, where $\langle\cdot,\cdot\rangle$ is the inner product. The basis $B_N$ could also be represented by a matrix,
and the representation of a data column vector $\vec{v}=(-3,14,5,8,-12)^{\top}$ with respect to $B_5$ is calculated by $F_5\vec{v}=[\langle\vec{v},\vec{f}_i\rangle]_{i=1}^{5}$, i.e., the 5-by-1 column vector $(5.367,-16.334,-3.271,-6.434,-9.503)^{\top}$.
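Since the paper's procedures are implemented in R, the following Python (NumPy) sketch is only an illustration of the construction above; it builds $F_N$ row by row from Eq. (2.1) and reproduces the worked $B_5$ example.

```python
import numpy as np

def fourier_basis(N):
    """Orthonormal frequency basis F_N; row i-1 holds the vector f_i."""
    delta = lambda n: n % 2                   # delta(n) = 0 if n is even, 1 if odd
    j = np.arange(1, N + 1)
    F = np.empty((N, N))
    F[0] = np.sqrt(1.0 / N)                   # f_1j = sqrt(1/N)
    for i in range(2, N - 1 + delta(N) + 1):  # middle rows, Eq. (2.1)
        F[i - 1] = np.sqrt(2.0 / N) * np.cos(
            np.pi / 2 * delta(i) - (i - delta(i)) * np.pi / N * j)
    if N % 2 == 0:                            # extra alternating row for even N
        F[N - 1] = np.sqrt(1.0 / N) * np.cos(j * np.pi)
    return F

F5 = fourier_basis(5)
v = np.array([-3.0, 14.0, 5.0, 8.0, -12.0])
coeffs = F5 @ v                               # inner products <v, f_i>
# coeffs is approximately (5.367, -16.334, -3.271, -6.434, -9.503)
assert np.allclose(F5 @ F5.T, np.eye(5))      # the basis is orthonormal
assert np.allclose(F5.T @ coeffs, v)          # v is recovered from its coefficients
```

The two assertions check that the rows are orthonormal and that $\vec{v}$ is recovered exactly from its Fourier coefficients.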
2.1. Data description and handling
There are two main parts of data collection and handling: one for individual countries (the national level) and the other for individual continents (the continental level). At both levels, we fetch the daily biweekly growth rates of confirmed COVID-19 cases and deaths from Our World in Data [23,24]. Then we use R (version 4.1.0) to handle the data and implement the procedures.
Sampled targets: national. After filtering out non-essential and missing data, the effective sample consists of 117 countries over 109 days, as shown in Results. The days range from December 2020 to June 2021. Though the sampled days are not consecutive (due to missing data), the biweekly information still covers such loss. In the subsequent temporal and spatial analyses, we conduct our study based on these data.
Sampled targets: continental. As for the continental data, we collect data for the world, Africa, Asia, Europe, North America and South America. The sampled days range from March 22nd, 2020 to June 11th, 2021; in total, there are 447 days (this differs from the national level). In the subsequent temporal analysis (there is no spatial analysis at the continental level, due to the limited sample size), we conduct our study based on these data.
Notations: national. For further processing, let us introduce some notation. Let the sampled countries be indexed by $i=1,\dots,117$ and the sampled days by $t=1,\dots,109$; the days range from December 3rd, 2020 to May 31st, 2021. Let $c_i(t)$ and $d_i(t)$ be the daily biweekly growth rates of confirmed cases and deaths in country $i$ on day $t$, respectively, i.e.,
$$c_i(t):=\frac{\mathrm{case}_{i,t+13}-\mathrm{case}_{i,t}}{\mathrm{case}_{i,t}};\tag{2.3}$$
$$d_i(t):=\frac{\mathrm{death}_{i,t+13}-\mathrm{death}_{i,t}}{\mathrm{death}_{i,t}},\tag{2.4}$$
where casei,t and deathi,t denote the total confirmed cases and deaths for country i at day t, respectively. We form temporal and spatial vectors by
The vectors $c_i$ and $d_i$ give the whole time series for a given country $i$, and the vectors $v(t)$ and $w(t)$ give all countries' values on a given day $t$.
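For illustration, Eqs. (2.3) and (2.4) are simple to compute once the cumulative series are in hand; the counts in this Python sketch are hypothetical, not taken from the dataset.

```python
import numpy as np

def biweekly_growth_rate(series):
    """c_i(t) = (case[t+13] - case[t]) / case[t], per Eq. (2.3)."""
    s = np.asarray(series, dtype=float)
    t = np.arange(len(s) - 13)        # every day with a full 14-day window ahead
    return (s[t + 13] - s[t]) / s[t]

# Hypothetical cumulative confirmed cases over 17 consecutive days.
case = [100, 110, 125, 130, 150, 160, 180, 200, 210,
        230, 250, 260, 280, 300, 320, 340, 360]
rates = biweekly_growth_rate(case)    # c(1), ..., c(4)
# e.g. the first rate is (300 - 100) / 100 = 2.0
```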
Notations: continental. Similarly, let the sampled continents be indexed by $j=1,\dots,6$, and let the 447 sampled days range from March 22nd, 2020 to June 11th, 2021. We form temporal vectors for confirmed cases and deaths by
For any $m$-by-$n$ matrix $A$, we use $\min(A)$ to denote the value $\min\{a_{ij}:1\le i\le m;\ 1\le j\le n\}$, and define $\max(A)$ in the same manner. If $\vec{v}$ is a vector, $\min(\vec{v})$ and $\max(\vec{v})$ are defined analogously. The implementation goes as follows:
(1) Extract and trim the source data.
Extraction: national. Extract the daily biweekly growth rates of COVID-19 cases and deaths from the database and trim the data. The trimmed data consist of 109 time series for 117 countries, as shown in Table 1, in the form of two 117-by-109 matrices:
Row $i$ of the matrices is regarded as the temporal vector $c_i$ or $d_i$, respectively, and column $t$ is regarded as the spatial vector $v(t)$ or $w(t)$, respectively.
Extraction: continental. As for the continental data, they are collected in two 6-by-447 matrices:
$$\mathrm{Biweekly\_cont\_cases}=[x_j(\tau)]_{j=1:6}^{\tau=1:447};$$
$$\mathrm{Biweekly\_cont\_deaths}=[y_j(\tau)]_{j=1:6}^{\tau=1:447}.$$
(2) Specify the frequencies (features) for the imported data.
Basis: national. In order to decompose ci and di into some fundamental features, we specify F109 as the corresponding features, whereas to decompose v(t) and w(t), we specify F117 as the corresponding features. The results are presented in Table 2.
Table 2. Orthonormal temporal frequencies for 109 days (upper block, $F_{109}$) and orthonormal spatial frequencies for 117 countries (lower block, $F_{117}$).
Average variability: continental. For each continent j, the temporal variabilities for confirmed cases and deaths are computed by
$$\mathrm{var\_cont\_case\_time}[j]=\frac{\sum_{k=1}^{6} d_E\!\left(F_{447}x_j,\,F_{447}x_k\right)}{447};$$
$$\mathrm{var\_cont\_death\_time}[j]=\frac{\sum_{k=1}^{6} d_E\!\left(F_{447}y_j,\,F_{447}y_k\right)}{447},$$
where $d_E$ denotes the Euclidean distance.
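Assuming the frequency representations $F_{447}x_j$ are stacked as the rows of a matrix, the averaging step reduces to a pairwise Euclidean distance computation. This Python sketch uses a toy 3-by-2 matrix rather than the 6-by-447 continental data.

```python
import numpy as np

def average_variability(R):
    """For each row j of R, sum the Euclidean distances to every row k,
    then divide by the number of columns (447 in the continental case)."""
    d = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=2)  # pairwise d_E
    return d.sum(axis=1) / R.shape[1]

# Toy representations: rows 0 and 2 coincide; row 1 is 5 units from both.
R = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 0.0]])
var = average_variability(R)   # (0+5+0)/2, (5+0+5)/2, (0+5+0)/2 = 2.5, 5.0, 2.5
```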
(6) Unify the national temporal and spatial variabilities of cases and deaths. For each country i, the unified temporal and spatial variabilities for cases and deaths are defined by
where $\sigma_{ij}$ and $\beta_{ij}$ denote the values in the $(i,j)$ cells of IP_cont_cases_time and IP_cont_deaths_time, respectively. The results are visualised by figures in Results.
3. Results
There are two main parts of results shown in this section: national results and continental results.
National results. Based on the method described in Section 2, we identify the temporal orthonormal frequencies and the spatial ones, as shown in Table 2.
The computed inner products at the country level, which serve as the values of the extracted features, for daily biweekly growth rates of cases and deaths with respect to temporal frequencies are shown in Figure 1. Similarly, the computed inner products at the country level for daily biweekly growth rates of cases and deaths with respect to spatial frequencies are shown in Figure 2. Meanwhile, their scaled variabilities are plotted in Figure 3.
Figure 1. Inner products between growth rates of cases (solid line) over 109 temporal frequencies, and inner products between growth rates of deaths (dotted line) over 109 temporal frequencies, for some demonstrative countries: Afghanistan, Albania, Algeria, Uruguay, Zambia, and Zimbabwe.
Figure 2. Inner products between growth rates of cases (solid line) over 117 spatial frequencies, and inner products between growth rates of deaths (dotted line) over 117 spatial frequencies, for some demonstrative dates: 2020/12/3, 2020/12/4, 2020/12/5, 2021/5/29, 2021/5/30, and 2021/5/31.
Continental results. According to the obtained data, we study and compare the continental features of daily biweekly growth rates of confirmed cases and deaths for Africa, Asia, Europe, North America, South America and the world. Unlike the missing data in the analysis of individual countries, the continental data are complete. We take the samples from March 22nd, 2020 to June 11th, 2021; in total, there are 447 days for the analysis. The cosine values, which measure the similarities between the representations of continents, are shown in Table 3. The results of the unified inner products with respect to confirmed cases and deaths are plotted in Figures 4 and 5, respectively.
Table 3. Cosine values (similarities) between the world, Africa, Asia, Europe, North America (No. Am.), and South America (So. Am.).
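The cosine values in Table 3 measure the angle between two continents' feature vectors; a minimal Python sketch with hypothetical vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature (inner-product) vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature vectors: w is a scaled copy of u, so the similarity is 1;
# an orthogonal pair scores 0.
u = [2.0, -1.0, 0.5]
w = [4.0, -2.0, 1.0]
sim_parallel = cosine_similarity(u, w)                       # close to 1.0
sim_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # close to 0.0
```

Values near 1 indicate that two continents' extracted features point in almost the same direction, regardless of their magnitudes.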
Figure 5. Unified inner product (UIP) for the world, Africa, Asia, Europe, North America, and South America with respect to daily biweekly growth rates of deaths.
Other auxiliary results that support the plotting of the graphs are appended in the Appendix. The names of the 117 sampled countries are provided in Tables A1 and A2. The dates of the sampled days are provided in Figure A1. The tabulated results for inner products of temporal and spatial frequencies at the national level are provided in Table A3. The tabulated results for inner products of temporal frequencies at the continental level are provided in Table A4. The Euclidean distance matrices for temporal and spatial representations with respect to confirmed cases and deaths are tabulated in Table A5, and their average variabilities are tabulated in Table A6.
Summaries of results. Based on the previous tables and figures, we have the following results.
(1) From Figures 1 and 2, one observes that the temporal features are much more distinct than the spatial features, i.e., if one fixes one day and extracts the features from the spatial frequencies, one obtains less distinct features than when fixing one country and extracting the features from the temporal frequencies. This indicates that SARS-CoV-2 evolves and mutates more over time than across space.
(2) For individual countries, the features for the biweekly changes of cases are almost concurrent with those of deaths. This indicates that biweekly changes of cases and deaths share similar features; in some sense, the change of deaths is still in tune with the change of confirmed cases, i.e., there is no substantial change in their relationship.
(3) For individual countries, the extracted features go up and down intermittently and there is no obvious trend. This indicates the virus is still very versatile, and it is hard to capture its fixed features at a country level.
(4) From Figure 3, one observes clear similarities, in terms of variabilities, between the daily biweekly growth rates of cases and deaths under temporal frequencies. Moreover, the distribution of the overall data is not condensed, with the labelled countries in the middle scattered across the whole range. This indicates the diversity of daily biweekly growth rates of cases and deaths across countries is still very high.
(5) From Figure 3, the daily biweekly growth rates of deaths with respect to the spatial frequencies are fairly concentrated. This indicates the extracted features regarding deaths are stable, i.e., there are clearer and more stable spatial features for daily biweekly growth rates of deaths.
(6) Comparing the individual graphs in Figures 4 and 5, they bear much the same shape, but on different scales, with deaths being more feature-oriented (this is also witnessed at the country level, as claimed in the results above). This indicates a very clear trend of features regarding daily biweekly growth rates at the continental level (a stark contrast to the third result above).
(7) From Figures 4 and 5, the higher values of inner products lie at both endpoints for the biweekly changes of cases and deaths, i.e., at low and high temporal frequencies, for all the continents except the biweekly change of deaths in Asia. This indicates the evolutionary patterns in Asia are very distinct from those of the other continents.
(8) From Table 3, the extracted features are very similar across continents, except for Asia. This echoes the result above.
4. Conclusions and future work
In this study, we identify the features of daily biweekly growth rates of COVID-19 confirmed cases and deaths via orthonormal bases (features) derived from Fourier analysis. Then we analyse the inner products, which represent the levels of the chosen features. The variabilities for each country show that the levels of deaths under spatial frequencies are much more concentrated than the others. The generated results are summarised in Section 3. There are some limitations in this study and future improvements to be made:
● The associated meanings of the orthonormal features from Fourier analysis are not yet fully explored;
● We use the Euclidean metric to measure the distances between features, which is then used to calculate the variabilities. The Euclidean metric is noted for its geometric properties, but it may not be the most suitable in the context of frequencies. One could further introduce other metrics and apply machine learning techniques to find the optimal ones;
● In this study, we choose the daily biweekly growth rates of confirmed cases and deaths as our research sources. This is a one-sided story. To obtain a fuller picture of the dynamical features, one could add other variables for comparison.
Acknowledgements
This work is supported by the Humanities and Social Science Research Planning Fund Project under the Ministry of Education of China (No. 20XJAGAT001).
Conflict of interest
No potential conflict of interest was reported by the authors.
Table A3. Inner products w.r.t. temporal (case: upper-top and death: upper-bottom blocks) and spatial (case: lower-top and death: lower-bottom blocks) frequencies at the national level.
Table A4. Temporal inner products for the continents (the world, Africa, Asia, Europe, North and South America) w.r.t. daily biweekly growth rates of cases (upper block) and deaths (lower block) from March 22nd, 2020 to June 11th, 2021 (447 days).
Table A5. Distance matrices for daily biweekly growth rates of cases (uppermost block) and deaths (2nd block) w.r.t. temporal frequencies, and those of cases (3rd block) and deaths (bottommost block) w.r.t. spatial frequencies.
Table A6. Variability for 117 countries with respect to daily biweekly growth rates of cases and deaths under temporal frequencies, and variability for 109 days with respect to daily biweekly growth rates of cases and deaths under spatial frequencies.
Y. Li, X. Yuan, H. Wu, L. Zhang, R. Wang, J. Chen, CVT-track: Concentrating on valid tokens for one-stream tracking, IEEE Trans. Circuits Syst. Video Technol., 34 (2024), 321–334. https://doi.org/10.1109/TCSVT.2024.3452231 doi: 10.1109/TCSVT.2024.3452231
[2]
S. Zhang, Y. Chen, ATM-DEN: Image inpainting via attention transfer module and decoder-encoder network, SPIC, 133 (2025), 117268. https://doi.org/10.1016/j.image.2025.117268 doi: 10.1016/j.image.2025.117268
[3]
F. Chen, X. Wang, Y. Zhao, S. Lv, X. Niu, Visual object tracking: A survey, Comput. Vision Image Underst., 222 (2022), 103508. https://doi.org/10.1016/j.cviu.2022.103508 doi: 10.1016/j.cviu.2022.103508
[4]
F. Zhang, S. Ma, Z. Qiu, T. Qi, Learning target-aware background-suppressed correlation filters with dual regression for real-time UAV tracking, Signal Process., 191 (2022), 108352. https://doi.org/10.1016/j.sigpro.2021.108352 doi: 10.1016/j.sigpro.2021.108352
[5]
S. Ma, B. Zhao, Z. Hou, W. Yu, L. Pu, X. Yang, SOCF: A correlation filter for real-time UAV tracking based on spatial disturbance suppression and object saliency-aware, Expert Syst. Appl., 238 (2024), 122131. https://doi.org/10.1016/j.eswa.2023.122131 doi: 10.1016/j.eswa.2023.122131
[6]
J. Lin, J. Peng, J. Chai, Real-time UAV correlation filter based on response-weighted background residual and spatio-temporal regularization, IEEE Geosci. Remote Sens. Lett., 20 (2023), 1–5. https://doi.org/10.1109/LGRS.2023.3272522 doi: 10.1109/LGRS.2023.3272522
[7]
J. Cao, H. Zhang, L. Jin, J. Lv, G. Hou, C. Zhang, A review of object tracking methods: From general field to autonomous vehicles, Neurocomputing, 585 (2024), 127635. https://doi.org/10.1016/j.neucom.2024.127635 doi: 10.1016/j.neucom.2024.127635
[8]
X. Hao, Y. Xia, H. Yang, Z. Zuo, Asynchronous information fusion in intelligent driving systems for target tracking using cameras and radars, IEEE Trans. Ind. Electron., 70 (2023), 2708–2717. https://doi.org/10.1109/TIE.2022.3169717 doi: 10.1109/TIE.2022.3169717
[9]
L. Liang, Z. Chen, L. Dai, S. Wang, Target signature network for small object tracking, Eng. Appl. Artif. Intell., 138 (2024), 109445. https://doi.org/10.1016/j.engappai.2024.109445 doi: 10.1016/j.engappai.2024.109445
[10]
R. Yao, L. Zhang, Y. Zhou, H. Zhu, J. Zhao, Z. Shao, Hyperspectral object tracking with dual-stream prompt, IEEE Trans. Geosci. Remote Sens., 63 (2025), 1–12. https://doi.org/10.1109/TGRS.2024.3516833 doi: 10.1109/TGRS.2024.3516833
[11]
N. K. Rathore, S. Pande, A. Purohit, An efficient visual tracking system based on extreme learning machine in the defence and military sector, Def. Sci. J., 74 (2024), 643–650. https://doi.org/10.14429/dsj.74.19576 doi: 10.14429/dsj.74.19576
[12]
Y. Chen, Y. Tang, Y. Xiao, Q. Yuan, Y. Zhang, F. Liu, et al., Satellite video single object tracking: A systematic review and an oriented object tracking benchmark, ISPRS J. Photogramm. Remote Sens., 210 (2024), 212–240. https://doi.org/10.1016/j.isprsjprs.2024.03.013 doi: 10.1016/j.isprsjprs.2024.03.013
[13]
W. Cai, Q. Liu, Y. Wang, HIPTrack: Visual tracking with historical prompts, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2024), 19258–19267. https://doi.org/10.1109/CVPR52733.2024.01822
[14]
L. Sun, J. Zhang, D. Gao, B. Fan, Z. Fu, Occlusion-aware visual object tracking based on multi-template updating Siamese network, Digit. Signal Process., 148 (2024), 104440. https://doi.org/10.1016/j.dsp.2024.104440 doi: 10.1016/j.dsp.2024.104440
[15]
Y. Chen, L. Wang, eMoE-Tracker: Environmental MoE-based transformer for robust event-guided object tracking, IEEE Robot. Autom. Lett., 10 (2025), 1393–1400. https://doi.org/10.1109/LRA.2024.3518305 doi: 10.1109/LRA.2024.3518305
[16]
Y. Sun, T. Wu, X. Peng, M. Li, D. Liu, Y. Liu, et al., Adaptive representation-aligned modeling for visual tracking, Knowl. Based Syst., 309 (2025), 112847. https://doi.org/10.1016/j.knosys.2024.112847 doi: 10.1016/j.knosys.2024.112847
[17]
J. Wang, S. Yang, Y. Wang, G. Yang, PPTtrack: Pyramid pooling based Transformer backbone for visual tracking, Expert Syst. Appl., 249 (2024), 123716. https://doi.org/10.1016/j.eswa.2024.123716 doi: 10.1016/j.eswa.2024.123716
[18]
C. Wu, J. Shen, K. Chen, Y. Chen, Y. Liao, UAV object tracking algorithm based on spatial saliency-aware correlation filter, Electron. Res. Arch., 33 (2025), 1446–1475. https://doi.org/10.3934/era.2025068 doi: 10.3934/era.2025068
[19]
A. Lukežič, T. Vojíř, L. Čehovin, J. Matas, M. Kristan, Discriminative correlation filter with channel and spatial reliability, Int. J. Comput. Vision, 126 (2018), 671–688. https://doi.org/10.1007/s11263-017-1061-3 doi: 10.1007/s11263-017-1061-3
[20]
T. Xu, Z. Feng, X. Wu, J. Kittler, Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking, IEEE Trans. Image Process., 28 (2019), 5596–5609. https://doi.org/10.1109/TIP.2019.2919201 doi: 10.1109/TIP.2019.2919201
[21]
J. F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015), 583–596. https://doi.org/10.1109/TPAMI.2014.2345390 doi: 10.1109/TPAMI.2014.2345390
[22]
E. O. Brigham, R. E. Morrow, The fast Fourier transform, IEEE Spectrum, 4 (1967), 63–70. https://doi.org/10.1109/MSPEC.1967.5217220 doi: 10.1109/MSPEC.1967.5217220
[23]
H. K. Galoogahi, A. Fagg, S. Lucey, Learning background-aware correlation filters for visual tracking, in IEEE International Conference on Computer Vision (ICCV), (2017), 1144–1152. https://doi.org/10.1109/ICCV.2017.129
[24]
Z. Zhang, H. Peng, J. Fu, B. Li, W. Hu, Ocean: Object-aware anchor-free tracking, in European Conference on Computer Vision (ECCV), (2020), 771–787. https://doi.org/10.1007/978-3-030-58589-1_46
[25]
Y. Zhang, H. Pan, J. Wang, Enabling deformation slack in tracking with temporally even correlation filters, Neural Networks, 181 (2025), 106839. https://doi.org/10.1016/j.neunet.2024.106839 doi: 10.1016/j.neunet.2024.106839
[26]
Y. Chen, H. Wu, Z. Deng, J. Zhang, H. Wang, L. Wang, et al., Deep-feature-based asymmetrical background-aware correlation filter for object tracking, Digit. Signal Process., 148 (2024), 104446. https://doi.org/10.1016/j.dsp.2024.104446 doi: 10.1016/j.dsp.2024.104446
[27]
K. Chen, L. Wang, H. Wu, C. Wu, Y. Liao, Y. Chen, et al., Background-aware correlation filter for object tracking with deep CNN features, Eng. Lett., 32 (2024), 1353–1363. https://doi.org/10.1016/j.dsp.2024.104446 doi: 10.1016/j.dsp.2024.104446
[28]
J. Zhang, Y. He, W. Chen, L. D. Kuang, B. Zheng, CorrFormer: Context-aware tracking with cross-correlation and transformer, Comput. Electr. Eng., 114 (2024), 109075. https://doi.org/10.1016/j.compeleceng.2024.109075 doi: 10.1016/j.compeleceng.2024.109075
[29]
L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, P. H. Torr, Fully-convolutional siamese networks for object tracking, in European Conference on Computer Vision (ECCV), (2016), 850–865. https://doi.org/10.1007/978-3-319-48881-3_56
[30]
Q. Guo, W. Feng, C. Zhou, R. Huang, L. Wan, S. Wang, Learning dynamic siamese network for visual object tracking, in IEEE International Conference on Computer Vision (ICCV), (2017), 1781–1789. https://doi.org/10.1109/ICCV.2017.196
[31]
B. Li, J. Yan, W. Wu, Z. Zhu, X. Hu, High performance visual tracking with siamese region proposal network, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2018), 8971–8980.
[32]
B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing, J. Yan, SiamRPN++: Evolution of siamese visual tracking with very deep networks, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 4277–4286.
[33]
L. Zhao, C. Fan, M. Li, Z. Zheng, X. Zhang, Global-local feature-mixed network with template update for visual tracking, Pattern Recognit. Lett., 188 (2025), 111–116. https://doi.org/10.1016/j.patrec.2024.11.034 doi: 10.1016/j.patrec.2024.11.034
[34]
F. Gu, J. Lu, C. Cai, Q. Zhu, Z. Ju, RTSformer: A robust toroidal transformer with spatiotemporal features for visual tracking, IEEE Trans. Human Mach. Syst., 54 (2024), 214–225. https://doi.org/10.1109/THMS.2024.3370582 doi: 10.1109/THMS.2024.3370582
[35]
O. Abdelaziz, M. Shehata, DMTrack: Learning deformable masked visual representations for single object tracking, SIViP, 19 (2025), 61. https://doi.org/10.1007/s11760-024-03713-0 doi: 10.1007/s11760-024-03713-0
[36]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in the 31st International Conference on Neural Information Processing Systems (NIPS), (2017), 6000–6010.
[37]
O. C. Koyun, R. K. Keser, S. O. Şahin, D. Bulut, M. Yorulmaz, V. Yücesoy, et al., RamanFormer: A Transformer-based quantification approach for raman mixture components, ACS Omega, 9 (2024), 23241–23251. https://doi.org/10.1021/acsomega.3c09247 doi: 10.1021/acsomega.3c09247
[38]
H. Fan, X. Wang, S. Li, H. Ling, Joint feature learning and relation modeling for tracking: A one-stream framework, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 341–357. https://doi.org/10.1007/978-3-031-20047-2_20
[39]
H. Zhang, J. Song, H. Liu, Y. Han, Y. Yang, H. Ma, AwareTrack: Object awareness for visual tracking via templates interaction, Image Vision Comput., 154 (2025), 105363. https://doi.org/10.1016/j.imavis.2024.105363
[40]
Z. Wang, L. Yuan, Y. Ren, S. Zhang, H. Tian, ADSTrack: Adaptive dynamic sampling for visual tracking, Complex Intell. Syst., 11 (2025), 79. https://doi.org/10.1007/s40747-024-01672-0
[41]
X. Chen, B. Yan, J. Zhu, D. Wang, X. Yang, H. Lu, Transformer tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 8122–8131. https://doi.org/10.1109/CVPR46437.2021.00803
[42]
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16x16 words: Transformers for image recognition at scale, preprint, arXiv: 2010.11929. https://doi.org/10.48550/arXiv.2010.11929
[43]
B. Yan, H. Peng, J. Fu, D. Wang, H. Lu, Learning spatio-temporal transformer for visual tracking, in IEEE International Conference on Computer Vision (ICCV), (2021), 10428–10437. https://doi.org/10.1109/ICCV48922.2021.01028
[44]
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, et al., Swin Transformer: Hierarchical vision transformer using shifted windows, in IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 10012–10022. https://doi.org/10.1109/ICCV48922.2021.00986
[45]
L. Lin, H. Fan, Z. Zhang, Y. Xu, H. Ling, SwinTrack: A simple and strong baseline for transformer tracking, in Advances in Neural Information Processing Systems (NIPS), 35 (2022), 16743–16754.
[46]
Z. Song, J. Yu, Y. P. P. Chen, W. Yang, Transformer tracking with cyclic shifting window attention, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 8781–8790. https://doi.org/10.1109/CVPR52688.2022.00859
[47]
Y. Chen, K. Chen, Four mathematical modeling forms for correlation filter object tracking algorithms and the fast calculation for the filter, Electron. Res. Arch., 32 (2024), 4684–4714. https://doi.org/10.3934/era.2024213
[48]
H. Fan, L. Lin, F. Yang, P. Chu, G. Deng, S. Yu, et al., LaSOT: A high-quality benchmark for large-scale single object tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 5369–5378. https://doi.org/10.1109/CVPR.2019.00552
[49]
Y. Wu, J. Lim, M.-H. Yang, Online object tracking: A benchmark, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2013), 2411–2418. https://doi.org/10.1109/CVPR.2013.312
Y. Huang, Y. Chen, C. Lin, Q. Hu, J. Song, Visual attention learning and antiocclusion-based correlation filter for visual object tracking, J. Electron. Imaging, 32 (2023), 013023. https://doi.org/10.1117/1.JEI.32.1.013023
[52]
K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 770–778.
[53]
N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in European Conference on Computer Vision (ECCV), (2020), 213–229. https://doi.org/10.1007/978-3-030-58452-8_13
Y. Cui, C. Jiang, G. Wu, L. Wang, MixFormer: End-to-end tracking with iterative mixed attention, IEEE Trans. Pattern Anal. Mach. Intell., 46 (2024), 4129–4146. https://doi.org/10.1109/TPAMI.2024.3349519
[56]
J. Shen, Y. Liu, X. Dong, X. Lu, F. S. Khan, S. Hoi, Distilled siamese networks for visual tracking, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2022), 8896–8909. https://doi.org/10.1109/TPAMI.2021.3127492
[57]
X. Dong, J. Shen, F. Porikli, J. Luo, L. Shao, Adaptive siamese tracking with a compact latent network, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 8049–8062. https://doi.org/10.1109/TPAMI.2022.3230064
[58]
Z. Cao, Z. Huang, L. Pan, S. Zhang, Z. Liu, C. Fu, Towards real-world visual tracking with temporal contexts, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 15834–15849. https://doi.org/10.1109/TPAMI.2023.3307174
[59]
Y. Yang, X. Gu, Attention-based gating network for robust segmentation tracking, IEEE Trans. Circuits Syst. Video Technol., 35 (2025), 245–258. https://doi.org/10.1109/TCSVT.2024.3460400
[60]
W. Han, X. Dong, Y. Zhang, D. Crandall, C. Z. Xu, J. Shen, Asymmetric Convolution: An efficient and generalized method to fuse feature maps in multiple vision tasks, IEEE Trans. Pattern Anal. Mach. Intell., 46 (2024), 7363–7376. https://doi.org/10.1109/TPAMI.2024.3400873
[61]
X. Zhu, Y. Wu, D. Xu, Z. Feng, J. Kittler, Robust visual object tracking via adaptive attribute-aware discriminative correlation filters, IEEE Trans. Multimedia, 23 (2021), 2625–2638. https://doi.org/10.1109/TMM.2021.3050073
[62]
M. Danelljan, G. Häger, F. Shahbaz Khan, M. Felsberg, Learning spatially regularized correlation filters for visual tracking, in IEEE International Conference on Computer Vision (ICCV), (2015), 4310–4318. https://doi.org/10.1109/ICCV.2015.490
[63]
J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, P. H. S. Torr, End-to-end representation learning for correlation filter based tracking, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 5000–5008. https://doi.org/10.1109/CVPR.2017.531
[64]
G. Bhat, M. Danelljan, L. V. Gool, R. Timofte, Learning discriminative model prediction for tracking, in IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 6182–6191. https://doi.org/10.1109/ICCV.2019.00628
[65]
N. Wang, W. Zhou, J. Wang, H. Li, Transformer meets tracker: exploiting temporal context for robust visual tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 1571–1580. https://doi.org/10.1109/CVPR46437.2021.00162
[66]
Z. Chen, B. Zhong, G. Li, S. Zhang, R. Ji, Siamese box adaptive network for visual tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 6668–6677. https://doi.org/10.1109/CVPR42600.2020.00670
[67]
Y. Guo, H. Li, L. Zhang, L. Zhang, K. Deng, F. Porikli, SiamCAR: Siamese fully convolutional classification and regression for visual tracking, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 1176–1185. https://doi.org/10.1109/CVPR42600.2020.00630
[68]
D. Xing, N. Evangeliou, A. Tsoukalas, A. Tzes, Siamese transformer pyramid networks for real-time UAV tracking, in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), (2022), 1898–1907. https://doi.org/10.1109/WACV51458.2022.00196
Figure 1. The overall architecture of the MCWTT module
Figure 2. Diagram of the FFM architecture
Figure 3. The signal flow diagram of the FEB
Figure 4. The flowchart of the bounding box prediction head
Figure 5. Tracker performance comparison on the OTB100 dataset. (a) Success plot; (b) precision plot
Figure 6. The success plots for various challenging attributes on the OTB100 dataset: (a) BC; (b) DEF; (c) FM; (d) IPR; (e) IV; (f) LR; (g) MB; (h) OCC; (i) OPR; (j) OV; (k) SV
Figure 7. The precision plots for various challenging attributes on the OTB100 dataset: (a) BC; (b) DEF; (c) FM; (d) IPR; (e) IV; (f) LR; (g) MB; (h) OCC; (i) OPR; (j) OV; (k) SV
Figure 8. Tracker performance comparison on the UAV123 dataset. (a) Success plot; (b) precision plot
Figure 9. The success plots for various challenging attributes on the UAV123 dataset: (a) SV; (b) ARC; (c) LR; (d) FM; (e) FO; (f) PO; (g) OV; (h) BC; (i) IV; (j) VC; (k) CM; (l) SO
Figure 10. The precision plots for various challenging attributes on the UAV123 dataset: (a) SV; (b) ARC; (c) LR; (d) FM; (e) FO; (f) PO; (g) OV; (h) BC; (i) IV; (j) VC; (k) CM; (l) SO
Figure 11. Visualization of tracking performance on different video sequences
Figure 12. Performance comparison of the MCWTT and MCWTT-S models on the OTB100 dataset. (a) Success plot; (b) precision plot
Figure 13. Validation of the average overlap rate and central error rate curves of models with and without multi-scale windows in the "Car2" video sequence. (a) Average overlap rate; (b) central error rate
Figure 14. Comparison of models with and without multi-scale windows