Citation: Keying Du, Liuyang Fang, Jie Chen, Dongdong Chen, Hua Lai. CTFusion: CNN-transformer-based self-supervised learning for infrared and visible image fusion[J]. Mathematical Biosciences and Engineering, 2024, 21(7): 6710-6730. doi: 10.3934/mbe.2024294
Abstract
Infrared and visible image fusion (IVIF) is devoted to extracting and integrating useful complementary information from multi-modal source images. Current fusion methods usually require a large number of paired images to train models in a supervised or unsupervised way. In this paper, we propose CTFusion, a convolutional neural network (CNN)-Transformer-based IVIF framework that uses self-supervised learning. The whole framework is based on an encoder-decoder network, where the encoders are endowed with strong local and global dependency modeling ability via the CNN-Transformer-based feature extraction (CTFE) module design. Thanks to self-supervised learning with a simple pretext task, model training does not require ground-truth fusion images. We designed a mask reconstruction task according to the characteristics of IVIF, through which the network can learn the characteristics of both infrared and visible images and extract more generalized features. We evaluated our method against five competitive traditional and deep learning-based methods on three IVIF benchmark datasets. Extensive experimental results demonstrate that CTFusion achieves the best performance among state-of-the-art methods in both subjective and objective evaluations.
1. Introduction
Since the emergence of the COVID-19 pandemic around December 2019, the outbreak has snowballed globally [1,2], and there is no clear sign that new confirmed cases and deaths are coming to an end. Though vaccines are rolling out to deter the spread of the pandemic, mutations of the virus are already under way [3,4,5,6]. Although the origin of the pandemic is still under debate [7], many researchers are conducting studies from different aspects and perspectives. These could be categorised mainly into three levels: the SARS-CoV-2 genetic level [8], the COVID-19 individual-country level [9,10,11] and the continental level [12,13]. In this study, we focus on the latter two levels, for which many methods and techniques have been applied. For example, linear and non-linear growth models, together with 2-week-kernel-window regression, have been exploited in modelling the exponential growth rate of COVID-19 confirmed cases [14], and have also been generalised to non-linear modelling of the COVID-19 pandemic [15,16]. Some research works focus on predicting the spread of COVID-19 by estimating lead-lag effects between different countries via a time-warping technique [17], while others utilise clustering analyses to group countries via epidemiological data on active cases, active cases per population, etc. [18]. In addition, other studies tackle the relationship between economic variables and COVID-19-related variables [19,20], though both results show no relation between economic freedom and COVID-19 deaths, nor between the performance of equity markets and COVID-19 cases and deaths.
In this study, we aim to extract the features of daily biweekly growth rates of cases and deaths at the national and continental levels. We devise orthonormal bases based on Fourier analysis [21,22], in particular Fourier coefficients, as the potential features. At the national level, we import the global time-series data and sample 117 countries over 109 days [23,24]. We then calculate the Euclidean distance matrices for the inner products between countries and between days. Based on the distance matrices, we calculate their variabilities to delve into the distribution of the data. At the continental level, we also import the biweekly changes of cases and deaths for 5 continents as well as the world, with time-series data for 447 days. We then calculate their inner products with respect to the temporal frequencies and find the similarities of extracted features between continents.
At the national level, the biweekly data bear stronger temporal features than spatial features, i.e., as time goes by, the pandemic evolves more in the time dimension than in the space (or country-wise) dimension. Moreover, there exists a strong concurrency between the features for biweekly changes of cases and those of deaths, though there is no clear or stable trend in the extracted features. At the continental level, however, one observes a stable trend of features regarding biweekly changes. In addition, the extracted features are similar between continents, except for Asia, whose features bear no clear similarities with those of other continents.
Our approach is based on orthonormal bases, which serve as the potential features for the biweekly changes of cases and deaths. This method is straightforward and easy to comprehend. Its limitations are that the extracted features rest on the hidden frequencies of the dynamical structure, to which it is hard to assign an interpretable meaning, and that the fetched data are incomplete, due to missing entries in the database. Nevertheless, the results provided in this study could help map out the evolutionary features of COVID-19.
2. Method and implementation
Let $\delta:\mathbb{N}\to\{0,1\}$ be the function with $\delta(n)=0$ (also written $\delta_n=0$) if $n\in 2\mathbb{N}$ and $\delta(n)=1$ if $n\in 2\mathbb{N}+1$. Given a set of point data $D=\{\vec{v}\}\subseteq\mathbb{R}^N$, we would like to decompose each $\vec{v}$ into frequency-based vectors by Fourier analysis. The features of COVID-19 case and death growth rates are specified by the orthogonal frequency vectors $\mathcal{B}_N=\{\vec{f}_i=(f_{ij})_{1\le j\le N}\}_{i=1}^{N}$, based on Fourier series [22], where
● $f_{1j}=\sqrt{1/N}$ for all $1\le j\le N$;
● for any $2\le i\le N-1+\delta_N$,
$$f_{ij}=\sqrt{\frac{2}{N}}\cdot\cos\!\left[\frac{\pi}{2}\,\delta_i-\frac{(i-\delta_i)\,\pi}{N}\, j\right];\qquad(2.1)$$
● if $N\in 2\mathbb{N}$, then $f_{Nj}=\sqrt{1/N}\cdot\cos(j\pi)$ for all $1\le j\le N$.
Now we have constructed an orthonormal basis $F_N=\{\vec{f}_1,\vec{f}_2,\cdots,\vec{f}_N\}$ of features for $\mathbb{R}^N$, and each $\vec{v}=\sum_{i=1}^{N}\langle\vec{v},\vec{f}_i\rangle\,\vec{f}_i$, where $\langle\cdot,\cdot\rangle$ is the inner product. The basis $\mathcal{B}_N$ can also be represented by the matrix
$$F_N=\big[f_{ij}\big]_{i,j=1}^{N},\qquad(2.2)$$
and the representation of a data column vector $\vec{v}=(-3,14,5,8,-12)^{\top}$ with respect to $\mathcal{B}_5$ is calculated as $F_5\vec{v}=[\langle\vec{v},\vec{f}_i\rangle]_{i=1}^{5}$, the 5-by-1 column vector $(5.367,-16.334,-3.271,-6.434,-9.503)^{\top}$.
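The construction above can be checked directly. Below is a minimal sketch in base R (the paper states that its procedures were implemented in R 4.1.0); the function name fourier_basis is our own illustrative choice, not from the paper. It builds the matrix $F_N$ of Eq (2.1) and reproduces the worked representation of $\vec{v}=(-3,14,5,8,-12)$.

```r
# Sketch: orthonormal Fourier feature basis F_N of Eq (2.1).
# fourier_basis is an illustrative name, not from the paper.
fourier_basis <- function(N) {
  delta <- function(n) n %% 2          # delta(n) = 1 if n is odd, 0 if even
  B <- matrix(0, nrow = N, ncol = N)
  B[1, ] <- sqrt(1 / N)                # constant first frequency f_1
  j <- 1:N
  for (i in 2:(N - 1 + delta(N))) {    # middle frequencies: phase-shifted cosines
    B[i, ] <- sqrt(2 / N) * cos(pi / 2 * delta(i) - (i - delta(i)) * pi * j / N)
  }
  if (delta(N) == 0) {                 # extra alternating row when N is even
    B[N, ] <- sqrt(1 / N) * cos(j * pi)
  }
  B
}

F5 <- fourier_basis(5)
v  <- c(-3, 14, 5, 8, -12)
round(F5 %*% v, 3)                 # 5.367 -16.334 -3.271 -6.434 -9.503
max(abs(F5 %*% t(F5) - diag(5)))   # ~0, so the rows are indeed orthonormal
```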
2.1. Data description and handling
There are two main parts of data collection and handling - one for individual countries (the national level) and the other for individual continents (the continental level). At both levels, we fetch the daily biweekly growth rates of confirmed COVID-19 cases and deaths from Our World in Data [23,24]. We then use R (version 4.1.0) to handle the data and implement the procedures.
Sampled targets: national. After filtering out non-essential and missing data, the effective sample consists of 117 countries over 109 days, as shown in Results. The days range from December 2020 to June 2021. Though the sampled days are not consecutive (due to missing data), the biweekly information still covers such gaps. In the latter temporal and spatial analyses, we conduct our study based on these data.
Sampled targets: continental. As for the continental data, we collect data for the world, Africa, Asia, Europe, and North and South America. The sampled days range from March 22nd, 2020 to June 11th, 2021; in total, there are 447 days (this differs from the national level). In the latter temporal analysis (there is no spatial analysis at the continental level, due to the limited sample size), we conduct our study based on these data.
Notations: national. For further processing, let us introduce some notation. Let the sampled countries be indexed by $i=1,\dots,117$ and the sampled days by $t=1,\dots,109$; the days range from December 3rd, 2020 to May 31st, 2021. Let $c_i(t)$ and $d_i(t)$ be the daily biweekly growth rates of confirmed cases and deaths in country $i$ on day $t$, respectively, i.e.,
$$c_i(t):=\frac{\mathrm{case}_{i,t+13}-\mathrm{case}_{i,t}}{\mathrm{case}_{i,t}};\qquad(2.3)$$
$$d_i(t):=\frac{\mathrm{death}_{i,t+13}-\mathrm{death}_{i,t}}{\mathrm{death}_{i,t}},\qquad(2.4)$$
where $\mathrm{case}_{i,t}$ and $\mathrm{death}_{i,t}$ denote the total confirmed cases and deaths for country $i$ at day $t$, respectively. We form temporal and spatial vectors by
$$\vec{c}_i=(c_i(1),\dots,c_i(109)),\qquad \vec{d}_i=(d_i(1),\dots,d_i(109)),$$
$$\vec{v}(t)=(c_1(t),\dots,c_{117}(t)),\qquad \vec{w}(t)=(d_1(t),\dots,d_{117}(t)),$$
so that the vectors $\vec{c}_i$ and $\vec{d}_i$ give every count in time for a given country, and the vectors $\vec{v}(t)$ and $\vec{w}(t)$ give every country's count for a given time.
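The paper fetches the daily biweekly growth rates directly from Our World in Data, but the defining formulas (2.3) and (2.4) are simple enough to sketch. In the hedged snippet below, x is assumed to be a numeric vector of cumulative totals (cases or deaths) for one country, one entry per day; the function name is ours.

```r
# Sketch: daily biweekly growth rates of Eqs (2.3)-(2.4) from cumulative counts.
# x: cumulative totals for one country, one entry per day (assumed layout).
biweekly_growth <- function(x) {
  t <- seq_len(length(x) - 13)
  (x[t + 13] - x[t]) / x[t]      # count 13 days ahead minus today, over today
}
# e.g., c_i <- biweekly_growth(cases_i); d_i <- biweekly_growth(deaths_i)
```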
Notations: continental. Let the sampled continents be indexed by $j=1,\dots,6$ and let the 447 sampled days range from March 22nd, 2020 to June 11th, 2021. We form temporal vectors for confirmed cases and deaths by $\vec{x}_j=(x_j(1),\dots,x_j(447))$ and $\vec{y}_j=(y_j(1),\dots,y_j(447))$, respectively, where $x_j(\tau)$ and $y_j(\tau)$ denote the daily biweekly growth rates of cases and deaths for continent $j$ on day $\tau$.
For any $m$-by-$n$ matrix $A$, we use $\min(A)$ to denote the value $\min\{a_{ij}:1\le i\le m;\ 1\le j\le n\}$, and define $\max(A)$ in the same manner. If $\vec{v}$ is a vector, $\min(\vec{v})$ and $\max(\vec{v})$ are defined likewise. The implementation goes as follows:
(1) Extract and trim the source data.
Extraction: national. Extract the daily biweekly growth rates of COVID-19 cases and deaths from the database and trim the data. The trimmed data consist of 109 time-series observations for 117 countries, as shown in Table 1, arranged in two 117-by-109 matrices, $[c_i(t)]_{i=1:117}^{t=1:109}$ and $[d_i(t)]_{i=1:117}^{t=1:109}$.
Row $i$ of each matrix is regarded as the temporal vector $\vec{c}_i$ or $\vec{d}_i$, respectively, and column $t$ is regarded as the spatial vector $\vec{v}(t)$ or $\vec{w}(t)$, respectively.
Extraction: continental. As for the continental data, they are collected in two 6-by-447 matrices:
$$\mathrm{Biweekly\_cont\_cases}=[x_j(\tau)]_{j=1:6}^{\tau=1:447};$$
$$\mathrm{Biweekly\_cont\_deaths}=[y_j(\tau)]_{j=1:6}^{\tau=1:447}.$$
(2) Specify the frequencies (features) for the imported data.
Basis: national. In order to decompose $\vec{c}_i$ and $\vec{d}_i$ into fundamental features, we specify $F_{109}$ as the corresponding feature basis, whereas to decompose $\vec{v}(t)$ and $\vec{w}(t)$, we specify $F_{117}$. The results are presented in Table 2.
Table 2.
Orthonormal temporal frequencies for 109 days (upper block or F109) and orthonormal spatial frequencies for 117 countries (lower block or F117).
Average variability: continental. For each continent $j$, the average temporal variabilities for confirmed cases and deaths are computed by
$$\mathrm{var\_cont\_case\_time}[j]=\frac{1}{447}\sum_{k=1}^{6} d_E\!\big(F_{447}\vec{x}_j,\,F_{447}\vec{x}_k\big);$$
$$\mathrm{var\_cont\_death\_time}[j]=\frac{1}{447}\sum_{k=1}^{6} d_E\!\big(F_{447}\vec{y}_j,\,F_{447}\vec{y}_k\big),$$
where $d_E$ denotes the Euclidean distance.
(6) Unify the national temporal and spatial variabilities of cases and deaths. For each country $i$, the unified temporal and spatial variabilities for cases and deaths are defined in terms of $\sigma_{ij}$ and $\beta_{ij}$, the values in the $(i,j)$ cells of IP_cont_cases_time and IP_cont_deaths_time, respectively. The results are visualised by figures in Results.
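A sketch of the continental computation in R, assuming X is the 6-by-447 matrix Biweekly_cont_cases (rows = continents) and fourier_basis is the helper sketched earlier in this section:

```r
# Sketch: continental representations and average temporal variabilities.
F447 <- fourier_basis(447)
reps <- X %*% t(F447)              # row j holds the inner products <x_j, f_i>
D    <- as.matrix(dist(reps))      # pairwise Euclidean distances d_E
var_cont_case_time <- rowSums(D) / 447
```

The death variabilities follow by replacing X with Biweekly_cont_deaths.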
3. Results
There are two main parts of results shown in this section: national results and continental results.
National results. Based on the method described in Section 2, we identify the temporal and spatial orthonormal frequencies as shown in Table 2.
The computed inner products at the country level, serving as the values of the extracted features, for daily biweekly growth rates of cases and deaths with respect to the temporal frequencies are shown in Figure 1. Similarly, the computed inner products for daily biweekly growth rates of cases and deaths with respect to the spatial frequencies are shown in Figure 2. Meanwhile, their scaled variabilities are plotted in Figure 3.
Figure 1.
Inner products between growth rates of cases (in solid line) over 109 temporal frequencies; and inner products between growth rates of deaths (in dotted line) over 109 temporal frequencies for some demonstrative countries: Afghanistan, Albania, Algeria, Uruguay, Zambia, and Zimbabwe.
Figure 2.
Inner products between growth rates of cases (in solid line) over 117 spatial frequencies; and inner products between growth rates of deaths (in dotted line) over 117 spatial frequencies for some demonstrative dates: 2020/12/3, 2020/12/4, 2020/12/5, 2021/5/29, 2021/5/30, and 2021/5/31.
Continental results. Based on the obtained data, we study and compare the continental features of daily biweekly growth rates of confirmed cases and deaths for Africa, Asia, Europe, North America, South America and the world. Unlike the national data, the continental data are complete; we take samples from March 22nd, 2020 to June 11th, 2021, for a total of 447 days. The cosine values measuring the similarities between the continental representations are shown in Table 3. The results of the unified inner products with respect to confirmed cases and deaths are plotted in Figures 4 and 5, respectively.
Table 3.
Cosine values (similarities) between World, Africa, Asia, Europe, North America (No. Am.), and South America (So. Am.).
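Assuming the similarities in Table 3 are plain cosine values between the continental representation vectors, they can be sketched from the reps matrix of the previous snippet (rows ordered World, Africa, Asia, Europe, North America, South America, an ordering we assume for illustration):

```r
# Sketch: cosine similarities between continental feature representations.
norms      <- sqrt(rowSums(reps^2))
cosine_sim <- (reps %*% t(reps)) / (norms %o% norms)
round(cosine_sim, 3)               # cf. Table 3
```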
Figure 5.
Unified inner product, or UIP, for world, Africa, Asia, Europe, North and South America with respect to daily biweekly growth rates of deaths.
Other auxiliary results that support the plotting of the graphs are appended in the Appendix. The names of the 117 sampled countries are provided in Tables A1 and A2. The dates of the sampled days are provided in Figure A1. The tabulated results for inner products of temporal and spatial frequencies at the national level are provided in Table A3. The tabulated results for inner products of temporal frequencies at the continental level are provided in Table A4. The Euclidean distance matrices for temporal and spatial representations with respect to confirmed cases and deaths are tabulated in Table A5, and their average variabilities are tabulated in Table A6.
Summaries of results. Based on the previous tables and figures, we have the following results.
(1) From Figures 1 and 2, one observes that the temporal features are much more distinct than the spatial features, i.e., fixing one day and extracting features from the spatial frequencies yields less distinct features than fixing one country and extracting features from the temporal frequencies. This indicates that SARS-CoV-2 evolves and mutates mainly over time rather than across space.
(2) For individual countries, the features for the biweekly changes of cases are almost concurrent with those of deaths. This indicates that the biweekly changes of cases and deaths share similar features; in some sense, the change in deaths is still in tune with the change in confirmed cases, i.e., there is no substantial shift in their relationship.
(3) For individual countries, the extracted features go up and down intermittently and show no obvious trend. This indicates that the virus is still very versatile and its fixed features are hard to capture at a country level.
(4) From Figure 3, one observes clear similarities, in terms of variabilities, between the daily biweekly growth rates of cases and deaths under the temporal frequencies. Moreover, the distribution of the overall data is not condensed, with the middle, labelled countries scattered across the whole range. This indicates that the diversity of daily biweekly growth rates of cases and deaths across countries is still very high.
(5) From Figure 3, the daily biweekly growth rates of deaths with respect to the spatial frequencies are fairly concentrated. This indicates that the extracted features regarding deaths are stable, i.e., there are clearer and more stable spatial features for the daily biweekly growth rates of deaths.
(6) Comparing the individual graphs in Figures 4 and 5, they bear much the same shape but on different scales, with deaths being more feature-oriented (as also witnessed at the country level in the results above). This indicates a very clear trend of features regarding daily biweekly growth rates at the continental level (a stark contrast to the third result above).
(7) From Figures 4 and 5, the higher values of the inner products lie at both endpoints, i.e., at the low and high temporal frequencies, for the biweekly changes of cases and deaths in all continents, except for the biweekly change of deaths in Asia. This indicates that the evolutionary patterns in Asia are very distinct from those of other continents.
(8) From Table 3, the extracted features are very similar across continents, except for Asia. This echoes the result above.
4. Conclusions and future work
In this study, we identify the features of daily biweekly growth rates of COVID-19 confirmed cases and deaths via orthonormal bases (features) derived from Fourier analysis. We then analyse the inner products, which represent the levels of the chosen features. The variabilities for each country show that the levels of deaths under the spatial frequencies are much more concentrated than the others. The generated results are summarised in Section 3. There are some limitations in this study and future improvements to be made:
● The associated meanings of the orthonormal features from Fourier analysis are not yet fully explored;
● We use the Euclidean metric to measure the distances between features, which are then used to calculate the variabilities. The Euclidean metric is noted for its geometrical properties, but it may not be the most suitable in the context of frequencies. One could introduce other metrics and apply machine learning techniques to find the optimal ones;
● In this study, we choose the daily biweekly growth rates of confirmed cases and deaths as our research sources. This is a one-sided story. To obtain a fuller picture of the dynamical features, one could add other variables for comparison.
Acknowledgements
This work is supported by the Humanities and Social Science Research Planning Fund Project under the Ministry of Education of China (No. 20XJAGAT001).
Conflict of interest
No potential conflict of interest was reported by the authors.
Table A3.
Inner products w.r.t. temporal (case: upper top and death: upper bottom blocks) and spatial (case: lower top and death: lower bottom blocks) frequencies at a national level.
Table A4.
Temporal inner product for continents (World, Africa, Asia, Europe, North and South America) w.r.t. daily biweekly growth rates of cases (upper block) and deaths (lower block) from March 22nd, 2020 to June 11th, 2021 (447 days).
Table A5.
Distance matrices for daily biweekly growth rates of cases (uppermost block) and deaths (2nd block) w.r.t. temporal frequencies and the ones of cases (3rd block) and deaths (bottommost block) w.r.t. spatial frequencies.
Table A6.
Variability for 117 countries with respect to daily biweekly growth rates of cases and death under temporal frequencies and variability for 109 days with respect to daily biweekly growth rates of cases and death under spatial frequencies.
[1]
Y. Liu, X. Chen, Z. Wang, Z. Wang, R. K. Ward, X. Wang, Deep learning for pixel-level image fusion: Recent advances and future prospects, Inform. Fusion, 42 (2018), 158–173. https://doi.org/10.1016/j.inffus.2017.10.007 doi: 10.1016/j.inffus.2017.10.007
[2]
H. Zhang, H. Xu, X. Tian, J. Jiang, J. Ma, Image fusion meets deep learning: A survey and perspective, Inform. Fusion, 76 (2021), 323–336. https://doi.org/10.1016/j.inffus.2021.06.008 doi: 10.1016/j.inffus.2021.06.008
[3]
J. Ma, Y. Ma, C. Li, Infrared and visible image fusion methods and applications: A survey, Inform. Fusion, 45 (2019), 153–178. https://doi.org/10.1016/j.inffus.2018.02.004 doi: 10.1016/j.inffus.2018.02.004
[4]
C. Yang, J. Zhang, X. Wang, X. Liu, A novel similarity based quality metric for image fusion, Inform. Fusion, 9 (2008), 156–160. https://doi.org/10.1016/j.inffus.2006.09.001 doi: 10.1016/j.inffus.2006.09.001
[5]
X. Zhang, P. Ye, G. Xiao, VIFB: a visible and infrared image fusion benchmark, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2020), 468–478. https://doi.org/10.1109/CVPRW50498.2020.00060
[6]
L. J. Chipman, T. M. Orr, L. N. Graham, Wavelets and image fusion, in International Conference on Image Processing, (1995), 248–251. https://doi.org/10.1109/ICIP.1995.537627
[7]
A. V. Vanmali, V. M. Gadre, Visible and NIR image fusion using weight-map-guided Laplacian-Gaussian pyramid for improving scene visibility, Sādhanā, 42 (2017), 1063–1082. https://doi.org/10.1007/s12046-017-0673-1 doi: 10.1007/s12046-017-0673-1
[8]
L. Sun, Y. Li, M. Zheng, Z. Zhong, Y. Zhang, MCnet: Multiscale visible image and infrared image fusion network, Signal Process., 208 (2023), 108996. https://doi.org/10.1016/j.sigpro.2023.108996 doi: 10.1016/j.sigpro.2023.108996
[9]
K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 770–778. https://doi.org/10.1109/CVPR.2016.90
[10]
J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 779–788. https://doi.org/10.1109/CVPR.2016.91
[11]
Y. Zhang, Y. Wang, H. Li, S. Li, Cross-compatible embedding and semantic consistent feature construction for sketch re-identification, in Proceedings of the 30th ACM International Conference on Multimedia, (2022), 3347–3355. https://doi.org/10.1145/3503161.3548224
[12]
H. Li, N. Dong, Z. Yu, D. Tao, G. Qi, Triple adversarial learning and multi-view imaginative reasoning for unsupervised domain adaptation person re-identification, IEEE Trans. Circuits Syst. Video Technol., 32 (2022), 2814–2830. https://doi.org/10.1109/TCSVT.2021.3099943 doi: 10.1109/TCSVT.2021.3099943
[13]
O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015, 9351 (2015), 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
[14]
L. Tang, H. Huang, Y. Zhang, G. Qi, Z. Yu, Structure-embedded ghosting artifact suppression network for high dynamic range image reconstruction, Knowl.-Based Syst., 263 (2023), 110278. https://doi.org/10.1016/j.knosys.2023.110278 doi: 10.1016/j.knosys.2023.110278
[15]
H. Li, X. Wu, DenseFuse: A fusion approach to infrared and visible images, IEEE Trans. Image Process., 28 (2019), 2614–2623. https://doi.org/10.1109/TIP.2018.2887342 doi: 10.1109/TIP.2018.2887342
[16]
H. Li, J. Liu, Y. Zhang, Y. Liu, A deep learning framework for infrared and visible image fusion without strict registration, Int. J. Comput. Vision, (2024), 1625–1644. https://doi.org/10.1007/s11263-023-01948-x doi: 10.1007/s11263-023-01948-x
[17]
L. Qu, S. Liu, M. Wang, Z. Song, Transmef: A transformer-based multi-exposure image fusion framework using self-supervised multi-task learning, preprint, arXiv: 2112.01030.
[18]
K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick, Masked autoencoders are scalable vision learners, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 15979–15988. https://doi.org/10.1109/CVPR52688.2022.01553
[19]
S. Li, X. Kang, L. Fang, J. Hu, H. Yin, Pixel-level image fusion: A survey of the state of the art, Inform. Fusion, 33 (2017), 100–112. https://doi.org/10.1016/j.inffus.2016.05.004 doi: 10.1016/j.inffus.2016.05.004
[20]
H. Li, X. Qi, W. Xie, Fast infrared and visible image fusion with structural decomposition, Knowl.-Based Syst., 204 (2020), 106182. https://doi.org/10.1016/j.knosys.2020.106182 doi: 10.1016/j.knosys.2020.106182
[21]
Q. Zhang, Y. Liu, R. S. Blum, J. Han, D. Tao, Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: A review, Inform. Fusion, 40 (2018), 57–75. https://doi.org/10.1016/j.inffus.2017.05.006 doi: 10.1016/j.inffus.2017.05.006
[22]
M. Xie, J. Wang, Y. Zhang, A unified framework for damaged image fusion and completion based on low-rank and sparse decomposition, Signal Process.: Image Commun., 98 (2021), 116400. https://doi.org/10.1016/j.image.2021.116400 doi: 10.1016/j.image.2021.116400
[23]
H. Li, Y. Wang, Z. Yang, R. Wang, X. Li, D. Tao, Discriminative dictionary learning-based multiple component decomposition for detail-preserving noisy image fusion, IEEE Trans. Instrum. Meas., 69 (2020), 1082–1102. https://doi.org/10.1109/TIM.2019.2912239 doi: 10.1109/TIM.2019.2912239
[24]
W. Xiao, Y. Zhang, H. Wang, F. Li, H. Jin, Heterogeneous knowledge distillation for simultaneous infrared-visible image fusion and super-resolution, IEEE Trans. Instrum. Meas., 71 (2022), 1–15. https://doi.org/10.1109/TIM.2022.3149101 doi: 10.1109/TIM.2022.3149101
[25]
Y. Zhang, M. Yang, N. Li, Z. Yu, Analysis-synthesis dictionary pair learning and patch saliency measure for image fusion, Signal Process., 167 (2020), 107327. https://doi.org/10.1016/j.sigpro.2019.107327 doi: 10.1016/j.sigpro.2019.107327
[26]
Y. Niu, S. Xu, L. Wu, W. Hu, Airborne infrared and visible image fusion for target perception based on target region segmentation and discrete wavelet transform, Math. Probl. Eng., 2012 (2012), 1–10. https://doi.org/10.1155/2012/275138 doi: 10.1155/2012/275138
[27]
D. M. Bulanon, T. F. Burks, V. Alchanatis, Image fusion of visible and thermal images for fruit detection, Biosyst. Eng., 103 (2009), 12–22. https://doi.org/10.1016/j.biosystemseng.2009.02.009 doi: 10.1016/j.biosystemseng.2009.02.009
[28]
M. Choi, R. Y. Kim, M. Nam, H. O. Kim, Fusion of multispectral and panchromatic satellite images using the curvelet transform, IEEE Geosci. Remote Sens. Lett., 2 (2005), 136–140. https://doi.org/10.1109/LGRS.2005.845313 doi: 10.1109/LGRS.2005.845313
[29]
Y. Liu, X. Chen, J. Cheng, H. Peng, Z. Wang, Infrared and visible image fusion with convolutional neural networks, Int. J. Wavelets, Multiresolution Inf. Process., 16 (2018), 1850018. https://doi.org/10.1142/S0219691318500182 doi: 10.1142/S0219691318500182
[30]
J. Ma, W. Yu, P. Liang, C. Li, J. Jiang, FusionGAN: A generative adversarial network for infrared and visible image fusion, Inform. Fusion, 48 (2019), 11–26. https://doi.org/10.1016/j.inffus.2018.09.004 doi: 10.1016/j.inffus.2018.09.004
[31]
H. Li, Y. Cen, Y. Liu, X. Chen, Z. Yu, Different input resolutions and arbitrary output resolution: A meta learning-based deep framework for infrared and visible image fusion, IEEE Trans. Image Process., 30 (2021), 4070–4083. https://doi.org/10.1109/TIP.2021.3069339 doi: 10.1109/TIP.2021.3069339
[32]
H. Zhang, J. Ma, SDNet: A versatile squeeze-and-decomposition network for real-time image fusion, Int. J. Comput. Vision, 129 (2021), 2761–2785. https://doi.org/10.1007/s11263-021-01501-8 doi: 10.1007/s11263-021-01501-8
[33]
H. Xu, J. Ma, J. Jiang, X. Guo, H. Ling, U2fusion: A unified unsupervised image fusion network, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2020), 502–518. https://doi.org/10.1109/TPAMI.2020.3012548 doi: 10.1109/TPAMI.2020.3012548
[34]
L. Tang, J. Yuan, J. Ma, Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network, Inform. Fusion, 82 (2022), 28–42. https://doi.org/10.1016/j.inffus.2021.12.004 doi: 10.1016/j.inffus.2021.12.004
[35]
J. Liu, X. Fan, Z. Huang, G. Wu, R. Liu, W. Zhong, et al., Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 5792–5801. https://doi.org/10.1109/CVPR52688.2022.00571
[36]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in Proceedings of the 31st International Conference on Neural Information Processing Systems, (2017), 6000–6010.
[37]
H. Chen, Y. Wang, T. Guo, C. Xu, Y. Deng, Z. Liu, et al., Pre-trained image processing transformer, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 12294–12305. https://doi.org/10.1109/CVPR46437.2021.01212
[38]
X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, Deformable DETR: Deformable transformers for end-to-end object detection, preprint, arXiv: 2010.04159.
[39]
S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, et al., Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 6877–6886. https://doi.org/10.1109/CVPR46437.2021.00681
[40]
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16×16 words: Transformers for image recognition at scale, in International Conference on Learning Representations ICLR 2021, (2021).
[41]
K. Han, A. Xiao, E. Wu, J. Guo, C. Xu, Y. Wang, Transformer in transformer, in 35th Conference on Neural Information Processing Systems (NeurIPS 2021), (2021), 1–12.
[42]
C. Chen, R. Panda, Q. Fan, RegionViT: Regional-to-local attention for vision transformers, preprint, arXiv: 2106.02689.
[43]
V. Vs, J. M. J. Valanarasu, P. Oza, V. M. Patel, Image fusion transformer, in 2022 IEEE International Conference on Image Processing (ICIP), (2022), 3566–3570. https://doi.org/10.1109/ICIP46576.2022.9897280
[44]
Y. Zhang, Y. Tian, Y. Kong, B. Zhong, Y. Fu, Residual dense network for image restoration, IEEE Trans. Pattern Anal. Mach. Intell., 43 (2021), 2480–2495. https://doi.org/10.1109/TPAMI.2020.2968521 doi: 10.1109/TPAMI.2020.2968521
[45]
R. Hou, D. Zhou, R. Nie, D. Liu, L. Xiong, Y. Guo, et al., VIF-Net: An unsupervised framework for infrared and visible image fusion, IEEE Trans. Comput. Imaging, 6 (2020), 640–651. https://doi.org/10.1109/TCI.2020.2965304 doi: 10.1109/TCI.2020.2965304
[46]
J. Liu, Y. Wu, Z. Huang, R. Liu, X. Fan, SMoA: Searching a modality-oriented architecture for infrared and visible image fusion, IEEE Signal Process. Lett., 28 (2021), 1818–1822. https://doi.org/10.1109/LSP.2021.3109818 doi: 10.1109/LSP.2021.3109818
[47]
Y. Zhang, Y. Liu, P. Sun, H. Yan, X. Zhao, L. Zhang, IFCNN: A general image fusion framework based on convolutional neural network, Inform. Fusion, 54 (2020), 99–118. https://doi.org/10.1016/j.inffus.2019.07.011 doi: 10.1016/j.inffus.2019.07.011
[48]
H. Zhang, H. Xu, Y. Xiao, X. Guo, J. Ma, Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity, in Proceedings of the AAAI Conference on Artificial Intelligence, 34 (2020), 12797–12804. https://doi.org/10.1609/aaai.v34i07.6975
[49]
H. Li, X. Wu, CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach, Inform. Fusion, 103 (2024), 102147. https://doi.org/10.1016/j.inffus.2023.102147 doi: 10.1016/j.inffus.2023.102147
[50]
H. Li, X. Wu, J. Kittler, RFN-Nest: An end-to-end residual fusion network for infrared and visible images, Inform. Fusion, 73 (2021), 72–86. https://doi.org/10.1016/j.inffus.2021.02.023 doi: 10.1016/j.inffus.2021.02.023
[51]
M. Deshmukh, U. Bhosale, Image fusion and image quality assessment of fused images, Int. J. Image Process., 4 (2010), 484–508.
[52]
H. Chen, P. K. Varshney, A human perception inspired quality metric for image fusion based on regional information, Inform. Fusion, 8 (2007), 193–207. https://doi.org/10.1016/j.inffus.2005.10.001 doi: 10.1016/j.inffus.2005.10.001
[53]
V. Aslantas, E. Bendes, A new image quality metric for image fusion: The sum of the correlations of differences, AEU-International J. Electron. Commun., 69 (2015), 1890–1896. https://doi.org/10.1016/j.aeue.2015.09.004 doi: 10.1016/j.aeue.2015.09.004
[54]
Z. Wang, A. C. Bovik, A universal image quality index, IEEE Signal Process. Lett., 9 (2002), 81–84. https://doi.org/10.1109/97.995823 doi: 10.1109/97.995823
[55]
D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, preprint, arXiv: 1412.6980.
[56]
A. Toet, The TNO multiband image data collection, Data Brief, 15 (2017), 249–251. https://doi.org/10.6084/m9.figshare.1008029.v2 doi: 10.6084/m9.figshare.1008029.v2
[57]
X. Jia, C. Zhu, M. Li, W. Tang, W. Zhou, LLVIP: A visible-infrared paired dataset for low-light vision, in 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), (2021), 3489–3497. https://doi.org/10.1109/ICCVW54120.2021.00389
[58]
H. Li, J. Zhao, J. Li, Z. Yu, G. Liu, Feature dynamic alignment and refinement for infrared–visible image fusion: Translation robust fusion, Inform. Fusion, 95 (2023), 26–41. https://doi.org/10.1016/j.inffus.2023.02.011 doi: 10.1016/j.inffus.2023.02.011