Research article Special Issues

Cross-view remote sensing and street-level data fusion for intelligent traffic congestion analysis

  • Published: 19 January 2026
  • MSC : 68T07

  • Urban traffic congestion is a persistent obstacle to sustainable city transportation management, and addressing it requires models that integrate heterogeneous data sources and yield accurate, interpretable outcomes. This paper introduces the cross-view fusion network (CVF-Net), a multimodal deep learning framework for city-wide congestion analysis that combines remote-sensing imagery (drone aerial views), street-view camera images, and graph-structured sensor data in a single model. The architecture comprises a hierarchical attention fusion transformer (HAFT), which applies cross-view attention (CVA) between the aerial and ground views; a temporal graph neural network (TGNN), which models spatio-temporal dynamics; and a graph refinement (GR) network, which enforces consistency with the graph topology. Extensive experiments on three benchmarks (CityFlowV2, METR-LA, PEMS-BAY) show that CVF-Net consistently outperforms recent state-of-the-art methods, reducing forecasting error (MAE) by 9.3% and increasing tracking continuity (IDF1) by 7.0%. Ablation studies suggest that hierarchical fusion and temporal modeling improve accuracy and stability, while sensitivity analyses show that the attention maps capture congestion and causal temporal patterns that correspond to real symptoms of congestion. The model also shows strong cross-dataset generalizability and robustness to sensor noise, which supports its applicability in real-world settings. Unlike existing spatio-temporal GNNs and multimodal Transformers that rely on flat feature aggregation or implicitly assume cross-view alignment, the proposed framework introduces a hierarchical, alignment-aware fusion strategy that explicitly integrates aerial visual context with graph-temporal traffic dynamics.
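
    The cross-view attention idea named in the abstract can be illustrated with a generic scaled dot-product cross-attention step, in which ground-view tokens act as queries over aerial-view tokens. This is a minimal sketch under stated assumptions, not the authors' CVA or HAFT implementation: the function name `cross_view_attention`, the token counts, the feature sizes, and the random projection matrices are all hypothetical, and the abstract does not specify which view supplies the queries.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(ground, aerial, w_q, w_k, w_v):
    """Generic cross-attention (hypothetical helper, not the paper's code).

    ground: (n_g, d) street-level tokens, used as queries (assumption).
    aerial: (n_a, d) aerial tokens, used as keys and values (assumption).
    Returns the fused ground tokens (n_g, d_k) and the attention
    weights (n_g, n_a).
    """
    q = ground @ w_q                           # project queries
    k = aerial @ w_k                           # project keys
    v = aerial @ w_v                           # project values
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product scores
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


# Toy inputs: 5 ground tokens and 7 aerial tokens with 8 features each.
rng = np.random.default_rng(0)
d, d_k = 8, 4
ground = rng.normal(size=(5, d))
aerial = rng.normal(size=(7, d))
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))

fused, weights = cross_view_attention(ground, aerial, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

    Each row of `weights` is a distribution over aerial tokens for one ground token, which is the kind of attention map the sensitivity analyses refer to. A hierarchical variant would stack such steps at several feature scales; that detail is not given in the abstract and is left out here.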

    Citation: Inzamam Mashood Nasir, Hend Alshaya, Sara Tehsin, Wided Bouchelligua. Cross-view remote sensing and street-level data fusion for intelligent traffic congestion analysis[J]. AIMS Mathematics, 2026, 11(1): 1547-1589. doi: 10.3934/math.2026065

  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)