Research article

E2-ViTUNet: Enhanced dual-encoder vision transformer UNet for single-image rain removal

  • † The authors contributed equally to this work.
  • Received: 22 June 2025; Revised: 18 July 2025; Accepted: 4 August 2025; Published: 15 August 2025
  • The rapid development of image deraining technology has made it possible to effectively restore images captured even in severe rainy weather, thereby alleviating the degradation of image quality. However, existing image deraining solutions typically rely on overly deep neural networks, which are prone to vanishing and exploding gradients, often causing the restored image to lose essential background details. To address this issue, this paper proposes an enhanced dual-encoder vision transformer UNet (E2-ViTUNet) for single-image rain removal. First, a dual-branch heterogeneous encoding structure is formed by integrating two encoders with different architectures, each extracting rain streaks by leveraging its distinct characteristics. Specifically, the first encoder, a convolutional neural network termed the residual encoder network, exploits the local receptive fields of convolutional kernels to capture both rain streaks and the corresponding background details. Concurrently, the second encoder, the parallel vision transformer encoder network, employs self-attention to model long-range dependencies and establish global contextual relationships between rain streaks and background details. The rain streak information extracted by the two encoders is then fused, and the decoder performs the rain removal task. To preserve spatial consistency and overall image integrity, high-frequency details are refined through residual connections between the encoders and the decoder. Evaluations on synthetic and real-world datasets confirm the algorithm's robustness in processing rain streaks of varied density; the restored images exhibit enhanced visual clarity, improving visibility for downstream computer vision applications.

    Citation: Huirong Fang, Zhixiang Chen, Ziyang Zheng, Hui Wang. E2-ViTUNet: Enhanced dual-encoder vision transformer UNet for single-image rain removal[J]. Electronic Research Archive, 2025, 33(8): 4740-4762. doi: 10.3934/era.2025213
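
    To make the pipeline described in the abstract concrete, the sketch below pairs a convolutional residual encoder (local receptive fields) with a small vision transformer encoder (self-attention for global context), fuses the two feature streams, and decodes with an encoder-to-decoder residual connection. This is a minimal illustration only: the module names, channel widths, patch size, and concatenation-plus-1x1-convolution fusion are assumptions of this sketch, not the authors' E2-ViTUNet implementation.

```python
# Minimal PyTorch sketch of the dual-encoder idea from the abstract.
# All names, channel widths, the patch size, and the fusion scheme are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """CNN branch: local receptive fields with a residual shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection eases gradient flow


class ViTEncoder(nn.Module):
    """Transformer branch: self-attention over patch tokens for global context."""
    def __init__(self, ch, patch=8, depth=2, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(ch, ch, patch, stride=patch)  # patchify
        layer = nn.TransformerEncoderLayer(
            d_model=ch, nhead=heads, dim_feedforward=2 * ch, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.unembed = nn.ConvTranspose2d(ch, ch, patch, stride=patch)

    def forward(self, x):
        t = self.embed(x)                      # B, C, H/p, W/p
        b, c, h, w = t.shape
        t = t.flatten(2).transpose(1, 2)       # B, HW, C token sequence
        t = self.blocks(t)                     # model long-range dependencies
        t = t.transpose(1, 2).reshape(b, c, h, w)
        return self.unembed(t)                 # back to a feature map


class DualEncoderUNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, padding=1)
        self.cnn_enc = ResidualBlock(ch)       # local rain-streak features
        self.vit_enc = ViTEncoder(ch)          # global contextual features
        self.fuse = nn.Conv2d(2 * ch, ch, 1)   # fuse the two branches
        self.decoder = nn.Sequential(
            ResidualBlock(ch), nn.Conv2d(ch, 3, 3, padding=1)
        )

    def forward(self, rainy):
        f = self.stem(rainy)
        fused = self.fuse(torch.cat([self.cnn_enc(f), self.vit_enc(f)], dim=1))
        # encoder-to-decoder residual connection preserves high-frequency detail
        return rainy - self.decoder(fused + f)  # subtract the estimated rain layer


if __name__ == "__main__":
    x = torch.randn(1, 3, 64, 64)              # dummy rainy image
    print(DualEncoderUNet()(x).shape)          # torch.Size([1, 3, 64, 64])
```

    The final subtraction treats the decoder output as an estimated rain-streak layer, a common residual formulation in deraining; the paper's actual fusion and skip-connection layout may differ.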

  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
