Research article

Introducing attention shortcuts in convolutional neural networks

  • Received: 01 March 2025 Revised: 25 April 2025 Accepted: 09 May 2025 Published: 19 August 2025
  • Convolutional neural networks that implement skip connection mechanisms have made it possible to train increasingly deep, accurate, and efficient networks, successfully addressing the degradation problem previously observed in very deep networks. Among the most popular are ResNets and DenseNets: ResNets introduce skip connections via summation, while DenseNets employ concatenation. The summation mechanism in ResNets can limit how prior information is adapted to the specific needs of each layer, whereas the DenseNet concatenation mechanism can become computationally expensive because convolutional blocks must process all accumulated prior information. Therefore, in this paper, we propose a new attention-based skip connection mechanism: Attention Shortcuts. This mechanism allows convolutional blocks to process only the most relevant prior information, reducing the computational burden. We conducted a preliminary experimental study adapting the proposed mechanism to the ResNet-50 backbone and compared its performance to the original ResNet.
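    The abstract contrasts three ways of merging prior information in a shortcut. The following is a minimal, illustrative PyTorch sketch, not the paper's implementation: ResidualShortcut and DenseShortcut reproduce the standard summation and concatenation patterns, while AttentionShortcut is a hypothetical stand-in that gates the incoming features with learned channel weights. All class names, the conv_block helper, and the reduction parameter are assumptions made only for illustration.

```python
# Illustrative sketch of the three shortcut styles discussed in the abstract.
# The AttentionShortcut below is NOT the paper's Attention Shortcuts mechanism;
# it is an assumed stand-in that re-weights prior features with channel attention.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """A plain 3x3 convolutional block shared by all three variants."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class ResidualShortcut(nn.Module):
    """ResNet-style shortcut: prior information is merged by element-wise summation."""
    def __init__(self, channels):
        super().__init__()
        self.block = conv_block(channels, channels)

    def forward(self, x):
        return self.block(x) + x  # summation shortcut


class DenseShortcut(nn.Module):
    """DenseNet-style shortcut: prior information is merged by channel concatenation."""
    def __init__(self, channels, growth):
        super().__init__()
        self.block = conv_block(channels, growth)

    def forward(self, x):
        return torch.cat([x, self.block(x)], dim=1)  # concatenation shortcut


class AttentionShortcut(nn.Module):
    """Hypothetical attention-gated shortcut (assumed design, not the paper's):
    the block re-weights incoming features with learned channel attention so it
    can focus on the most relevant prior information."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.block = conv_block(channels, channels)

    def forward(self, x):
        w = self.attn(x)              # per-channel relevance weights in (0, 1)
        return self.block(w * x) + x  # attend to prior features, keep identity path


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    print(ResidualShortcut(64)(x).shape)          # torch.Size([2, 64, 32, 32])
    print(DenseShortcut(64, growth=32)(x).shape)  # torch.Size([2, 96, 32, 32])
    print(AttentionShortcut(64)(x).shape)         # torch.Size([2, 64, 32, 32])
```

    The sketch only conveys the intuition stated in the abstract: summation keeps the channel count fixed but merges prior information rigidly, concatenation grows the input each block must process, and an attention gate lets the block emphasize the channels it finds most relevant. The actual Attention Shortcuts mechanism proposed in the paper may differ in form and placement.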

    Citation: David Erroz, Mikel Galar. Introducing attention shortcuts in convolutional neural networks[J]. Electronic Research Archive, 2025, 33(8): 4785-4798. doi: 10.3934/era.2025215

  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)