Research article

A transfer deep residual shrinkage network for bird sound recognition

  • Published: 04 July 2025
  • Bird sound recognition has important applications in bird monitoring and ecological protection. In complex environments, however, noise and insufficient sample data are the main factors limiting recognition accuracy. We propose a bird sound recognition method based on a transfer deep residual shrinkage network. First, a noise-resistant deep residual shrinkage network was constructed based on the structural characteristics of the residual shrinkage module, multi-scale operations, and the characteristics of bird sound Mel spectrograms. Then, the network was pre-trained on a bird sound dataset and fine-tuned with an unfreezing strategy to mitigate the impact of insufficient training data. Transfer learning alleviated data scarcity by reusing the pre-trained model, while the deep residual shrinkage network improved performance in noisy environments through its network structure. Experimental results showed that the method achieves high recognition accuracy under noise and with small datasets. It has advantages over the compared methods and good application prospects in ecological monitoring, such as bird population monitoring.
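
The abstract does not spell out the shrinkage step itself. As a minimal sketch, assuming the residual shrinkage module uses the standard soft-thresholding function with a per-channel threshold proportional to the mean absolute activation, the operation could look like the following. The names `soft_threshold` and `shrink_channel` and the constant `alpha` are hypothetical and chosen only for illustration; in a trained network the scaling factor would be produced by a small learned sub-network rather than fixed by hand.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; any value with |x| <= tau becomes 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def shrink_channel(values, alpha):
    """Soft-threshold one feature channel.

    Assumption: the threshold is alpha * mean(|values|) with alpha in (0, 1).
    Here alpha is a hand-picked constant, not a learned quantity.
    """
    tau = alpha * sum(abs(v) for v in values) / len(values)
    return [soft_threshold(v, tau) for v in values]


# Small activations (likely noise) are zeroed; large ones are kept but reduced.
channel = [0.1, -0.2, 3.0, -2.5, 0.05]
shrunk = shrink_channel(channel, alpha=0.5)
print(shrunk)
```

With these illustrative values, the three small activations fall below the threshold and are set to zero, while the two large ones are pulled toward zero by the same amount. This is the sense in which such a module can suppress low-amplitude noise in a Mel spectrogram feature map.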

    Citation: Xiao Chen, Zhaoyou Zeng, Tong Xu. A transfer deep residual shrinkage network for bird sound recognition[J]. Electronic Research Archive, 2025, 33(7): 4135-4150. doi: 10.3934/era.2025185

  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)