How to construct low-altitude aerial image datasets for deep learning

Xin Shu; Xin Cheng; Shubin Xu; Yunfang Chen; Tinghuai Ma; Wei Zhang

doi:10.3934/mbe.2021053

Mathematical Biosciences and Engineering

2021, Volume 18, Issue 2: 986-999. doi: 10.3934/mbe.2021053


Research article

1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
2. Cyberspace Security Research Institute, China Electronics Technology Group Corporation, Xiong'an New Area 071000, China
3. School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
4. Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

Received: 24 October 2020; Accepted: 21 December 2020; Published: 05 January 2021

Combining Unmanned Aerial Vehicle (UAV) technologies with computer vision has made UAV applications increasingly popular. Computer vision tasks based on deep learning usually require a large amount of task-related data for training. Because commonly used datasets are not designed for specific scenarios, sufficiently large aerial image datasets must be collected to meet training requirements and give UAVs stronger computer vision capabilities. In this paper, we take low-altitude aerial image object detection as an example and propose a framework that demonstrates how to construct datasets for specific tasks. First, we introduce the existing low-altitude aerial image datasets and analyze the characteristics of low-altitude aerial images. On this basis, we offer suggestions for collecting low-altitude aerial images. Then, we recommend several commonly used image annotation tools and crowdsourcing platforms for generating the labeled data needed for model training. Finally, to compensate for data shortages, we introduce data augmentation techniques, including traditional data augmentation and data augmentation based on oversampling and generative adversarial networks.
Keywords: UAVs, aerial image, datasets, deep learning, data augmentation
Citation: Xin Shu, Xin Cheng, Shubin Xu, Yunfang Chen, Tinghuai Ma, Wei Zhang. How to construct low-altitude aerial image datasets for deep learning[J]. Mathematical Biosciences and Engineering, 2021, 18(2): 986-999. doi: 10.3934/mbe.2021053
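
As a rough illustration of the "traditional data augmentation" named in the abstract, the sketch below flips an image horizontally and mirrors its bounding boxes to match, since object detection labels must follow the pixels. The function name `hflip_with_boxes`, the grayscale list-of-rows image, and the half-open `(x_min, y_min, x_max, y_max)` box convention are assumptions made for this example; they are not taken from the paper.

```python
from typing import List, Tuple

# Assumed conventions for this sketch (not from the paper):
Image = List[List[int]]          # rows of grayscale pixel values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive


def hflip_with_boxes(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Mirror an image left-right and mirror its bounding boxes to match."""
    width = len(image[0])
    # Reverse every row to flip the pixels horizontally.
    flipped = [list(reversed(row)) for row in image]
    # A box spanning [x_min, x_max) becomes [width - x_max, width - x_min).
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for (x_min, y_min, x_max, y_max) in boxes
    ]
    return flipped, flipped_boxes


# Toy 3x4 image with one "object" in the rightmost column, rows 0-1.
image = [
    [0, 0, 0, 9],
    [0, 0, 0, 9],
    [0, 0, 0, 0],
]
boxes = [(3, 0, 4, 2)]

flipped_image, flipped_boxes = hflip_with_boxes(image, boxes)
print(flipped_image)
print(flipped_boxes)
```

Flipping twice returns the original image and boxes, which is a convenient sanity check when adding a transform like this to a dataset pipeline.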

© 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)