Unmanned aerial vehicles (UAVs) are increasingly being adopted as flexible remote sensing platforms for smart logistics applications, including warehouse inventory, last-mile delivery supervision, traffic flow analysis, port operations, and infrastructure inspection. Despite these advantages, reliable object detection in UAV-based remote sensing imagery remains challenging due to the small object sizes, dense object distributions, arbitrary orientations, and complex backgrounds commonly encountered in logistics environments. Although recent YOLO-based detectors have shown promising performance, their effectiveness is often limited in high-resolution aerial scenes and under the practical computational constraints imposed by UAV platforms. To address these challenges, this paper proposes DV-YOLO, an enhanced deep learning framework tailored to object detection in UAV-based remote sensing imagery for logistics-oriented applications. The proposed model extends YOLOv9 with a deeper and wider backbone architecture coupled with optimized feature fusion strategies that jointly exploit spatial and semantic representations. A novel cross-path fusion network at the deep feature map (CPFNDFM) is introduced to improve the detection of small and densely distributed logistics-related objects such as vehicles, containers, and infrastructure elements. In addition, a lightweight connection aggregation (CA) module, inspired by VoVNet and ShuffleNetV2, is integrated to enhance feature reuse while maintaining computational efficiency suitable for real-time UAV deployment. Furthermore, a challenging benchmark dataset, termed Harder Vision Drone, is constructed by combining and refining samples from VisDrone and DOTA to better reflect real-world UAV remote sensing scenarios in logistics environments.
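As an illustrative aside (not the authors' code), the channel-shuffle operation from ShuffleNetV2 that motivates efficient feature reuse in modules like the CA block can be sketched in pure Python as an index permutation: channels are split into groups, and the groups are interleaved so that subsequent group convolutions see information from every group. The function name and the list-of-indices representation below are our own simplification.

```python
def channel_shuffle(channels, groups):
    """Interleave channel indices across groups (ShuffleNetV2-style).

    Equivalent to reshaping the channel axis to (groups, n // groups),
    transposing, and flattening back to a single channel axis.
    """
    n = len(channels)
    assert n % groups == 0, "channel count must be divisible by groups"
    per_group = n // groups
    # split into `groups` contiguous blocks
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # transpose-and-flatten: take the i-th element of each group in turn
    return [grid[g][i] for i in range(per_group) for g in range(groups)]

# 6 channels in 2 groups: [0, 1, 2, 3, 4, 5] -> [0, 3, 1, 4, 2, 5]
print(channel_shuffle(list(range(6)), 2))
```

In a real network this permutation is applied to feature-map channels (e.g. with a tensor reshape/transpose), not Python lists; the sketch only shows the index pattern that lets grouped convolutions exchange information cheaply.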
Extensive experimental evaluations on VisDrone 2021, DOTA v2, and the proposed dataset demonstrate that DV-YOLO consistently outperforms state-of-the-art detectors, achieving up to a 3.5% improvement in mean average precision (mAP) over YOLOv9. These results highlight the potential of the proposed framework to support robust, accurate, and efficient aerial perception for smart logistics and UAV-based remote sensing applications.
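For readers unfamiliar with the mAP metric reported above, the per-class average precision (AP) is the area under the precision-recall curve, with the precision envelope made monotonically non-increasing before integration; mAP is the mean of AP over all classes. The following is a minimal, hedged sketch of that standard computation (function name and toy inputs are ours, not taken from the paper):

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve (VOC/COCO-style AP).

    `recalls` must be sorted ascending; `precisions` are the matching
    precision values. Precision is first replaced by its right-to-left
    running maximum (the monotone envelope), then integrated over recall.
    """
    # add sentinel endpoints at recall 0 and 1
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # enforce a non-increasing precision envelope from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum rectangular areas wherever recall increases
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

# toy curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0
print(average_precision([0.5, 1.0], [1.0, 0.5]))  # -> 0.75
```

mAP would then be `sum(ap_per_class) / num_classes`; detection benchmarks such as VisDrone and DOTA additionally average over IoU thresholds, which this sketch omits.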
Citation: Ahmed A. Alsheikhy, Mohammad Barr, Sahbi Boubaker, Yahia Said. DV-YOLO: a deep learning framework for small-object detection in UAV-based remote sensing imagery with applications to smart logistics[J]. AIMS Mathematics, 2026, 11(4): 12043-12063. doi: 10.3934/math.2026494