Small object detection remains a significant challenge in computer vision, especially in remote sensing and unmanned aerial vehicle (UAV) applications, where detection accuracy still has considerable room for improvement. The difficulty arises primarily from low image resolution, complex backgrounds, and insufficient feature representation. To address these challenges, we propose FCA-YOLO, a novel detection framework built upon YOLOv11 and specifically optimized for small object detection from UAV perspectives. First, we design a down-sampling structure that better preserves small object features by integrating a redundant compression feature transformation strategy with an inverted bottleneck residual block, enhancing feature flow and representation capacity. Second, we propose a cross-scale feature fusion module that integrates spatial and channel attention mechanisms to align and optimize multi-scale features, thereby strengthening the model's focus on small objects. Finally, we design a specialized detection structure that increases sensitivity to small targets by combining a dedicated detection head with skip connections that fuse deep semantic features with shallow details, improving the model's ability to capture the fine-grained detail of small objects. Experimental results on the VisDrone2019 dataset demonstrate that FCA-YOLO outperforms the baseline model, achieving improvements of 3.1% in precision, 4.7% in recall, and 5% in mAP@0.5, while reducing the number of parameters by 30%. Compared with other YOLO variants and state-of-the-art algorithms, the proposed method achieves superior detection accuracy. Further evaluations on the DOTAv1.0 and VEDAI datasets validate the robustness and consistent detection performance of the proposed model across aerial imaging scenarios.
Citation: Qiming Li, Yonghui Yan, Shaohui Lan. FCA-YOLO: A small object detection method based on feature attention fusion for UAV remote sensing images[J]. AIMS Mathematics, 2026, 11(2): 5172-5191. doi: 10.3934/math.2026211
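For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how precision and recall are computed at an IoU threshold of 0.5, which is the matching criterion behind mAP@0.5. It is not the authors' evaluation code: the function names, the greedy single-class matching, and the toy boxes are assumptions made for illustration. A full mAP@0.5 computation additionally integrates the precision-recall curve per class and averages over classes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Single-class precision and recall with greedy matching (illustrative).

    predictions: list of (score, box); ground_truth: list of boxes.
    Higher-scoring predictions are matched first, and each ground-truth
    box can be matched at most once.
    """
    matched = set()
    tp = 0
    for _score, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predictions) - tp   # predictions with no matching object
    fn = len(ground_truth) - tp  # objects that were never detected
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


# Toy data (hypothetical): two small objects, three predictions.
ground_truth = [(10, 10, 20, 20), (40, 40, 48, 48)]
predictions = [
    (0.9, (11, 11, 21, 21)),  # 1-pixel offset from the 10x10 object
    (0.8, (60, 60, 70, 70)),  # overlaps nothing: false positive
    (0.6, (41, 41, 49, 49)),  # 1-pixel offset from the 8x8 object
]

p, r = precision_recall(predictions, ground_truth)
print(round(p, 3), round(r, 3))  # 0.667 1.0
```

In this toy case the same one-pixel offset yields an IoU of about 0.68 for the 10×10 box but only about 0.62 for the 8×8 box, which illustrates why localization errors penalize small objects more heavily under a fixed IoU threshold.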