This study proposes an algorithm based on an improved YOLOv8s for nighttime vehicle recognition under low-light and complex-background conditions, addressing shallow-feature loss, low recognition accuracy for occluded targets, and bounding-box distortion. The core optimization strategies are as follows. First, RepGFPN (Re-parameterized Generalized Feature Pyramid Network) employs re-parameterized fusion, using a customized fusion module to achieve multi-scale feature interaction; this preserves shallow-level detail while eliminating redundant sampling and adjusting channel widths to improve fusion efficiency. Second, the DAT (Deformable Attention Transformer) mechanism generates reference grids, a lightweight network predicts offsets for the deformation points, and bilinear interpolation extracts the key features, strengthening the learning of small and occluded targets. Third, the MPDIoU (Minimum Point Distance Intersection over Union) loss minimizes the Euclidean distances between corresponding bounding-box corners while retaining intersection-over-union optimization, mitigating box distortion caused by vehicle overlap. Experimental results show clear gains over the original model: accuracy improved by 6.4%, recall by 4.9%, latency was reduced by 6.3 ms, mAP@0.5 rose by 4.4%, and the parameter count (N_Params) decreased by 2.5.
Citation: Ling Peng, Libiao Jiang. Nighttime vehicle target detection based on visual features[J]. Applied Computing and Intelligence, 2026, 6(1): 23-37. doi: 10.3934/aci.2026002
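To make the deformable-attention step described in the abstract concrete, the following PyTorch sketch illustrates only the sampling stage: a uniform reference grid is generated, a small offset network predicts per-point shifts, and features are gathered at the deformed locations by bilinear interpolation. The module name, the offset-network layout, and the `offset_range` parameter are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSampling(nn.Module):
    """Illustrative sketch of the deformable-attention sampling step:
    build a uniform reference grid, predict per-position offsets with a
    lightweight network, then gather features at the shifted locations
    via bilinear interpolation (grid_sample)."""

    def __init__(self, channels, offset_range=4.0):
        super().__init__()
        # hypothetical lightweight offset predictor (depthwise conv + 1x1)
        self.offset_net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.GELU(),
            nn.Conv2d(channels, 2, 1),  # (dx, dy) per reference point
        )
        self.offset_range = offset_range

    def forward(self, x):
        b, c, h, w = x.shape
        # uniform reference grid in normalized coordinates [-1, 1]
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, dtype=x.dtype, device=x.device),
            torch.linspace(-1, 1, w, dtype=x.dtype, device=x.device),
            indexing="ij",
        )
        ref = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        # predicted offsets, bounded to a small fraction of the feature map
        offsets = self.offset_net(x).permute(0, 2, 3, 1)
        offsets = offsets.tanh() * (self.offset_range / max(h, w))
        # bilinear sampling at the deformed reference points
        sampled = F.grid_sample(x, ref + offsets, mode="bilinear",
                                align_corners=True)
        return sampled
```

In the full DAT block, the sampled features would serve as keys and values for a subsequent multi-head attention over the original queries; the sketch stops at the sampling output.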
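The MPDIoU loss mentioned in the abstract can likewise be written out. Following Ma and Xu (arXiv:2307.07662), it takes the form L_MPDIoU = 1 − IoU + d1²/(w² + h²) + d2²/(w² + h²), where d1 and d2 are the distances between the matching top-left and bottom-right corners of the predicted and ground-truth boxes and w, h are the input image width and height. Below is a minimal sketch, assuming boxes in (x1, y1, x2, y2) format and a hypothetical function name.

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """Sketch of the MPDIoU loss: IoU penalised by the squared distances
    between matching corners, normalised by the squared image diagonal."""
    # intersection of predicted and ground-truth boxes
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # union and IoU
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared corner distances, normalised by the squared image diagonal
    d1 = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    d2 = (pred[..., 2] - target[..., 2]) ** 2 + (pred[..., 3] - target[..., 3]) ** 2
    diag_sq = img_w ** 2 + img_h ** 2

    mpdiou = iou - d1 / diag_sq - d2 / diag_sq
    return 1.0 - mpdiou
```

Because the corner-distance terms grow with box misalignment even when the IoU is zero, this formulation keeps a useful gradient for heavily overlapping or displaced vehicles, which is the deformation case the abstract targets.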