Research article

Workshop AGV path planning based on improved A* algorithm


  • Received: 25 October 2023 Revised: 03 December 2023 Accepted: 22 December 2023 Published: 10 January 2024
  • This article proposes an improved A* algorithm aimed at improving the logistics path quality of automated guided vehicles (AGVs) in digital production workshops, solving the problems of excessive path turns and long transportation times. The traditional A* algorithm is improved both internally and externally. In the internal improvement process, we propose an improved node search method within the A* algorithm to avoid generating invalid paths; introduce a heuristic function that uses diagonal distance instead of traditional heuristic functions to reduce the number of turns in the path; and add turning weights to the A* algorithm formula, further reducing both the number of turns and the number of node searches. In the external improvement process, the output path of the internally improved A* algorithm is further optimized by an improved forward search optimization algorithm and the Bézier curve method, which reduce path length and turns, yielding a path with fewer turns and a shorter distance. The experimental results demonstrate that the internally improved A* algorithm proposed in this research performs better than six conventional path planning methods. Relative to the internally improved A* algorithm path, the fully improved A* algorithm reduces the turning angle by approximately 69% and shortens the path by approximately 10%. Based on the simulation results, the improved A* algorithm in this paper can reduce the running time of AGVs and improve logistics efficiency in the workshop; specifically, the AGV travel time on the improved A* algorithm path is reduced by 12 s compared to the traditional A* algorithm.

    Citation: Na Liu, Chiyue Ma, Zihang Hu, Pengfei Guo, Yun Ge, Min Tian. Workshop AGV path planning based on improved A* algorithm[J]. Mathematical Biosciences and Engineering, 2024, 21(2): 2137-2162. doi: 10.3934/mbe.2024094




    With the rapid development of the chemical industry, the safety management of chemical parks has become a crucial aspect of maintaining industrial ecological balance. Chemical parks, due to the presence of numerous hazardous chemicals and high-risk operational environments, demand strict management of personnel safety behaviors. Statistics show that frequent industrial accidents in China's manufacturing sector have resulted in significant casualties and economic losses each year, with most of these accidents caused by operator violations, such as smoking in work areas or failing to wear safety equipment. Consequently, enhancing safety management levels in chemical parks is imperative, prompting increasing attention from scholars and researchers toward exploring effective safety management strategies and technologies.

    A significant portion of production accidents in the chemical industry is attributed to unsafe behaviors of personnel. For instance, behaviors such as smoking in hazardous areas or not wearing protective equipment are major causes of accidents. According to data from the State Administration of Work Safety, over 70% of accidents each year are due to unsafe behaviors. This situation not only threatens the lives of workers but also brings substantial economic losses and reputational damage to enterprises.

    With the rapid advancements in computer technology and continuous breakthroughs in artificial intelligence, machine vision recognition technology based on image processing has emerged as a vital tool for enhancing monitoring efficiency. Machine vision technology spans from traditional target detection to deep learning-based object detection. Traditional methods typically use HOG, Haar features, or color information combined with classifiers like AdaBoost or Support Vector Machines (SVM) for target recognition. In the framework of deep learning, object detection algorithms have evolved into two major categories: Region-based two-stage detection algorithms, such as R-CNN, Fast R-CNN [1], and Faster R-CNN [2], and direct regression-based single-stage detection algorithms, including SSD [3] and the YOLO [4] series. These advanced machine vision technologies offer new solutions for safety management in chemical parks, significantly improving the accuracy and real-time performance of monitoring, thereby effectively preventing and reducing industrial accidents.

    Although deep learning algorithms have shown great potential in detecting smoking behavior and helmet-wearing compliance in chemical industrial parks, they face numerous challenges in practical applications. On the one hand, the monitoring environment in chemical plants is often complex, with drastic lighting changes, high personnel density, and frequent occlusions, all of which pose significant obstacles to the detection of small targets. For instance, small objects such as distant workers or smoke are easily missed or misidentified, while complex background interference, such as occlusion or backlighting, can significantly reduce detection accuracy. On the other hand, traditional video image analysis methods rely on manual feature extraction, which is not only inefficient but also poorly adaptable to dynamic industrial environments. Although some industrial parks have deployed deep learning-based detection models in their equipment, these models often fail to meet practical demands, exhibiting low detection rates along with frequent false positives and false negatives. Therefore, improving the model's ability to detect small targets in complex environments, enhancing robustness, and effectively reducing false and missed detections remain critical research issues in this field.

    To achieve accurate classification and precise localization of personnel safety in chemical parks, meet the precision requirements of industrial equipment, reduce false alarms and missed detections, and significantly decrease the workload of manual inspection, we propose a personnel safety behavior detection system based on an artificial intelligence framework. The major contributions of this algorithm include:

    (1) APIoU Bounding Box Regression Loss: We introduce APIoU as the bounding box regression loss for network optimization. This modification effectively balances the gradient gains between high-quality and low-quality samples, improving the model's localization capabilities.

    (2) RCPCA Attention Mechanism: A novel attention mechanism, RCPCA, is proposed, enabling the model to better extract background information, thereby enhancing detection performance.

    (3) RFAConv to Replace Traditional Convolution: The use of RFAConv instead of traditional convolution (Conv) assigns different weights to each receptive field position and feature channel, highlighting important detail information.

    (4) Bidirectional Feature Pyramid Network (BiFPN): The neck design incorporates a Bidirectional Feature Pyramid Network (BiFPN) for the weighted fusion of multi-scale feature maps, enhancing the model's ability to detect targets of varying scales.

    (5) Small Object Detection Layer: An additional small object detection layer is integrated into the YOLOv8 network, significantly improving the detection capability for small targets.

    The structure of this paper is as follows: In Section 2, we provide an overview of related work and the current state of research; in Section 3, we introduce the methodology, detailing the proposed algorithm's framework and implementation specifics, including the loss function, RCPCA attention mechanism, RFAConv, and BiFPN; in Section 4, we discuss the experimental setup, results, and provide a discussion on the effectiveness of the method; finally, in Section 5, we conclude the research findings and discuss future research directions.

    In the context of personnel behavior safety recognition in chemical plants, traditional target detection methods extract relevant features using descriptors and employ classifiers to detect worker safety based on category information. For instance, Rubaiyat et al. [5] utilized Discrete Cosine Transform (DCT) to extract frequency domain information and HOG features from images, employing SVM to identify candidate worker regions. They further used Circular Hough Transform and color combination features to detect whether workers were wearing safety helmets. Sun et al. [6] improved a visual background extractor to detect moving targets and determined the helmet position based on the head-to-body ratio. They applied Principal Component Analysis (PCA) for feature vector dimensionality reduction and used a Bayesian-optimized SVM model to recognize safety helmets, followed by the Mean Shift algorithm to track them. Additionally, Huang et al. [7] designed a smoking behavior detection method, using SVM to model and classify smoke features and smoking actions. Despite their effectiveness to some extent, these traditional methods face challenges in complex real-world scenarios, including high time complexity, low robustness, low accuracy, and susceptibility to false positives and false negatives.

    Deep learning-based two-stage object detection methods involve initially identifying candidate regions, followed by classification and localization of these regions to detect worker safety. Park et al. [8] employed a Region-based Fully Convolutional Network (R-FCN) for object detection and classification, utilizing transfer learning techniques to train the model for safety helmet detection. Widiarsini et al. [9] constructed a smoking detection model based on Region-based Convolutional Neural Networks (R-CNN) and used Mask R-CNN for cigarette segmentation. While two-stage detection algorithms perform well in terms of accuracy, their detection speed is relatively slow, making it challenging to meet real-time requirements. Sond et al. [10] utilized Very Deep Residual Networks (VD-ResNet) to detect construction workers in image sequences with varying postures and backgrounds, providing efficient and accurate technical support for real-time detection.

    Deep learning-based single-stage object detection methods achieve a balance between detection accuracy and real-time performance, making them widely applicable in engineering practices. These methods use global information to regress target detection bounding boxes and category information directly from the entire image. For instance, Wang et al. [11] replaced the residual convolutional layers in YOLOv3's Darknet-53 network with Inception-ResNet modules, modified the number of convolutional layers to enhance network performance, and added detection scales for small targets. They used the K-means algorithm to cluster anchor boxes, thereby improving the network's ability to detect small safety helmets and uniforms. Deng et al. [12] improved YOLOv4 for helmet detection, using K-means clustering to obtain optimal prior boxes and multi-scale training to enhance suitability across detection scales. Tan et al. [13] extended YOLOv5 by adding scales for small target detection and introduced the DIoU-NMS algorithm to replace the traditional non-maximum suppression, resulting in more accurate suppression of helmet prediction boxes. Fu et al. [14] proposed the GD-YOLO network based on YOLOv7, which includes an efficient feature extraction module, D-LAN, for detecting smoking and phone usage behaviors. Li et al. [15] developed an improved YOLO-PL algorithm for helmet detection in construction environments, aiming to enhance detection accuracy and real-time performance. Nath et al. [16] introduced a real-time detection method for construction worker personal protective equipment (PPE) compliance based on deep learning, which helps reduce construction site accidents and improve safety compliance. Li et al. [17] proposed the YOLOv5-SFE algorithm, integrating spatial and temporal features to improve the accuracy of detecting and recognizing worker behaviors in real-time.

    Although traditional methods and deep learning approaches have made certain progress in personnel behavior safety recognition in chemical environments, challenges remain in terms of detection accuracy in complex scenarios. Traditional methods have high computational complexity and lack robustness, making it difficult to handle complex backgrounds and variable lighting conditions typically encountered in practical applications.

    Two-stage detection models, such as Faster R-CNN and R-FCN, although showing significant advantages in detection accuracy, involve higher computational costs due to the separation of candidate region generation and subsequent classification, which makes them less efficient for high-performance processing. In recent years, Transformer-based models (such as DETR) have achieved end-to-end training and made significant progress, but their high computational complexity and limitations in processing large-scale datasets restrict their application in resource-constrained environments. In contrast, YOLO series models, by adopting a single-stage detection framework, directly regress the target bounding boxes and category information from the entire image, achieving a good balance between accuracy and efficiency. Especially, YOLOv8n, with its streamlined network structure and optimized training strategies, performs exceptionally well in multi-class object detection tasks in complex backgrounds, effectively addressing detection requirements in various safety monitoring scenarios. This gives YOLOv8n a clear advantage in applications that require high safety standards, such as construction sites and chemical plants.

    In summary, considering the system's requirements for detection accuracy and efficiency, we choose YOLOv8n as the baseline model. It achieves an optimal balance between accuracy and efficiency, and can run stably in resource-constrained monitoring environments, meeting the practical needs of high-safety scenarios such as chemical plants.

    The YOLO (You Only Look Once) algorithm, introduced in 2016, is renowned for its speed and real-time capabilities. However, as a single-stage algorithm, it often leads to higher false-positive and false-negative rates, especially with small and densely packed objects. To address these limitations, the YOLO algorithm has undergone continuous optimization, resulting in several iterations, including YOLOv2 [18], YOLOv3 [19], YOLOv4 [20], YOLOX [21], and YOLOv7 [22], as well as modifications like YOLO-Tiny. YOLOv8, one of the latest single-stage object detection algorithms, strikes a good balance between detection accuracy and speed. Given the practical application scenarios in chemical parks, YOLOv8n is particularly suitable due to its simpler network structure, minimal computational requirements, and fastest running speed, making it highly portable. Therefore, we focus on further improving the YOLOv8n model. The network structure of the YOLOv8n model comprises four main components: Input, Backbone, Neck, and Head. The Input stage involves Mosaic data augmentation to enrich the dataset and an anchor-free strategy to reduce the number of predicted boxes, thereby accelerating the Non-Maximum Suppression (NMS) process. The Backbone includes Conv, C2f, and SPPF (Spatial Pyramid Pooling-Fast) modules, which handle convolution, batch normalization, and feature extraction. The Neck combines the Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) to ensure effective feature fusion across different stages. The Head uses a Decoupled Head strategy, separating the classification head from the detection head, and derives target class and location information from feature maps of three different scales. By refining these components, the improved YOLOv8n model is tailored to better address the challenges of detecting unsafe behaviors in chemical parks, thereby enhancing safety management.

    In this paper, we design an improved target detection algorithm for chemical parks, named YOLOv8-ARR, using YOLOv8n as the baseline model. The specific contributions of this work are as follows: (1) We introduce APIoU as the network optimization for bounding box regression loss, effectively balancing the gradient gains between high-quality and low-quality samples and enhancing the model's localization capabilities. (2) We propose a novel attention mechanism, RCPCA, enabling the model to better extract background information and improve feature extraction capabilities. (3) We design the RFConv module to replace the original Conv module, assigning different weights to each receptive field position and feature channel, highlighting important detail information. Additionally, we modify the stride of P2 in the backbone network to 1, enhancing the backbone's extraction capabilities and reducing false positives and missed detections in chemical park images. (4) The neck design incorporates a Bidirectional Feature Pyramid Network (BiFPN) for the weighted fusion of multi-scale feature maps. (5) Considering the need for shallow feature maps for small targets, we introduce an additional P2 small object detection head in the head network to more effectively capture the details and local features of small targets. The YOLOv8-ARR structure diagram is shown in Figure 1.

    Figure 1.  YOLOv8-ARR Structure Diagram. The RCPCA attention mechanism enables the model to better extract background information, enhancing its feature extraction capabilities. The RFConv module is designed to replace the original Conv module, assigning different weights to each receptive field position and feature channel, highlighting important detail information. A Bidirectional Feature Pyramid Network (BiFPN-Concat) is introduced in the neck design for the weighted fusion of multi-scale feature maps.

    The loss function is a crucial component in evaluating samples for deep neural networks. Choosing the appropriate loss function significantly impacts the model's convergence speed, thereby enhancing detection accuracy and minimizing false positives and false negatives. Accurate localization in chemical park detection is particularly essential. The YOLOv8 model employs Complete Intersection over Union (CIoU) for bounding box regression. The CIoU loss function is defined as follows:

    $L_{CIoU} = 1 - IoU + \frac{d^2}{c^2} + \alpha v$ (1)

    where IoU represents the Intersection over Union, d is the distance between the center points of the predicted and ground truth boxes, and c is the diagonal length of the smallest enclosing box covering the predicted and ground truth boxes. The expressions for α and v are given by:

    $v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$ (2)
    $\alpha = \frac{v}{(1 - IoU) + v}$ (3)

    Here, $h^{gt}$ and $w^{gt}$ denote the height and width of the ground truth box, while $h$ and $w$ represent the height and width of the predicted box.

    Although the CIoU loss function introduces two penalty terms to account for the differences in center point distance and aspect ratio, it does not directly capture the shape differences between the anchor box and the target box. This can lead to suboptimal solutions or unstable convergence during training. Moreover, these penalty terms do not reflect changes in the size of the target box, potentially affecting the model's performance in detecting objects of various sizes.
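    For concreteness, the following is a minimal PyTorch-style sketch of the CIoU computation in Eqs (1)-(3). The box format (x1, y1, x2, y2), the batched tensor shapes, and the small epsilon terms are illustrative assumptions; they do not reproduce the exact implementation inside YOLOv8.

    import math
    import torch

    def ciou_loss(pred, target, eps=1e-7):
        """Minimal CIoU loss sketch for matched boxes given as (x1, y1, x2, y2) tensors of shape (N, 4)."""
        # Intersection over Union
        inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
        inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
        inter = inter_w * inter_h
        area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        iou = inter / (area_p + area_t - inter + eps)

        # Squared centre distance d^2 and squared diagonal c^2 of the smallest enclosing box
        cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
        cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
        d2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
        cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
        ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
        c2 = cw ** 2 + ch ** 2 + eps

        # Aspect-ratio consistency v and trade-off weight alpha, Eqs (2) and (3)
        w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
        w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
        v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
        alpha = v / (1 - iou + v + eps)

        return 1 - iou + d2 / c2 + alpha * v   # Eq (1)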

    The IoU loss function is affected by unreasonable penalty factors, leading to the expansion of anchor boxes during regression, significantly slowing down the convergence rate. Some loss functions even cause the anchor boxes to increase in size. Therefore, we introduce and improve the PIoU loss function. The PIoU loss function incorporates a target size-adaptive penalty factor and a gradient adjustment function based on anchor box quality, guiding the anchor box regression along an efficient path to achieve faster convergence than the existing IoU loss. The PIoU loss function is given by Eq (4).

    $L_{PIoU} = 1 - PIoU$ (4)

    where PIoU and p are defined in Eqs (5) and (6):

    $PIoU = IoU + e^{-p^2} - 1$ (5)
    $p = \frac{1}{4}\left(\frac{d_{w1}}{w^{gt}} + \frac{d_{w2}}{w^{gt}} + \frac{d_{h1}}{h^{gt}} + \frac{d_{h2}}{h^{gt}}\right)$ (6)

    In the detection process within chemical industrial parks, the pixels occupied by objects in the field of view are influenced by lighting conditions and object types. Therefore, to balance the positioning and size of the target bounding boxes and enhance the model's generalization ability in different target scenarios, we have improved the PIoU loss function to obtain the APIoU loss. In the PIoU loss function, we introduce an area ratio, which is the ratio of the product of the predicted box area and the ground truth box area to the square of the area of the smallest enclosing box for the predicted and ground truth boxes. To better illustrate APIoU, we have drawn the structure diagram in Figure 2. The APIoU loss function is given by Eq (7):

    $L_{APIoU} = 1 - APIoU$ (7)

    where APIoU and area are defined in Eqs (8) and (9):

    $APIoU = IoU + e^{-p^2} + e^{-area^2} - 2$ (8)
    $area = \frac{(w \cdot h)(w^{gt} \cdot h^{gt})}{(w^c \cdot h^c)^2}$ (9)
    Figure 2.  The structure of APIoU.

    In complex industrial scenarios such as chemical parks, targets often exhibit significant variations in scale and are frequently subject to severe occlusion. To address these challenges, we have optimized the APIoU loss function by introducing an area ratio term, which more accurately captures the size variations of bounding boxes and enhances the model's ability to detect objects of different scales. Additionally, by integrating the regression task with area-related information, the loss function strengthens the model's feature learning capability, enabling it to better capture subtle differences in object shapes and improving adaptability to multi-scale object detection. This effectively overcomes the limitations of traditional PIoU in localization accuracy and better meets the demands of chemical parks for high-precision and robust object detection.
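    To make the construction above concrete, a minimal PyTorch-style sketch of the APIoU loss in Eqs (7)-(9) is given below. The corner offsets d_w1, d_w2, d_h1, and d_h2 are taken as the absolute distances between the corresponding edges of the predicted and ground-truth boxes, which is our reading of the PIoU penalty factor; this, the box format, and the epsilon terms are assumptions rather than the authors' implementation.

    import torch

    def apiou_loss(pred, target, eps=1e-7):
        """Sketch of the APIoU loss (Eqs (7)-(9)) for matched boxes given as (x1, y1, x2, y2) tensors of shape (N, 4)."""
        # Plain IoU of each predicted / ground-truth pair
        inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
        inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
        inter = inter_w * inter_h
        w, h = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
        w_gt, h_gt = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
        iou = inter / (w * h + w_gt * h_gt - inter + eps)

        # Penalty factor p (Eq (6)): edge offsets normalised by the ground-truth width and height
        d_w1 = (pred[:, 0] - target[:, 0]).abs()
        d_w2 = (pred[:, 2] - target[:, 2]).abs()
        d_h1 = (pred[:, 1] - target[:, 1]).abs()
        d_h2 = (pred[:, 3] - target[:, 3]).abs()
        p = (d_w1 / (w_gt + eps) + d_w2 / (w_gt + eps) + d_h1 / (h_gt + eps) + d_h2 / (h_gt + eps)) / 4

        # Area ratio (Eq (9)): (pred area * gt area) / (smallest enclosing box area)^2
        wc = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
        hc = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
        area = (w * h) * (w_gt * h_gt) / ((wc * hc) ** 2 + eps)

        apiou = iou + torch.exp(-p ** 2) + torch.exp(-area ** 2) - 2   # Eq (8)
        return 1 - apiou                                               # Eq (7)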

    We propose a reinforced channel-priority contextual attention mechanism (RCPCA) to enhance the model's ability to extract background information, thereby improving object detection performance. In chemical park environments, objects are usually small or occluded, making traditional appearance-based detection methods ineffective. Therefore, extracting background information becomes crucial. The RCPCA mechanism enhances the model's ability to handle complex backgrounds and small objects by integrating three key components: A multi-scale feature extraction unit (MSFU), feature fusion with residual connections, and a channel-priority contextual attention mechanism (CPCA), as shown in Figure 3. MSFU extracts diverse feature representations from multiple scales, while feature fusion and residual connections retain important spatial information. CPCA dynamically adjusts the importance of different channels and guides the model to focus on the most relevant contextual information for accurate object detection. This mechanism significantly improves the robustness and accuracy of the model, especially in challenging scenarios such as occlusion and complex background.

    Figure 3.  The structure of the RCPCA attention mechanism.

    The core of the RCPCA consists of two key feature extraction modules: conv1 and conv2 of the MSFU. Conv1 utilizes a 3×3 convolutional kernel to capture local features, while conv2 uses a 5×5 convolutional kernel to capture broader contextual information. The outputs of these modules are normalized using BatchNorm2d and activated using the GELU activation function, introducing non-linear transformations that provide rich multi-scale feature representations for subsequent feature fusion. Given an input feature map $F \in \mathbb{R}^{H \times W \times C}$, where H and W represent the height and width of the feature map, respectively, and C represents the number of channels, the mathematical expressions for MSFU are as follows:

    $F_{conv1} = \mathrm{GELU}(\mathrm{BatchNorm2d}(\mathrm{Conv2d}(F, 3 \times 3, 1, 1)))$ (10)
    $F_{conv2} = \mathrm{GELU}(\mathrm{BatchNorm2d}(\mathrm{Conv2d}(F, 5 \times 5, 1, 2)))$ (11)

    Based on the multi-scale features generated by MSFU, channel fusion is performed using a 1×1 convolutional kernel, effectively integrating feature dimensions to form richer feature representations. This fusion strategy helps to combine information from different scales, providing more comprehensive input for the subsequent attention mechanism. Following this, a residual connection mechanism is introduced, adding the fused features to the original input to maintain the deep feature transmission of the network and enhance the model's learning ability. This design helps to alleviate the vanishing or exploding gradient problem in deep network training, while enabling the network to learn more complex feature representations. Finally, the attention features generated by CPCA are multiplied by the result of the feature fusion to obtain enhanced feature representations. The feature fusion strategy is implemented through the 1×1 convolution operation as follows:

    $F_{cat} = \mathrm{Conv2d}(\mathrm{Concatenate}(F_{conv1}, F_{conv2}), 1 \times 1, 1, 0)$ (12)

    The residual connection is:

    $F_{res} = F_{cat} + F$ (13)
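    One possible PyTorch reading of Eqs (10)-(13) is sketched below: two parallel branches (3×3 and 5×5 convolutions, each followed by BatchNorm2d and GELU), a 1×1 convolution that fuses the concatenated branches back to the input channel count, and a residual connection. Channel handling and module naming are illustrative assumptions.

    import torch
    import torch.nn as nn

    class MSFU(nn.Module):
        """Multi-scale feature extraction unit with fusion and residual connection, Eqs (10)-(13)."""
        def __init__(self, channels):
            super().__init__()
            # Eq (10): 3x3 branch for local features
            self.conv1 = nn.Sequential(nn.Conv2d(channels, channels, 3, stride=1, padding=1),
                                       nn.BatchNorm2d(channels), nn.GELU())
            # Eq (11): 5x5 branch for broader contextual information
            self.conv2 = nn.Sequential(nn.Conv2d(channels, channels, 5, stride=1, padding=2),
                                       nn.BatchNorm2d(channels), nn.GELU())
            # Eq (12): 1x1 convolution fuses the concatenated branches back to C channels
            self.fuse = nn.Conv2d(2 * channels, channels, 1, stride=1, padding=0)

        def forward(self, x):
            f_cat = self.fuse(torch.cat([self.conv1(x), self.conv2(x)], dim=1))
            return f_cat + x   # Eq (13): residual connection

    In the full RCPCA block, this residual output is then modulated by the CPCA attention map described next.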

    The CPCA mechanism, proposed by Huang et al. [23], is a lightweight and high-performance convolutional neural network attention mechanism that dynamically allocates attention weights in both the channel and spatial dimensions. This mechanism more effectively utilizes the information in the input feature map, enhancing the network model's feature extraction capabilities. The CPCA attention mechanism consists of two major parts: The channel attention module and the spatial attention module.

    The channel attention module enhances the feature representation of each channel by calculating the weights of each channel. Channels containing significant or important feature information are assigned larger weights, while channels with less important feature information are assigned smaller weights. First, the input feature map F (H×W×C) undergoes global max pooling and global average pooling along the channel dimension to compute the maximum and average feature values for each channel, resulting in two feature vectors (1×1×C) representing the global maximum and average features of each channel. These two feature vectors are then input into a two-layer shared multilayer perceptron (MLP), where the first layer has C/r neurons (r being the reduction ratio), and the second layer has C neurons. This MLP is used to learn the attention weights for each channel. By learning these weight parameters, the network can adaptively determine which channels are more important for the current task. The outputs of the MLP are then element-wise summed and processed through a Sigmoid function to obtain the final channel attention weight vector CA(F). The calculation formula is:

    $CA(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$ (14)

    where AvgPool represents global average pooling, MaxPool represents global max pooling, and σ represents the Sigmoid activation function. The channel attention weight vector CA(F) is element-wise multiplied with the input feature map F to generate the channel-refined feature map $F_{cr}$. The calculation formula is:

    $F_{cr} = CA(F) \otimes F$ (15)

    where $\otimes$ denotes element-wise multiplication.
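    A minimal sketch of the channel attention branch in Eqs (14) and (15) follows; the reduction ratio r = 16 and the shared MLP realised as two 1×1 convolutions are common conventions for this kind of module and are assumptions here, not values taken from the authors' code.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Channel attention of CPCA, Eqs (14) and (15): shared MLP over average- and max-pooled descriptors."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            # Shared two-layer MLP applied to (N, C, 1, 1) descriptors, implemented with 1x1 convolutions
            self.mlp = nn.Sequential(
                nn.Conv2d(channels, channels // reduction, 1, bias=False),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1, bias=False),
            )

        def forward(self, x):
            avg = self.mlp(nn.functional.adaptive_avg_pool2d(x, 1))
            mx = self.mlp(nn.functional.adaptive_max_pool2d(x, 1))
            ca = torch.sigmoid(avg + mx)   # Eq (14): channel attention weights CA(F)
            return ca * x                  # Eq (15): channel-refined feature map F_cr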

    The spatial attention module captures spatial structural information of the image by calculating spatial weights for each pixel. Complementing the channel attention module, the spatial attention module dynamically updates the feature information at each pixel location based on its spatial weight. First, the channel-refined feature map Fcr, generated by the channel attention module, is used as the input feature map for this module. Fcr is fed into a depthwise convolution layer with a kernel size of 5, producing an intermediate feature map Fm. Next, Fm is passed through three depthwise separable convolution paths L1, L2, and L3 to obtain feature maps of different scales: FL1, FL2, and FL3. Path L1 has convolution kernels of sizes (1, 7) and (7, 1); path L2 has convolution kernels of size (1, 11) and (11, 1); and path L3 has convolution kernels of sizes (1, 21) and (21, 1). These operations effectively capture multi-scale spatial information within the channels. Then, the feature maps Fm, FL1, FL2, and FL3 are element-wise summed. The fused features are processed through a convolution layer with a kernel size of 1, ensuring the integration of channel information and the effective extraction of spatial information, resulting in the spatial attention feature map SA(Fcr). The calculation is as follows:

    $SA(F_{cr}) = \mathrm{Conv}_{1 \times 1}\left(\sum_{i=0}^{3} \mathrm{Branch}_i(\mathrm{DwConv}(F_{cr}))\right)$ (16)

    where DwConv represents depthwise convolution, $\mathrm{Branch}_i$ $(i \in \{0, 1, 2, 3\})$ represents the i-th branch, and $\mathrm{Branch}_0$ is the residual connection. $\mathrm{Conv}_{n \times m}$ denotes a convolution operation with a kernel size of (n, m).

    The spatial attention feature map $SA(F_{cr})$ is then element-wise multiplied with the channel-refined feature map $F_{cr}$ to generate the CPCA attention-enhanced feature $F_{cpca}$. The calculation formula is:

    $F_{cpca} = SA(F_{cr}) \otimes F_{cr}$ (17)
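    The spatial branch of Eqs (16) and (17) can be sketched as follows: a 5×5 depthwise convolution, three depthwise strip-convolution branches using kernel pairs (1, k)/(k, 1) for k = 7, 11, and 21, a residual summation, and a final 1×1 convolution. Padding choices and module layout are our assumptions.

    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        """Spatial attention of CPCA, Eqs (16) and (17), with multi-scale depthwise strip convolutions."""
        def __init__(self, channels):
            super().__init__()
            self.dw5 = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
            # Three strip-convolution branches with kernel pairs (1, k) and (k, 1), k = 7, 11, 21
            self.branches = nn.ModuleList(
                nn.Sequential(
                    nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
                    nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
                )
                for k in (7, 11, 21)
            )
            self.proj = nn.Conv2d(channels, channels, 1)   # 1x1 convolution integrating channel information

        def forward(self, f_cr):
            f_m = self.dw5(f_cr)                                          # intermediate feature map F_m
            fused = f_m + sum(branch(f_m) for branch in self.branches)    # Branch_0 is the residual term
            sa = self.proj(fused)                                         # Eq (16): spatial attention map
            return sa * f_cr                                              # Eq (17): enhanced feature F_cpca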

    By introducing RFAConv [24] to replace the Conv module, we improved the YOLOv8n model. The specific structure is shown in Figure 4. RFAConv uses a receptive field weight matrix to assign different weights to each receptive field position and feature channel, highlighting important detail information. Additionally, RFAConv dynamically generates receptive field spatial features and adapts the shape and range of the receptive field according to the size of the convolutional kernel to accommodate targets of different sizes. It generates smaller receptive fields for small targets to retain fine details, and larger receptive fields for larger targets to capture global features. This flexible adjustment of receptive field size improves detection accuracy for targets of various sizes.

    Figure 4.  The structure of RFAConv.

    The module implements a lightweight convolutional layer (group convolution), saving many network parameters. Additionally, the generated attention mechanism enables the network to focus on learning important information at each feature map level. Assuming $F \in \mathbb{R}^{C \times H \times W}$ and $F' \in \mathbb{R}^{C \times H \times W}$ are the input and output feature maps, respectively, the working principle of RFAConv can be expressed as follows:

    $F' = \mathrm{Conv2D}_{3 \times 3}(\mathrm{Reshape}(A_{RF} \times F_{RA}))$ (18)

    where $A_{RF}$ is the receptive field attention map, $F_{RA}$ is the receptive field spatial feature, $\mathrm{Conv2D}_{3 \times 3}$ is a 3×3 standard convolution, and Reshape is a reshaping operation that changes the tensor dimensions. The formulas for $A_{RF}$ and $F_{RA}$ are as follows:

    $A_{RF} = \mathrm{Softmax}(g^{1 \times 1}(\mathrm{AvgPool}(F)))$ (19)
    $F_{RA} = \mathrm{ReLU}(\mathrm{BN}(g^{3 \times 3}(F)))$ (20)

    where $g^{1 \times 1}$ represents a group convolution operation with a kernel size of 1×1, $g^{3 \times 3}$ a group convolution with a kernel size of 3×3, AvgPool is an average pooling layer, and BN is batch normalization. Softmax and ReLU are activation functions.
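    The following sketch illustrates how Eqs (18)-(20) can be realised for a 3×3 kernel: a group convolution over average-pooled features produces the receptive-field attention A_RF, a group convolution with BN and ReLU produces the receptive-field spatial features F_RA, and their product is reshaped into k×k tiles before a stride-k convolution aggregates each receptive field. The tensor layout and hyperparameters are assumptions made for illustration, not the reference implementation of RFAConv [24].

    import torch
    import torch.nn as nn

    class RFAConvSketch(nn.Module):
        """Receptive-field attention convolution, Eqs (18)-(20), sketched for a 3x3 kernel."""
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.k = k
            # Eq (19): attention over the k*k receptive-field positions, from average-pooled features
            self.attn = nn.Sequential(nn.AvgPool2d(kernel_size=k, stride=1, padding=k // 2),
                                      nn.Conv2d(in_ch, in_ch * k * k, 1, groups=in_ch))
            # Eq (20): receptive-field spatial features via group convolution + BN + ReLU
            self.spatial = nn.Sequential(nn.Conv2d(in_ch, in_ch * k * k, k, padding=k // 2, groups=in_ch),
                                         nn.BatchNorm2d(in_ch * k * k), nn.ReLU(inplace=True))
            self.conv = nn.Conv2d(in_ch, out_ch, k, stride=k)   # standard 3x3 convolution over the expanded map

        def forward(self, x):
            b, c, h, w = x.shape
            a_rf = torch.softmax(self.attn(x).view(b, c, self.k * self.k, h, w), dim=2)   # A_RF
            f_ra = self.spatial(x).view(b, c, self.k * self.k, h, w)                      # F_RA
            weighted = (a_rf * f_ra).view(b, c, self.k, self.k, h, w)
            # Arrange every weighted receptive field as a k x k tile, then aggregate with a stride-k conv
            weighted = weighted.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * self.k, w * self.k)
            return self.conv(weighted)   # Eq (18)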

    By integrating the receptive field attention convolution (RFAConv) into the YOLOv8n network, detection performance is significantly enhanced. RFAConv emphasizes the detailed features of targets, reduces information loss, and more accurately locates targets, providing reliable safety support and assurance for personnel safety in chemical industrial parks.

    To solve the problem of detecting targets with various scales in the complex environment of the chemical industrial park, we optimized the feature fusion method. The traditional Concat structure treats high-level semantic features and low-level detail features equally during fusion, resulting in insufficient detection accuracy of the model for multi-scale targets such as helmet wearers under occlusion and smoking behavior at a distance.

    To this end, we designed the BiFPN-Concat structure and introduced the weighted fusion strategy of BiFPN to achieve adaptive integration of features: for small targets, low-level detail features are strengthened to retain position information, and for large targets, high-level semantic features are emphasized to enhance category discrimination, significantly improving the multi-scale detection performance in complex scenarios. The core advantage of BiFPN lies in the bidirectional feature pyramid architecture and dynamic weighting mechanism; the bidirectional path supports the bidirectional flow of features between high and low layers, retaining the position details of low-level features and integrating the semantic information of high-level features; and the weighted strategy avoids the "equal treatment" defect of traditional fusion methods by learning the importance weights of feature maps, effectively reducing information redundancy and loss. Compared with unidirectional fusion structures such as FPN, BiFPN demonstrates stronger feature expression capabilities and detection efficiency in multi-scale target detection in chemical parks. Its dynamic weights are continuously optimized during training, enabling the model to adaptively adjust the fusion strategy according to the importance of the input features, providing an efficient solution for accurate detection in complex scenarios. The formula for adjusting the fusion weights is as follows:

    $O = \sum_i \frac{w_i \cdot I_i}{\epsilon + \sum_j w_j}$ (21)

    where $w_i$ represents the learnable weights, with ReLU applied to ensure $w_i \geq 0$, $I_i$ represents the features of the i-th layer, and $\epsilon$ is a small constant introduced to avoid numerical instability.
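    A minimal sketch of the fast normalised fusion in Eq (21) is given below; it assumes the input feature maps have already been resized and projected to a common shape, which in practice is handled by the surrounding BiFPN layers.

    import torch
    import torch.nn as nn

    class WeightedFusion(nn.Module):
        """Fast normalised fusion of Eq (21): O = sum_i (w_i * I_i) / (eps + sum_j w_j), with w_i >= 0 via ReLU."""
        def __init__(self, num_inputs, eps=1e-4):
            super().__init__()
            self.weights = nn.Parameter(torch.ones(num_inputs))   # learnable per-input importance weights
            self.eps = eps

        def forward(self, inputs):
            w = torch.relu(self.weights)                       # keep the weights non-negative
            w = w / (w.sum() + self.eps)                       # normalise; eps avoids numerical instability
            return sum(wi * x for wi, x in zip(w, inputs))     # weighted sum of same-shape feature maps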

    By integrating the BiFPN structure into the neck network of the model and replacing the original Concat module, we not only improved the detection accuracy of the model but also enhanced its robustness in complex environments. The BiFPN-Concat structure is particularly suitable for target detection in the complex scene of the chemical industrial park, which can meet the needs of precise control of the feature fusion process, thereby further improving the practical application performance of the model.

    SPP (Spatial Pyramid Pooling), proposed by He et al. [25], is a pooling structure used for image processing and computer vision tasks. This structure can perform standard pooling on images of different sizes and ultimately combine them into feature vectors of the same size as the input to the fully connected layer. Considering that some targets in chemical industrial parks are small and require high accuracy in target detection networks, we adopted the SPPCSPC module to replace the original SPPF module in YOLOv8 to improve the model. The SPPCSPC module integrates the CSP (Cross Stage Partial) structure on the basis of the SPP module.

    In SPPCSPC, the overall input is divided into two different branches. The 3 × 3 convolution in the middle is not grouped and remains a standard convolution, while the right side uses pointwise convolution. Finally, the information streams output by all branches are concatenated. The SPPCSPC module provides significant improvements in target detection networks compared to the original SPP module and the SPPF module used in YOLOv8, particularly for smaller targets. The structure of the SPPF module is shown in Figure 5.

    Figure 5.  SPPF module.

    The SPPCSPC structure mainly consists of two substructures: The SPP structure and the CSPC (Cross Stage Partial Connections) structure. The main idea is to introduce cross-stage partial connections into the network to replace the traditional serial connection method in convolutional neural networks for feature propagation. This addresses the bottleneck problem in information transmission, improves feature propagation efficiency, and better utilizes information between low-level and high-level features. Adopting the SPPCSPC structure is beneficial for recognizing objects in chemical industrial parks, as the model can better extract features related to lighting, texture, and other target characteristics.
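    The sketch below shows one common realisation of an SPPCSPC block consistent with the description above: a CSP split into an SPP branch (1×1, 3×3, and 1×1 convolutions followed by parallel max-pooling and further 1×1 and 3×3 convolutions) and a pointwise shortcut branch, concatenated at the output. The pooling sizes (5, 9, 13) and the hidden-channel ratio follow the usual YOLOv7-style configuration and are assumptions, not values reported in this paper.

    import torch
    import torch.nn as nn

    class SPPCSPC(nn.Module):
        """Sketch of an SPPCSPC block: CSP split into an SPP branch and a pointwise shortcut branch."""
        def __init__(self, in_ch, out_ch, pool_sizes=(5, 9, 13)):
            super().__init__()
            hid = out_ch // 2
            def cbs(ci, co, k):   # Conv + BN + SiLU helper
                return nn.Sequential(nn.Conv2d(ci, co, k, padding=k // 2, bias=False),
                                     nn.BatchNorm2d(co), nn.SiLU(inplace=True))
            # SPP branch: 1x1 -> 3x3 -> 1x1, then parallel max-pooling, then 1x1 -> 3x3
            self.pre = nn.Sequential(cbs(in_ch, hid, 1), cbs(hid, hid, 3), cbs(hid, hid, 1))
            self.pools = nn.ModuleList(nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes)
            self.post = nn.Sequential(cbs(hid * (len(pool_sizes) + 1), hid, 1), cbs(hid, hid, 3))
            # Cross-stage shortcut branch: a single pointwise convolution
            self.shortcut = cbs(in_ch, hid, 1)
            self.out = cbs(2 * hid, out_ch, 1)

        def forward(self, x):
            y = self.pre(x)
            y = self.post(torch.cat([y] + [pool(y) for pool in self.pools], dim=1))
            return self.out(torch.cat([y, self.shortcut(x)], dim=1))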

    To validate the effectiveness of this method, an experimental platform was established using Ubuntu 18.04 as the operating system and PyTorch as the deep learning framework. YOLOv8n was used as the baseline network model. The specific configuration of the experimental environment is shown in Table 1.

    Table 1.  Experimental environment configuration table.
    Environmental Parameter Value
    Operating system Ubuntu 18.04
    Deep learning framework PyTorch
    Programming language Python 3.8
    CPU Intel Xeon Scalable 8358
    GPU NVIDIA A100 (SXM4, 80 GB)
    RAM 256 GB


    Consistent hyperparameters were applied throughout the training process for all experiments. Table 2 shows the exact hyperparameters used during training.

    Table 2.  Training hyperparameters.
    Hyperparameters Value
    Learning Rate 0.01
    Image Size 640 × 640
    Optimizer SGD
    Batch Size 16
    Epoch 200
    Weight Decay 0.0005

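    For reference, the Table 2 settings map directly onto a training call of the Ultralytics YOLOv8 interface as sketched below; the dataset configuration file name is a placeholder, and the authors' actual training script is not given in the paper.

    from ultralytics import YOLO

    # Hypothetical training call reproducing the Table 2 settings on the custom chemical-park dataset.
    model = YOLO("yolov8n.pt")              # YOLOv8n baseline weights
    model.train(
        data="chemical_park.yaml",          # placeholder dataset config with the 'smoke' and 'hat' classes
        epochs=200,
        imgsz=640,
        batch=16,
        optimizer="SGD",
        lr0=0.01,                           # initial learning rate
        weight_decay=0.0005,
    )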

    We selected Precision (P), Recall (R), F1 Score, and mean Average Precision (mAP@0.5) as evaluation metrics to assess the effectiveness of the improved network. The calculation formulas are as follows:

    $IoU = \frac{A \cap B}{A \cup B}$ (22)
    $Precision = \frac{TP}{TP + FP}$ (23)
    $Recall = \frac{TP}{TP + FN}$ (24)
    $AP = \sum_{i=1}^{n-1}(r_{i+1} - r_i)\, P_{inter}(r_{i+1})$ (25)
    $mAP = \frac{\sum_{i=1}^{k} AP_i}{k}$ (26)
    $F1\ Score = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (27)

    In Eq (22), A represents the ground truth box, B represents the predicted box, $A \cap B$ represents the intersection area of A and B, and $A \cup B$ represents the union area of A and B. In Eqs (23) and (24), TP denotes true positives, FP denotes false positives, and FN denotes false negatives. In Eq (25), $r_1, r_2, \ldots, r_n$ are the recall values at which the interpolated precision $P_{inter}$ is evaluated, sorted in ascending order. In Eq (26), k denotes the number of classes, which is 2 in this study. Eq (27) defines the F1 score.
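    As a compact reference, the metrics of Eqs (22)-(27) can be computed from matched detections as sketched below; the sketch assumes detections have already been assigned to ground-truth boxes at an IoU threshold of 0.5 and that the recall/precision points are sorted by ascending recall.

    def precision_recall_f1(tp, fp, fn, eps=1e-9):
        """Eqs (23), (24) and (27) from true positive, false positive and false negative counts."""
        precision = tp / (tp + fp + eps)
        recall = tp / (tp + fn + eps)
        f1 = 2 * precision * recall / (precision + recall + eps)
        return precision, recall, f1

    def average_precision(recalls, precisions):
        """Eq (25): interpolated AP, with P_inter(r) taken as the maximum precision at recall >= r."""
        ap = 0.0
        for i in range(len(recalls) - 1):
            ap += (recalls[i + 1] - recalls[i]) * max(precisions[i + 1:])
        return ap

    def mean_average_precision(ap_per_class):
        """Eq (26): mean of the per-class AP values (k = 2 classes, 'smoke' and 'hat', in this study)."""
        return sum(ap_per_class) / len(ap_per_class)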

    There are no publicly available datasets on the internet specifically describing workers in chemical industrial parks. Therefore, the dataset for this study was collected manually and from online sources, comprising 12,048 images. Sample images are shown in Figure 6. The data was split into training and testing sets in a ratio of 0.8 to 0.2, using the labeling tool "labelImg" for annotation. The annotated data includes image size, the location of objects to be detected within the images, and the types of behaviors to be recognized. The label types are divided into two categories: "smoke" and "hat".

    Figure 6.  Sample images.

    To evaluate the impact of different loss functions, we used the YOLOv8 model as a baseline and selected eleven loss functions, CIoU, SIoU, GIoU, DIoU, EIoU, InnerIoU, ShapeIoU, MDPIoU, NWD, WIoU, and PIoU, along with our newly developed APIoU for experimental comparison based on mAP50 and mAP50-95 metrics. Table 3 presents the experimental results. mAP50 is an important metric for assessing the performance of object detection models, reflecting the model's ability to accurately detect target objects. The experimental results show that the APIoU loss function significantly outperforms the other loss functions, achieving the highest mAP50 value of 0.84703. Notably, compared to the initial CIoU, the mAP50 value of APIoU improved by 1.2777%, and by approximately 0.7% compared to PIoU. This indicates that APIoU not only addresses the problem of anchor box area expansion, making the positioning and size of target bounding boxes more balanced by introducing the area ratio, but also adapts to the complex environments of chemical industrial parks, achieving broader detection performance. Especially when dealing with smaller, dim, or feature-blurred targets, APIoU can achieve precise boundary box regression, thereby obtaining accurate target location and size information. This further validates the effectiveness of the APIoU loss function.

    Table 3.  Comparison of loss functions.
    Loss Function mAP50 mAP50-95 smoke hat
    CIoU [26] 0.83426 0.52213 0.806 0.865
    SIoU [27] 0.83331 0.52313 0.802 0.863
    GIoU [28] 0.83124 0.52351 0.802 0.858
    DIoU [26] 0.83067 0.52584 0.797 0.865
    EIoU [29] 0.82232 0.52337 0.785 0.859
    InnerIoU [30] 0.83271 0.52227 0.803 0.861
    ShapeIoU [31] 0.83105 0.52379 0.807 0.855
    MDPIoU [32] 0.83427 0.52346 0.804 0.863
    NWD [33] 0.83689 0.52234 0.808 0.864
    WIoU [34] 0.84064 0.52526 0.810 0.869
    PIoU [35] 0.84066 0.52671 0.811 0.869
    APIoU 0.84703 0.53358 0.825 0.869


    We compared 23 different attention mechanisms, including TripletAttention, CBAM, SimAM, and others, as shown in Table 4. The results indicate that the performance of the RCPCA attention mechanism is significantly better than the other attention mechanisms in the experiments. The RCPCA achieved mAP50 and mAP50-95 values of 0.84343 and 0.53034, respectively, which are approximately 0.4% and 0.3% higher than those of CPCA. The smoke detection remained at 0.812, and the hat detection value was 0.864. This demonstrates that RCPCA performs better in complex environments of chemical industrial parks, particularly in detecting small and occluded objects and handling the need for background information.

    Table 4.  Comparison of attention mechanisms.
    Model mAP50 mAP50-95 Smoke hat
    TripletAttention [36] 0.83715 0.52758 0.812 0.862
    CBAM [37] 0.83864 0.52787 0.806 0.871
    SimAM [38] 0.83184 0.52682 0.804 0.860
    PolarizedAttention [39] 0.83837 0.52649 0.813 0.863
    BiLevelnchwAttention [40] 0.83887 0.53097 0.809 0.865
    BiLevelRoutingAttention [40] 0.83065 0.52524 0.799 0.860
    SpatialGroupEnhance [41] 0.83065 0.52463 0.801 0.859
    SpatialAttention [42] 0.84051 0.5285 0.812 0.869
    FocalModulation [43] 0.83374 0.52634 0.812 0.854
    MLCA [44] 0.8399 0.52907 0.812 0.867
    LSKblock [45] 0.83585 0.53055 0.803 0.867
    deformableLKA [46] 0.83893 0.53302 0.807 0.869
    SKAttention [47] 0.83816 0.52944 0.809 0.865
    SEAttention [48] 0.83828 0.52597 0.813 0.860
    ParNetAttention [49] 0.83816 0.52906 0.809 0.862
    MHSA [42] 0.8371 0.52793 0.810 0.862
    EfficientChannelAttention [50] 0.8362 0.52864 0.809 0.863
    DoubleAttention [51] 0.63624 0.38685 0.520 0.749
    CoTAttention [52] 0.8361 0.52482 0.808 0.863
    EffectiveSEModule [53] 0.83551 0.52637 0.807 0.863
    DAttention [54] 0.83733 0.52683 0.808 0.866
    EMA [55] 0.83778 0.52699 0.807 0.867
    CPCA [23] 0.83861 0.52747 0.812 0.864
    RCPCA 0.84343 0.53034 0.824 0.863


    The data shows that RCPCA, due to its unique MSFU, feature fusion with residual connections, and CPCA mechanism, significantly outperforms other competing attention models in handling image tasks in chemical industrial parks. This validates the effectiveness and optimization of the proposed RCPCA attention mechanism.

    To validate the performance of the YOLOv8-ARR model in recognizing personnel safety behaviors in complex scenarios, we compared it with several mainstream object detection models. The experiment used a self-built dataset to evaluate the performance of each model in the complex environment of a chemical industrial park. As shown in Table 5, YOLOv8-ARR performed outstandingly in key metrics such as mAP50, smoke detection accuracy, and hat detection accuracy, especially showing a clear advantage in personnel safety behavior recognition tasks.

    Table 5.  Comparison of different models.
    Model FLOPs/G Params/M mAP50 smoke hat
    Faster RCNN 13.1 41.3 0.6687 0.7627 0.5747
    YOLOv3-tiny [56] 13.0 8.7 0.7907 0.7690 0.8120
    YOLOv5n 4.2 1.8 0.8224 0.7920 0.8520
    YOLOv6 [57] 11.9 4.2 0.8221 0.7990 0.8520
    YOLOv7 [22] 103.2 37.2 0.7729 0.6890 0.8568
    YOLOX [21] 15.4 10.6 0.8344 0.7852 0.8837
    DETR [58] 187.0 41.0 0.5508 0.3948 0.7068
    SSD [3] 87.6 26.3 0.7389 0.6770 0.8007
    CenterNet [59] 70.2 32.6 0.8243 0.7830 0.8656
    YOLOv8n 8.1 3.0 0.8345 0.8060 0.8630
    YOLOv8-ARR 92.0 16.1 0.8890 0.8620 0.9160


    YOLOv8-ARR achieved an mAP50 of 0.8890, a 5.45% improvement over YOLOv8n (0.8345). Compared to other models, YOLOv8-ARR demonstrated significantly higher detection accuracy than YOLOv3-tiny (0.7907) and YOLOv5n (0.8224), further confirming its precision advantage in complex scenarios. In smoke detection, YOLOv8-ARR achieved an accuracy of 0.8620, improving by 5.6% over YOLOv8n (0.8060) and performing significantly better than YOLOv3-tiny (0.7690) and YOLOv5n (0.7920), highlighting its exceptional performance in complex environments. In hat detection, YOLOv8-ARR achieved an accuracy of 0.9160, a 5.3% improvement over YOLOv8n (0.8630), surpassing YOLOv3-tiny (0.8120) and YOLOv5n (0.8520), demonstrating its clear advantage in head protection detection.

    Although YOLOv8-ARR has higher FLOPs (92.0 G) and Params (16.1 M), this increase is justified by the improvement in accuracy. YOLOv8-ARR incorporates several enhanced modules, such as the APIoU loss function and RCPCA attention mechanism, which significantly enhance its feature extraction capability. Therefore, the increase in FLOPs and Params is necessary to support higher accuracy detection, and this increase is essential for the performance improvement.

    Overall, YOLOv8-ARR's significant improvement in smoke and hat detection further validates its potential for application in complex environments. It demonstrates outstanding robustness and precise recognition capabilities, especially valuable for personnel safety behavior detection in high-risk environments. As shown in Figure 7, we can see the comparative metrics of each model. Therefore, it is evident that the YOLOv8-ARR model demonstrates superior recognition performance in identifying safety behaviors of personnel in complex scenes within chemical industrial parks compared to other models.

    Figure 7.  Detection results of different models.

    The proposed YOLOv8-ARR model is an optimized version of YOLOv8n, achieved by improving the loss function with APIoU, enhancing the attention mechanism with RCPCA, incorporating BiFPN_Concat, and introducing RFAConv and SPPCSPC. To evaluate the performance of each optimization module, an ablation study was conducted using a variable control method, with training and testing performed on the same dataset and training parameters. The results are shown in Table 6.

    Table 6.  Ablation experiment results.
    Model YOLOv8 APIoU RCPCA RFAConv BiFPN_Concat SPPCSPC P2 Recall Precision mAP50 mAP50-95 Smoke Hat
    1 0.781 0.863 0.834 0.522 0.806 0.863
    2 0.782 0.888 0.847 0.533 0.825 0.868
    3 0.788 0.895 0.853 0.541 0.833 0.874
    4 0.788 0.903 0.857 0.543 0.833 0.886
    5 0.799 0.889 0.859 0.546 0.837 0.881
    6 0.820 0.902 0.885 0.568 0.855 0.916
    Our 0.820 0.903 0.889 0.570 0.862 0.916


    The results indicate that using the improved APIoU loss function significantly enhances detection performance, with mAP50 and mAP50-95 increasing by 1.3% and 1.1%, respectively, and other metrics also showing slight improvements. To ensure the model pays more attention to the background information required for object detection in chemical industrial parks, the improved RCPCA attention mechanism was introduced, resulting in a 0.6% improvement in mAP50, mAP50-95, and helmet detection, with a notable 0.8% improvement for small smoking targets. Both Recall and Precision show significant improvements.

    To emphasize important detail information during the detection process in chemical industrial parks, RFAConv was introduced. This module uses a receptive field weight matrix to assign different weights to each receptive field position and feature channel, reducing information loss and more accurately locating targets, providing reliable safety support. The mAP50 and mAP50-95 increase by 0.4% and 0.2%, respectively, with a notable improvement of 1.2% for helmet targets.

    To address the challenges of diverse scale variations in the complex environment of chemical industrial parks, BiFPN_Concat was introduced. This advanced feature fusion and transformation mechanism is better suited for detection applications in such environments. Smoke detection accuracy improves by 0.2%, with other metrics remaining stable, laying a foundation for future improvements.

    Considering the small size of some targets and the high precision requirements of the target detection network in chemical industrial parks, the SPPCSPC structure was adopted. This structure enhances the recognition of objects in these environments, with improved extraction of features such as lighting and texture. Both mAP50 and mAP50-95 increased by 0.6%, and smoke and hat detection improve by 1.2% and 1.7%, respectively. Precision increases by 1.8%. Finally, the introduction of a small target layer (p2) significantly improves the detection rate of small smoking targets, reaching 0.862, an increase of 0.7%.

    To effectively demonstrate the improvements of our model in complex environments of chemical industrial parks, we tested different scenarios within these environments, as shown in Figure 8. In the tests for group A, we can see that our proposed YOLOv8-ARR significantly improves the accuracy of detecting four helmets compared to YOLOv8n. The results were similarly positive for group B, where YOLOv8-ARR provides more stable detections. In group C, YOLOv8-ARR not only maintains high precision but also resolves false positive issues. For further validation, we conducted frame-by-frame detection using high-definition camera footage. In group D, we observe a significant improvement in the accuracy of detecting helmets and smoking. In group E, YOLOv8n produces a false positive for a person, which YOLOv8-ARR successfully resolves. In group F, YOLOv8n shows issues with missed detections and instability, whereas YOLOv8-ARR does not exhibit these problems. In summary, our algorithm can accurately classify and precisely locate personnel safety behaviors in chemical industrial parks. It meets the precision requirements of industrial equipment, reduces false and missed detections, significantly decreases the workload of manual inspections, and improves on-site management efficiency.

    Figure 8.  Comparison of model ablation experiment results.

    To comprehensively evaluate the generalization ability and robustness of the proposed YOLOv8-ARR model, we conducted additional validation experiments on widely recognized public benchmark datasets, Pascal VOC and VisDrone. The VOC dataset covers various object categories and real-world scenes, making it an ideal choice for assessing general object detection performance. In contrast, the VisDrone dataset contains aerial images with small objects, dense distributions, and complex backgrounds, presenting significant detection challenges. The experimental results are summarized in Table 7, demonstrating the outstanding detection accuracy and robustness of YOLOv8-ARR across scenarios.

    Table 7.  Model comparison study on public datasets.
    Datasets Model Precision Recall mAP50 mAP50-95 F1
    VisDrone2019 YOLOv8n 0.45032 0.33466 0.33459 0.19337 0.38397
    VisDrone2019 YOLOv8-ARR 0.47041 (+2.01%) 0.34023 (+0.56%) 0.47041 (+13.58%) 0.29023 (+9.69%) 0.39487 (+1.09%)
    VOC2007 YOLOv8n 0.75618 0.62236 0.69885 0.48257 0.68277
    VOC2007 YOLOv8-ARR 0.76342 (+0.72%) 0.66270 (+4.03%) 0.73531 (+3.78%) 0.48257 (+0.22%) 0.70950 (+2.67%)
    VOC2012 YOLOv10n 0.62079 0.51201 0.57222 0.41163 0.56118
    VOC2012 YOLOv8-ARR 0.63006 (+0.38%) 0.56085 (+4.88%) 0.59054 (+1.83%) 0.41163 (+1.49%) 0.59344 (+3.23%)


    On the VisDrone2019 dataset, characterized by complex object scales, orientations, and significant occlusions, YOLOv8-ARR significantly outperforms the baseline model YOLOv8n. Specifically, precision increases from 0.45032 to 0.47041 (+2.01%), showing an enhanced ability to correctly identify positive samples and reduce false positives. Recall improves by 0.56% (from 0.33466 to 0.34023), highlighting better performance in capturing true positive samples. mAP@50 increases significantly from 0.33459 to 0.47041 (+13.58%), demonstrating substantial improvement in both object localization and classification in challenging scenarios. mAP@50-95 also shows improvement, rising from 0.19337 to 0.29023 (+9.69%), further confirming enhanced robustness in detecting overlapping objects. The F1 score increases from 0.38397 to 0.39487 (+1.09%), indicating a better balance between precision and recall.

    On the VOC2007 dataset, YOLOv8-ARR maintains its competitive edge, with precision rising to 0.76342 (+0.72%), effectively reducing the false detection rate. Recall increases by 4.03% (from 0.62236 to 0.66270), further emphasizing the model's reliability in detecting actual objects. mAP@50 increases from 0.69885 to 0.73531 (+3.78%), reflecting stronger object localization accuracy. mAP@50-95 improves slightly (+0.22%) over the YOLOv8n baseline of 0.48257, supporting the model's precision across IoU thresholds. The F1 score increases from 0.68277 to 0.70950 (+2.67%), demonstrating a better-optimized balance between precision and recall.

    For the VOC2012 dataset, YOLOv8-ARR again outperforms the baseline model. Precision increases from 0.62079 to 0.63006 (+0.38%), and recall rises by 4.88% (from 0.51201 to 0.56085). mAP@50 increases from 0.57222 to 0.59054 (+1.83%), highlighting clear progress in classification and object localization. mAP@50-95 improves by 1.49% over the baseline value of 0.41163, reflecting the model's ability to maintain high-quality detections across a wider range of IoU thresholds. The F1 score rises from 0.56118 to 0.59344 (+3.23%), again indicating a better trade-off between precision and recall.

    Overall, the experimental results strongly demonstrate that YOLOv8-ARR significantly improves detection accuracy in complex scenarios involving diverse scales, orientations, and occlusions, making it a robust and reliable model for practical deployment in various application environments.

    To better demonstrate the outstanding performance of the YOLOv8-ARR model in smoking detection and safety helmet detection, we conducted a detailed visualization analysis of the model's detection results using Grad-CAM. The Grad-CAM images in Figure 9 clearly show how the model focuses on key regions in the image during detection. Through this image analysis, we can gain a more intuitive understanding of how the model locates and recognizes targets in real-world tasks.

    Figure 9.  Grad-CAM visualization results.
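    For readers who wish to reproduce this kind of visualization, the sketch below shows the core Grad-CAM computation implemented with forward and backward hooks. It uses a torchvision ResNet-18 as a stand-in backbone purely for illustration; applying it to YOLOv8-ARR would require hooking a late convolutional layer of the detector's own backbone, and the image file name is hypothetical.

    # Minimal Grad-CAM sketch: channel weights are the spatial mean of the gradients,
    # and the heat map is the ReLU of the weighted sum of activations.
    import torch
    import torch.nn.functional as F
    from torchvision import models, transforms
    from PIL import Image

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    target_layer = model.layer4[-1]                      # last conv block: high-level semantics

    activations, gradients = {}, {}
    target_layer.register_forward_hook(lambda m, i, o: activations.update(value=o.detach()))
    target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(value=go[0].detach()))

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    img = preprocess(Image.open("worker.jpg").convert("RGB")).unsqueeze(0)   # hypothetical image

    scores = model(img)
    scores[0, scores.argmax()].backward()                # gradient of the top class score

    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalized map for overlay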

    In Group (a), the YOLOv8-ARR model successfully identified the smoke region, and compared to the YOLOv8n model, its confidence increased by 3%. The Grad-CAM image visually demonstrates how the YOLOv8-ARR model concentrates on the smoke source area, further proving its significant advantage in smoking detection tasks.

    In Group (b), we show the model's performance in detecting multiple safety helmets. For the first helmet on the left, the confidence of the YOLOv8-ARR model increased by 4% compared to YOLOv8n; for the first helmet on the right, the confidence increased by 1%. The Grad-CAM image clearly highlights the model's focus on the head region, further emphasizing the superior performance of YOLOv8-ARR in multi-target detection and its efficient detection capability in complex environments.

    The exceptional performance of YOLOv8-ARR not only offers significant advantages in improving detection accuracy but also demonstrates its robustness in complex environments and multi-target detection capabilities, making it highly valuable in real-world applications. Particularly in high-risk environments, such as personnel safety monitoring systems, YOLOv8-ARR provides efficient and real-time detection services, greatly enhancing security and reliability.

    In this paper, we propose a new algorithm, YOLOv8-ARR, for detecting personnel safety behaviors in the complex environments of chemical industrial parks. The goal of the algorithm is to accurately classify and precisely locate safety behaviors, meeting the precision requirements of industrial equipment while reducing false positives and false negatives, which significantly reduces the workload of manual inspection and improves on-site management efficiency. YOLOv8-ARR is optimized based on YOLOv8n. First, the algorithm introduces an enhanced Intersection over Union (APIoU) as the bounding box regression loss, effectively balancing the gradient gain between high-quality and low-quality samples and thereby improving localization accuracy. Second, a Reinforced Channel-Priority Contextual Attention (RCPCA) mechanism is proposed to improve the extraction of background information, enhancing the model's robustness against complex backgrounds. Next, the RFAConv layer replaces standard convolution layers, assigning different weights to each receptive field location and feature channel, which highlights critical details and strengthens feature extraction. Then, the Bidirectional Feature Pyramid Network (BiFPN) performs weighted fusion of multi-scale feature maps, further strengthening the model's ability to handle multi-scale objects. Finally, a small-object detection layer is added, significantly improving detection accuracy for small objects.

    Experimental results show that, on a custom chemical industrial park personnel dataset, YOLOv8-ARR improves the mean average precision (mAP@0.5) by 5.475% over the original YOLOv8n model, significantly increasing accuracy and effectively reducing false positives and false negatives. However, the model still struggles with extremely small targets and severe occlusion. Future research will incorporate super-resolution technology and combine image segmentation with multi-view fusion methods to further enhance robustness in complex environments. In addition, we will focus on real-time deployment of the model, optimizing the graphical user interface (GUI), and developing mobile applications to enable on-the-go monitoring of personnel safety behaviors in chemical industrial parks. Finally, infrared cameras will be used together with image enhancement algorithms for preprocessing, and spatiotemporal fusion methods will be employed to further improve real-time detection in complex environments.
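    As an illustration of the weighted multi-scale fusion step described above, the sketch below implements a generic EfficientDet-style BiFPN "fast normalized fusion" node. It is not the authors' exact module; the channel count and feature-map sizes are assumptions chosen only to make the example runnable.

    # BiFPN-style fast normalized fusion: learnable non-negative weights, one per input.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightedFusion(nn.Module):
        """Fuse feature maps of identical shape with learnable non-negative weights."""
        def __init__(self, num_inputs: int, channels: int):
            super().__init__()
            self.w = nn.Parameter(torch.ones(num_inputs))             # one weight per input
            self.conv = nn.Conv2d(channels, channels, 3, padding=1)   # post-fusion convolution
            self.eps = 1e-4

        def forward(self, feats):
            w = F.relu(self.w)                        # keep weights non-negative
            w = w / (w.sum() + self.eps)              # fast normalized fusion
            fused = sum(wi * f for wi, f in zip(w, feats))
            return self.conv(F.silu(fused))

    # Example: fuse a shallow feature with an upsampled deeper feature of the same size.
    fuse = WeightedFusion(num_inputs=2, channels=256)
    p4 = torch.randn(1, 256, 40, 40)
    p5_up = F.interpolate(torch.randn(1, 256, 20, 20), scale_factor=2.0)
    out = fuse([p4, p5_up])                           # -> shape (1, 256, 40, 40)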

    Zhong Wang: Writing – review & editing, Writing – original draft, Visualization, Methodology, Investigation, Formal analysis, Conceptualization; Lanfang Lei: Writing – review & editing, Writing – original draft, Supervision, Methodology, Formal analysis, Conceptualization; Tong Li: Methodology, Writing – review & editing, Investigation, Formal analysis; Peibei Shi: Validation, Supervision, Methodology.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported by the National Natural Science Foundation of China (Grant No. 61976198), the Natural Science Research Key Project for Colleges and Universities of Anhui Province (Grant Nos. 2022AH052141 and 2022AH052142), the 2023 Humanities and Social Science General Program sponsored by the Ministry of Education of the People's Republic of China (Grant No. 23YJCZH067), and the Hefei Municipal Natural Science Foundation (Grant No. 202322).

    The authors declare there is no conflict of interest in this paper.



    © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0).