
In recent years, China's highway transportation has developed steadily and its overall level has improved significantly. By the end of 2022, the total length of highways in China had approached 5.2 million kilometers, and the expressway network had reached 161,000 kilometers [1]. With the rapid advancement of highway construction, the country has now entered the highway maintenance phase [2]. It is therefore crucial to detect and repair road surface defects in a timely manner. Road surface cracks are the most common type of pavement distress. If they are detected and repaired promptly at an early stage, this not only prevents them from developing into more severe pavement problems but also extends the service life of expressways. Hence, a fast, convenient, and safe road surface crack detection method is of great importance for road maintenance.
Traditional road crack detection methods have many problems. Road networks are extensive, so manual inspection incurs high labor costs; manual detection is subjective, which makes it difficult to evaluate road defects objectively and to guarantee detection accuracy; and manual road inspection also poses safety risks to inspectors. In recent years, with the rapid development of technologies such as computers, object detection, GPS (global positioning system), and digital CCD (charge-coupled device) sensors [3,4,5], deep-learning-based computer vision has gained wide acceptance and application in daily life. These issues can be addressed by adopting deep learning object detection algorithms. The YOLO (you only look once) series is a family of neural network algorithms for real-time object detection. Unlike traditional two-stage object detection methods, the YOLO series is a one-stage detector [6]: it directly predicts the position and category of targets through a single feedforward neural network, without candidate box generation and filtering steps, which yields high detection accuracy and fast inference. Currently, there is a substantial amount of research, both domestic and international, on road surface crack detection and recognition using YOLO algorithms.
Park et al. [7] established a network model that combines segmentation and detection: only part of the road surface is extracted during segmentation, and road surface damage is detected on that portion, which improves accuracy but reduces detection efficiency. X. Su [8] used MobileNetv2 as the backbone of YOLOv4 and replaced conventional convolutions with depthwise separable convolutions; coordinate attention and spatial attention mechanisms were also embedded in the backbone and neck of the original model. These attention mechanisms significantly enhance the detection accuracy and speed for road surface cracks, but the model's final mean average precision (mAP) remains relatively low. M. Wang [9] proposed replacing the GIoU (generalized intersection over union) loss function with EIoU (efficient intersection over union), resolving the issue of large GIoU errors while improving convergence speed and regression accuracy; however, the improved model's inference speed is lower than that of the original model. A comprehensive analysis of the development from YOLOv1 to YOLOv8 is presented by J. R. Terven et al. [10]. The authors conclude that from YOLOv5 onward, all official YOLO models have been fine-tuned to strike a balance between speed and accuracy, aiming to better adapt to specific applications and hardware requirements [11]. Classic YOLOv5 employs a relatively simple convolutional neural network (CNN) architecture, whereas the latest YOLOv8 uses a more complex network structure comprising multiple residual units and branches [12]. Consequently, YOLOv5's detection accuracy on road crack images is not on par with that of YOLOv8; YOLOv8, although it improves the model's structure and training effectiveness, sacrifices detection speed and has a larger parameter count. Therefore, YOLOv5 was selected for improvement.
The aforementioned detection models each have advantages and disadvantages, but none achieves a balance between detection accuracy and speed, which limits their practical application in engineering [8]. To achieve efficient and accurate road surface crack recognition, this paper introduces a road surface crack detection model that incorporates attention mechanisms. The main contributions of this paper are summarized as follows:
1) Based on the Res2Net (a new multi-scale backbone architecture) network, an improved multi-scale Res2-C3 (a new multi-scale backbone architecture of C3) module is suggested to enhance the feature extraction performance.
2) The feature fusion network and backbone of YOLOv5 are combined with the GAM (global attention mechanism) attention mechanism to enhance the model's ability to perceive fracture information.
3) Dynamic snake convolution is integrated into the feature fusion network, enhancing the model's ability to handle irregular shapes and deformation, which is beneficial for improving the accuracy of road crack identification.
YOLOv5s is a variant of the YOLOv5 series, an object detection deep learning algorithm. Compared to other versions of YOLOv5, YOLOv5s is a lightweight model that preserves strong detection performance while reducing the model's size and computational complexity. The YOLOv5s network consists of the backbone, neck, and head. CSPDarknet53 (cross stage partial network) serves as the backbone and effectively extracts image features. The neck network combines feature maps from different levels to create feature maps with multi-scale information and integrates them with the features produced by the backbone to increase object detection accuracy [13]. The head network is responsible for the final detection step, determining the bounding box positions and object classes and forming the final output vector. Figure 1 shows the improved YOLOv5s network structure.
As shown in Figure 2, the C3 module, which consists of two parallel branches, is the central component of the backbone network. In one branch, the input feature map passes through a Conv module and then through n stacked bottleneck modules to extract high-level semantic information [14]. The output of the other branch, after passing through a Conv layer, is concatenated with the output of the first branch. Feature fusion is then performed through another Conv layer before the final output.
The C3 module of YOLOv5s mainly draws on the split-and-merge idea of CSPNet [15] combined with residual structures. The bottleneck module forms the main gradient branch of the CSPNet-style design, and the number of stacked bottleneck modules is controlled by the parameter n, whose value changes with the model size. The bottleneck module is repeated multiple times within the C3 module to capture higher-level semantic information from the image.
Drawing inspiration from the design concepts of VGG (visual geometry group) [16], GoogLeNet [17], and CSPNet [15], the C3 module processes input information through two branches. One branch stacks n bottleneck structures to extract high-level semantic features, while the other branch preserves the original image features through a shortcut Conv module. Finally, the outputs of the two parallel branches are merged, enhancing the feature information within the image. Therefore, the core of feature extraction lies in the design of the bottleneck structure.
To improve the model's capacity to extract feature information [18] and obtain richer features, deeper network architectures or a larger number of convolutional kernels are frequently used. Here, a Res2-Bottleneck module is proposed by merging the bottleneck structure with the Res2Net module to improve the model's feature extraction capability [19]. The main idea is to split the feature maps entering the 3 × 3 convolution of the original residual bottleneck, i.e., the output of the preceding 1 × 1 convolution, into four parts [20]. The first part remains unchanged; the second part passes through a 3 × 3 convolution layer; the third part adds its features to the output of the second part before passing through another 3 × 3 convolution layer; and the fourth part adds its features to the output of the third part before passing through a further 3 × 3 convolution layer. This hierarchical, parallel design improves the model's capacity to extract features across multiple scales. Finally, the feature maps from the four parts are concatenated to match the number of channels of the input and passed to an output 1 × 1 convolution for feature fusion. This structure is referred to as the enhanced multi-scale bottleneck (Res2-Bottleneck), as shown in Figure 3. The Res2-Bottleneck not only increases multi-scale feature information but also preserves the original feature information as much as possible through the residual structure, reducing the loss of shallow features. As a result, more information about pavement cracks can be retained, which is highly advantageous for raising the model's detection accuracy.
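As a concrete illustration, the following is a minimal PyTorch sketch of a Res2-Bottleneck of this kind. It is a hedged reconstruction from the description above, not the paper's released code: the class name `Res2Bottleneck`, the hidden width, the BatchNorm/SiLU choices, and the scale of 4 are assumptions made to match the four-part split.

```python
# Minimal sketch of the Res2-Bottleneck described above (illustrative, not official code).
import torch
import torch.nn as nn

class Res2Bottleneck(nn.Module):
    def __init__(self, c_in, c_out, scale=4, shortcut=True):
        super().__init__()
        c_mid = c_out                      # hidden width; the real module may reduce channels here
        assert c_mid % scale == 0
        self.width = c_mid // scale
        self.cv1 = nn.Sequential(nn.Conv2d(c_in, c_mid, 1, bias=False),
                                 nn.BatchNorm2d(c_mid), nn.SiLU())
        # 3x3 convs for splits 2..scale (split 1 is passed through unchanged)
        self.convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(self.width, self.width, 3, padding=1, bias=False),
                          nn.BatchNorm2d(self.width), nn.SiLU())
            for _ in range(scale - 1))
        self.cv2 = nn.Sequential(nn.Conv2d(c_mid, c_out, 1, bias=False),
                                 nn.BatchNorm2d(c_out), nn.SiLU())
        self.add = shortcut and c_in == c_out

    def forward(self, x):
        y = self.cv1(x)
        splits = torch.split(y, self.width, dim=1)
        outs = [splits[0]]                       # part 1: identity
        prev = None
        for i, conv in enumerate(self.convs):    # parts 2..4: hierarchical 3x3 convs
            inp = splits[i + 1] if prev is None else splits[i + 1] + prev
            prev = conv(inp)
            outs.append(prev)
        out = self.cv2(torch.cat(outs, dim=1))   # fuse the multi-scale features
        return x + out if self.add else out
```

Splitting after the 1 × 1 convolution keeps the parameter count close to a standard bottleneck while widening the range of receptive fields covered by a single block.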
In the original C3 module, one branch performs feature extraction by stacking multiple bottleneck modules. To improve the feature extraction performance, the original bottleneck structure is replaced with the improved Res2-Bottleneck structure, resulting in the improved Res2-C3 module, as shown in Figure 4.
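Continuing the sketch, a Res2-C3 module can be assembled by keeping the two-branch C3 layout and swapping the stacked bottlenecks for the `Res2Bottleneck` above; the structure follows the description in the text, and the exact channel split is an assumption.

```python
# Hedged sketch of the Res2-C3 module: a standard C3 layout whose stacked
# bottlenecks are replaced by the Res2Bottleneck defined in the previous sketch.
class Res2C3(nn.Module):
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_hidden = c_out // 2
        self.cv1 = nn.Sequential(nn.Conv2d(c_in, c_hidden, 1, bias=False),
                                 nn.BatchNorm2d(c_hidden), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c_in, c_hidden, 1, bias=False),
                                 nn.BatchNorm2d(c_hidden), nn.SiLU())
        self.m = nn.Sequential(*(Res2Bottleneck(c_hidden, c_hidden) for _ in range(n)))
        self.cv3 = nn.Sequential(nn.Conv2d(2 * c_hidden, c_out, 1, bias=False),
                                 nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        # branch 1: n stacked Res2-Bottlenecks; branch 2: plain 1x1 shortcut
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

# usage: feats = Res2C3(64, 64, n=2)(torch.randn(1, 64, 80, 80))
```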
The C3 module is the heart of the YOLOv5 backbone network, and the design of the bottleneck structure is essential to it. Building upon the bottleneck structure, the multi-scale extraction module C3RFEM is constructed based on the RFE (receptive field enhancement) module [21]. The main principle of the RFE module is to use four dilated (expansion) convolution branches of different scales to capture multi-scale information and different receptive ranges. These branches share weights, differing only in their receptive fields, which reduces the number of model parameters and the risk of overfitting while still allowing operations at different scales and making full use of each feature's information. The RFE module can be divided into two parts: a multi-branch part based on dilated convolution and a weighted layer, as shown in Figure 5. The multi-branch part uses dilation rates of 1, 2, and 3, all with a fixed 3 × 3 convolution kernel. Residual connections are employed to prevent gradient explosion and vanishing during training. This structure can improve the model's detection accuracy and lessen feature loss during feature extraction.
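A rough sketch of the RFE idea is shown below: a single shared 3 × 3 weight is applied with dilation rates 1, 2, and 3, the branch outputs are combined by a small learnable weighting (a stand-in for the weighted layer), and a residual connection adds the input back. The class name, the softmax weighting, and the normalization/activation choices are illustrative assumptions, not the reference implementation.

```python
# Illustrative sketch of the RFE principle: shared-weight dilated branches + weighting + residual.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RFE(nn.Module):
    def __init__(self, channels, rates=(1, 2, 3)):
        super().__init__()
        self.rates = rates
        # a single shared 3x3 kernel reused by every dilated branch
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        # learnable per-branch weights standing in for the weighted layer
        self.branch_w = nn.Parameter(torch.ones(len(rates)) / len(rates))
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        # padding = dilation keeps the spatial size identical across branches
        outs = [F.conv2d(x, self.weight, padding=r, dilation=r) for r in self.rates]
        w = torch.softmax(self.branch_w, dim=0)
        fused = sum(wi * oi for wi, oi in zip(w, outs))
        return F.silu(self.bn(fused) + x)   # residual connection
```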
Replacing the original bottleneck module with the RFE module results in C3RFEM, as illustrated in Figure 6. To ensure that the improved model exhibits better performance, comparative experiments are conducted between the C3RFEM module, which extracts multi-scale feature information based on the RFE module, and the Res2-C3 module. This comparison aims to select the module with higher accuracy and faster speed.
The attention mechanism enables the model to concentrate selectively on target information [22]. The core of channel attention is to estimate the importance of each channel and use it to focus the model, thereby weakening the role of channels that are of no interest. Hybrid attention combines channel attention with spatial attention, arranged either sequentially or in parallel, forming an attention model over both channel and spatial features.
The SE (squeeze and excitation) [23] attention module is a channel attention module that enhances channel features in input feature maps. However, the SE attention mechanism neglects spatial information and therefore cannot comprehensively extract the information in the feature map. CBAM (convolutional block attention module) [24] adds a spatial attention mechanism that effectively overcomes this shortcoming of SE by considering spatial information in addition to channel information. However, CBAM still loses cross-dimensional information by ignoring the relationship between channels and space. Recognizing the significance of interactions across dimensions, the GAM [25] attention mechanism is employed here; it reduces information dispersion and enhances global cross-dimensional interaction features.
The GAM and CBAM attention mechanisms are broadly similar in overall structure, though their channel attention and spatial attention sub-modules differ. Figure 7 shows the full process, with Mc and Ms denoting the channel attention map and spatial attention map, respectively.
The channel attention sub-module (CAM) uses a three-dimensional permutation to retain information across the three dimensions. A two-layer multilayer perceptron (MLP) is then used to strengthen cross-dimensional dependencies between channels and spatial positions. The channel attention sub-module is shown in Figure 8: the input feature map undergoes a dimension transformation, the transformed feature map is processed by the MLP and restored to its original dimensions, and a Sigmoid activation produces the output.
In the spatial attention sub-module (SAM), two convolutional layers are used to fuse spatial information and attend to the spatial dimension [26]. Similar to the SE attention mechanism, SAM first reduces the number of channels and then restores it. A SAM is depicted in Figure 9: channel reduction is performed with a 7 × 7 convolution to reduce the computational load, followed by another 7 × 7 convolution that restores the original number of channels, and finally a Sigmoid output is obtained.
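The following PyTorch sketch combines the two sub-modules into a GAM-style block as described above. The reduction ratio r = 4, the class name, and the exact layer ordering are assumptions made for illustration.

```python
# Hedged sketch of a GAM-style attention block (channel sub-module followed by spatial sub-module).
import torch
import torch.nn as nn

class GAM(nn.Module):
    def __init__(self, channels, r=4):
        super().__init__()
        # channel attention: permute channels last, two-layer MLP, sigmoid
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels))
        # spatial attention: 7x7 conv reduces channels, 7x7 conv restores them
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // r, 7, padding=3),
            nn.BatchNorm2d(channels // r),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 7, padding=3),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        b, c, h, w = x.shape
        # channel attention map Mc
        y = x.permute(0, 2, 3, 1).reshape(b, -1, c)          # (B, H*W, C)
        mc = torch.sigmoid(self.mlp(y)).reshape(b, h, w, c).permute(0, 3, 1, 2)
        x = x * mc
        # spatial attention map Ms
        ms = torch.sigmoid(self.spatial(x))
        return x * ms

# usage: out = GAM(128)(torch.randn(1, 128, 40, 40))
```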
The receptive field of the traditional convolutional kernel is regular, but the shape of road cracks is irregular, causing some receptive fields not to be on the target. Additionally, the receptive field size of the convolutional kernel is fixed, but the size and extent of road cracks vary. If the target is too large, only local features can be extracted, and if the target is too small, interference from irrelevant information occurs. To address the limitation of traditional convolution in effectively adapting to geometric changes in objects, which makes it difficult to recognize objects undergoing rotation, symmetry, and scaling, dynamic snake convolution can be employed [27].
Inspired by deformable convolution [28], dynamic snake convolution is obtained by letting the model change the shape of the convolutional kernel during feature learning. Deformable convolution predicts offsets for the sampling points, adaptively changes the sampling positions, and focuses on the semantic feature points and geometric keypoints of the target. The sampling process of the convolutional kernel is illustrated in Figure 10. The input image first passes through a convolutional branch that computes the offsets; the resulting feature map has the same spatial size as the input and 2N channels. Based on the offsets, the adaptive sampling points of the backbone network are obtained, and the feature values at these sampling points are computed using bilinear interpolation. However, the learning of sampling-point offsets is highly unconstrained and may drift to positions far from the target; moreover, each sampling point contributes to the output with the same weight, so poor-quality sampling points can interfere with feature extraction. Dynamic snake convolution therefore introduces continuity constraints into the design of the convolutional kernel: each kernel position takes its previous position as a reference and freely chooses its offset direction, ensuring continuity of perception while still allowing free selection. This enables the convolutional kernel to fit elongated structures and learn features freely, while the constraint keeps the kernel from deviating too far from the target structure. By adding dynamic snake convolution to the feature fusion network of YOLOv5s, the improved network enhances the model's ability to handle irregular shapes and deformation; the adaptive change of the receptive field with target size is beneficial for improving the accuracy of road crack recognition.
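To make the offset-based sampling concrete, the sketch below uses torchvision's deformable convolution, which is the mechanism dynamic snake convolution builds on; the continuity constraint specific to DSConv is not reproduced here, and the module name is purely illustrative.

```python
# Brief sketch of offset-based sampling with a deformable convolution (the basis of DSConv).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class OffsetConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # predicts 2 * k * k offsets per location (the "2N" channels mentioned in the text)
        self.offset = nn.Conv2d(c_in, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(c_in, c_out, kernel_size=k, padding=k // 2)

    def forward(self, x):
        off = self.offset(x)        # adaptive sampling offsets
        return self.deform(x, off)  # bilinear sampling at the shifted positions

# usage: y = OffsetConv(64, 64)(torch.randn(1, 64, 80, 80))
```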
The data for the experiment were collected with a multipurpose road inspection vehicle, on which two cameras arranged in parallel capture road information. The images are converted to grayscale to reduce the raw data volume and enhance image information. The dataset used in this experiment contains 3000 images, which are randomly split into training, validation, and test sets at an 8:1:1 ratio. The images are in JPG format with a size of 500 × 500 pixels.
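For reference, an 8:1:1 random split of the kind described above can be produced as follows; the directory layout and file extension are assumptions for illustration.

```python
# Minimal sketch of the 8:1:1 random split described in the text.
import random
from pathlib import Path

def split_dataset(image_dir, seed=0):
    files = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(files)          # fixed seed for a reproducible split
    n = len(files)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]
    return train, val, test

# e.g. train, val, test = split_dataset("datasets/cracks/images")
# for 3000 images this yields 2400 / 300 / 300
```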
The software environment for the experiments is built on the PyTorch 1.8.0 deep learning framework, with GPU (graphics processing unit) acceleration on an NVIDIA GeForce RTX 3060. The code is written in Python 3.8 and runs on the Windows 10 operating system. The hardware includes an Intel Xeon W-3225 processor, and the acceleration library is CUDA 10.0. The model is trained for 200 epochs [29] with a batch size of 8. The pre-trained weight file is YOLOv5s.pt, the initial learning rate is 0.001, the momentum is 0.9, the weight decay is 0.0005, and label smoothing is set to 0.1.
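For readability, the training settings listed above can be summarized in a plain dictionary; this is only a summary of the stated hyperparameters, not YOLOv5's actual configuration schema.

```python
# Summary of the training hyperparameters stated in the text (illustrative names).
train_cfg = {
    "weights": "yolov5s.pt",
    "epochs": 200,
    "batch_size": 8,
    "lr0": 0.001,             # initial learning rate
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "label_smoothing": 0.1,
    "img_size": 500,          # input images are 500 x 500 pixels
}
```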
In deep learning, precision, recall (R), mAP, frames per second (FPS), and computational complexity are frequently employed to assess the efficacy of models. Precision and recall are defined as:
$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{1} $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{2} $$
where: TP represents the number of positive samples correctly detected; FP represents the number of negative samples incorrectly detected as positive; TN represents the number of negative samples correctly identified as negative; and FN represents the number of positive samples missed, i.e., incorrectly classified as negative.
mAP, which is the average of average precision (AP), is a key metric for object detection algorithms [30]. In object detection models, a higher mAP indicates better detection results on a specific dataset. FPS, which gauges the model detection speed, is employed to evaluate the fracture detection speed; a higher FPS value indicates faster detection and better model performance. The computational complexity of a convolutional neural network model is represented by the number of floating-point operations, known as FLOPs. FLOPs are used as an indirect measure of the speed of neural network models. A smaller FLOPs value indicates lower model complexity and faster target recognition and is calculated as follows:
$$ \mathrm{FLOPs} = 2\, C_{in} K^{2} H W C_{out} \tag{3} $$
where: $C_{in}$ represents the number of input channels; $K$ represents the convolutional kernel size; $H$ and $W$ represent the height and width of the output feature map; and $C_{out}$ represents the number of output channels [31].
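A small helper implementing Eq. (3) for a single convolutional layer is given below, assuming square kernels and that H and W refer to the output feature map.

```python
# Conv-layer FLOPs per Eq. (3): multiply-accumulates counted as 2 operations.
def conv_flops(c_in, k, h, w, c_out):
    """FLOPs = 2 * C_in * K^2 * H * W * C_out."""
    return 2 * c_in * (k ** 2) * h * w * c_out

# e.g. a 3x3 conv from 64 to 128 channels on an 80x80 output map:
# conv_flops(64, 3, 80, 80, 128) -> 943_718_400 (about 0.94 GFLOPs)
```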
To illustrate the appropriateness and efficacy of selecting the Res2-C3 module as the multi-scale extraction module, a comparison of the performance of Res2-C3 and C3RFEM as multi-scale feature extraction modules is presented. The results of the comparative experiments are shown in Table 1.
Model | mAP@0.5 | AP (groove) | Recall (groove) | FPS (frame/s) | FLOPs/G
YOLOv5s | 0.838 | 0.816 | 0.766 | 76.324 | 15.8 |
Res2-C3 | 0.897 | 0.872 | 0.83 | 53.16 | 14.4 |
C3RFEM | 0.873 | 0.85 | 0.798 | 41.308 | 10.1 |
Table 1 shows that the model with the Res2-C3 module strikes a balance between detection speed and accuracy. In terms of detection accuracy, the Res2-C3 model improves mAP by 5.9 and 2.4% over the original model and the C3RFEM model, respectively. Owing to its reduced parameter count, the Res2-C3 model also lowers the risk of overfitting, although its FPS decreases modestly compared with the original model. Compared with the C3RFEM model, its FPS is 11.852 frames/s higher, a significant improvement in detection speed. Therefore, the Res2-C3 module was chosen as the multi-scale module for the final improved model.
To fully validate the effectiveness of the proposed improvements, ablation experiments were conducted on the road crack dataset. Label smoothing was set to 0.1 for all models to prevent overfitting. Each improvement module was embedded into the YOLOv5s model one by one, and the same training parameters and environmental conditions were used in each experiment. Table 2 displays the results. In terms of detection accuracy, the model with the highest mAP is YOLOv5s+Res2-C3+GAM+DSConv, which improves mAP by 10.1% compared to YOLOv5s. When Res2-C3, GAM, and DSConv (dynamic snake convolution) act individually, the highest mAP is achieved by the YOLOv5s+Res2-C3 model, at 89.7%, which is 5.9% higher than YOLOv5s. This indicates that, by stacking multi-scale Res2-Bottleneck modules, the Res2-C3 module improves the model's feature extraction capability and enhances road crack detection accuracy. When any two modules are combined, the model's detection accuracy improves to varying degrees, suggesting a synergistic effect in which the modules complement and enhance each other, contributing to improved accuracy and robustness.
Model | mAP@0.5 | AP (groove) | Recall (groove) | FPS (frame/s)
YOLOv5s | 0.838 | 0.816 | 0.766 | 72.324 |
YOLOv5s+Res2-C3 | 0.897 | 0.872 | 0.83 | 53.16 |
YOLOv5s+GAM | 0.885 | 0.876 | 0.815 | 62.449 |
YOLOv5s+DSConv | 0.859 | 0.828 | 0.782 | 51.944 |
YOLOv5s+Res2-C3+GAM | 0.934 | 0.927 | 0.863 | 53.985 |
YOLOv5s+Res2-C3+DSConv | 0.917 | 0.905 | 0.842 | 50.917 |
YOLOv5s+GAM+DSConv | 0.922 | 0.913 | 0.851 | 51.869 |
YOLOv5s+Res2-C3+GAM+DSConv | 0.939 | 0.942 | 0.871 | 49.97 |
In terms of detection speed, a higher FPS indicates faster detection. According to the experimental results, the fastest model is YOLOv5s, with an FPS of 72.324. Compared to the original model, all improved models show a decrease in detection speed; the largest decrease is observed in the YOLOv5s+Res2-C3+GAM+DSConv model, with an FPS of 49.97 frames/s. This is mainly because extracting more feature information increases the model's parameter count: the model requires more computation and weight updates, which raises the training time cost and reduces detection speed.
In terms of recall rate, the best-performing model is the YOLOv5s+Res2-C3+GAM+DSConv model, achieving a recall rate of 87.1%, which is a 10.5% improvement over the original model.
In summary, across multiple indicators the improved YOLOv5s+Res2-C3+GAM+DSConv model performs the best, with a 12.6% increase in AP for road crack detection and a 10.1% increase in mAP. Since the final improved model adds three modules to the YOLOv5s network, the model complexity increases, and a decrease in speed is reasonable. The final improved model strikes a balance between mAP and FPS, maintaining high accuracy while achieving relatively fast speed.
Tests were conducted with YOLOv5s and the final improved model on the above-mentioned dataset, and their metrics were compared with those of the newer YOLOv7 and YOLOv8 networks. Table 3 shows that, compared to the other networks, the final improved model YOLOv5s+Res2-C3+GAM+DSConv performs better in terms of mAP and AP.
Model | mAP@0.5 | AP (groove) | Recall (groove) | FPS (frame/s) | FLOPs/G
YOLOv5s | 0.838 | 0.816 | 0.766 | 76.324 | 15.8 |
YOLOv5s+Res2-C3+GAM+DSConv | 0.939 | 0.942 | 0.871 | 49.97 | 18.4 |
YOLOv7 | 0.891 | 0.903 | 0.862 | 54.62 | 16.3 |
YOLOv8 | 0.924 | 0.915 | 0.892 | 50.479 | 17.9 |
Among the baseline networks, the original YOLOv5s has the lowest accuracy: the mAP values of YOLOv7 and YOLOv8 are 5.3 and 8.6% higher, respectively. However, because of their larger parameter counts and more complex network structures, YOLOv7 and YOLOv8 have lower FPS than YOLOv5s and therefore slower detection speed.
Compared to the YOLOv7 and YOLOv8 models, the final improved model YOLOv5s+Res2-C3+GAM+DSConv has the highest detection accuracy, with mAP values 4.8 and 1.5% higher than YOLOv7 and YOLOv8, respectively. In terms of detection speed, the YOLOv5s+Res2-C3+GAM+DSConv model has an FPS only 0.509 frames/s lower than that of YOLOv8. Although it is slightly slower than YOLOv8, it achieves a 2.7% increase in AP over YOLOv8. Overall, the YOLOv5s+Res2-C3+GAM+DSConv model strikes a balance between detection accuracy and speed.
The enhanced model's detection results are shown in Figure 11. The numbers represent confidence scores, indicating the model's confidence level in its predictions. High confidence scores indicate that the model is very confident in its predictions, while low confidence scores indicate that the model is less certain about the results [32]. From the detection results of road cracks in the images, it is evident that not only were small cracks successfully detected, but these cracks also exhibited relatively high confidence scores. This observation strongly validates the feasibility and effectiveness of the improved algorithm, demonstrating its outstanding performance in crack detection tasks and providing robust support for road maintenance and safety.
In response to the shortcomings of conventional road crack detection techniques, such as slow speed and low accuracy, this paper proposes a method to improve the YOLOv5s model. The method uses the proposed Res2-C3 module as the core of the backbone network in place of the original C3 module, extracting more feature information from input images to reduce the loss of valuable information and increase detection accuracy. Furthermore, the GAM attention mechanism is added to both the YOLOv5s backbone network and the feature fusion network to increase the model's focus on crack information and reduce the false-negative rate for small cracks. Adding dynamic snake convolution to the feature fusion network allows the model's receptive field to adapt to the size of road cracks, which further improves the accuracy of road crack detection. The paper also applies label smoothing, set to 0.1, to enhance the model's generalization. The resulting model achieves a balance between detection accuracy and speed while keeping the parameter count under control.
To enhance the model's accuracy and robustness, future research should consider collecting more samples of different types of road cracks and using data augmentation techniques to increase dataset diversity. Furthermore, considering real-time road crack detection and application on mobile devices would enhance the algorithm's practicality.
The author declares they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported in part by the Shandong Provincial Department of Science and Technology Research Project on Key Technologies for the Development of All-Terrain Intelligent Orchard Platform (2019GNC106032).
The author declares there is no conflict of interest.