
Smart technologies are advancing the development of cutting-edge systems by exploring the future network. The Internet of Things (IoT) and many multimedia sensors interact with each other for collecting and transmitting visual data. However, managing enormous amounts of data from numerous network devices is one of the main research challenges. In this context, various IoT systems have been investigated and have provided efficient data retrieval and processing solutions. For multimedia systems, however, controlling inefficient bandwidth utilization and ensuring timely transmission of vital information are key research concerns. Moreover, to transfer multimedia traffic while balancing communication costs for the IoT system, a sustainable solution with intelligence in real-life applications is demanded. Furthermore, trust must be formed for technological advancement to occur; such an approach provides the smart communication paradigm with the incorporation of edge computing. This study proposed a model for optimizing multimedia using a combination of edge computing intelligence and authentic strategies. Mobile edges analyze network states to discover the system's status and minimize communication disruptions. Moreover, direct and indirect authentication determines the reliability of data forwarders and network stability. The proposed authentication approach minimizes the possibility of data compromise and increases trust in multimedia surveillance systems. Using simulation testing, the proposed model outperformed other comparable work in terms of byte delivery, packet overhead, packet delay, and data loss metrics.
Citation: Faten S. Alamri, Khalid Haseeb, Tanzila Saba, Jaime Lloret, Jose M. Jimenez. Multimedia IoT-surveillance optimization model using mobile-edge authentic computing[J]. Mathematical Biosciences and Engineering, 2023, 20(11): 19174-19190. doi: 10.3934/mbe.2023847
Infrared detection technology is one of the main means of obtaining information in modern systems. Compared with visible-light detection systems, infrared detection systems offer strong penetration, long detection distance and all-weather operation. Infrared detection technology has therefore attracted growing research interest and is widely used in military [1], medical [2], meteorological [3] and other fields. With the gradual opening of low-altitude airspace, unmanned aerial vehicles (UAVs) carrying infrared equipment can be used to collect data on and track ground targets. Effectively detecting small targets from an aerial view is therefore of theoretical, engineering, social and economic significance.
In recent years, with the rapid development of deep learning technology, the target detection method has also changed from the traditional method based on manually designed features to the deep neural network (DNN) method based on automatically learned features [4,5]. The deep learning-based target detection methods are generally divided into two-stage methods and one-stage methods [6]. The two-stage methods generate region proposals and then classify them. The classic models are the region-convolutional neural network (R-CNN) series [7], including Fast R-CNN [8], Faster R-CNN [9], Mask R-CNN [10] and so on. They have high detection accuracy, but their detection speed is slow. It is difficult to apply in real-time detection scenarios. The one-stage methods do not have the stage of generating region proposals. They directly generate the final detection results through one stage, so they have a faster detection speed. The classic models are the YOLO series [11], including YOLOv3 [12], YOLOv5 [13], YOLOX [14] and so on.
YOLOv7 [15] is a recent model in the YOLO series that surpasses most known target detectors in accuracy and speed. Since 2022, YOLOv7 has been applied to a number of real-world detection tasks. Soeb et al. [16] created a leaf image dataset from Bangladesh and used YOLOv7 for disease diagnosis, providing a solution for precision agriculture applications. Li et al. [17] improved YOLOv7 by embedding gamma correction, an improved convolutional block attention module and Alpha GIoU; the improved model was used for damage detection of aeroengine blades. Abnormal driver behavior is a serious threat to public safety. Liu et al. proposed the CEAM-YOLOv7 model for distraction behavior recognition. The global attention mechanism (GAM) was introduced into YOLOv7 to enhance the network's capability to extract key features, and a channel expansion (CE) method was proposed for data augmentation. Moreover, lightweight processing made the model easier to deploy. More projects based on YOLOv7 are still being explored [18].
Although the above models show impressive performance in related works, infrared small target detection remains a challenge. On the one hand, due to the long observation distance, infrared small targets carry little shape and texture information. On the other hand, due to the complex background, infrared small targets may be obscured or overlapped [19,20]. To detect infrared small targets, researchers have developed some pioneering works. Zhang et al. [21] incorporated target shape reconstruction into the detection of infrared small targets and proposed the ISNet model. Based on a Taylor finite difference (TFD)-inspired edge block and a two-orientation attention aggregation (TOAA) block, the model can effectively extract edge features and aggregate cross-level features. Additionally, the authors established a new large-scale benchmark, IRSTD-1k, to validate the effectiveness of the proposed idea. To handle the loss of targets in deep layers, Li et al. [22] proposed a dense nested attention network (DNA-Net). Specifically, the dense nested interactive module (DNIM) and the cascaded channel and spatial attention module (CSAM) were designed to achieve repetitive fusion and enhancement between feature layers. Additionally, an infrared small target dataset, namely NUDT-SIRST, was developed. Results on a set of proposed evaluation metrics showed that the proposed method achieved better performance. Wu et al. [23] proposed a multi-level TransUNet (MTU-Net) to detect space-based infrared tiny ships. Its hybrid of a Vision Transformer (ViT) and a convolutional neural network (CNN) can extract multi-level features. The authors also proposed a copy-rotate-resize-paste (CRRP) data augmentation method that alleviates the problem of sample imbalance, and designed a FocalIoU loss to achieve target localization and shape description. Establishing NUDT-SIRST-Sea, the largest space-based infrared tiny ship detection dataset, was another significant contribution.
In 2022, Lin et al. [24] jointly considered detection performance and practical deployment, and proposed a lightweight infrared small target detection network, LIRDNet. This model combined a cross-scale feature fusion module (CFM) and a bottleneck attention module (BAM). The experimental results demonstrated that the CFM and BAM modules further improved the detection performance with few parameters and computations. Liu et al. [25] proposed a lightweight model for ship detection in SAR images. The authors added coordinate attention to the backbone of YOLOv7-tiny and improved the SPP block and the loss function. Compared with the original model, the precision of the proposed model increased by 4.6%. This work had not yet been deployed on edge devices. Similarly, Guo et al. [26] proposed a lightweight YOLO-based SAR ship target detection model, namely LMSD-YOLO. This model has better multi-scale adaptation capabilities and has been successfully deployed on mobile platforms. However, there are still difficulties in implementing target detection directly from large-scale SAR images. Zhou et al. [27] improved YOLOv5 to enable the model to perform small target detection. It is worth noting that the authors used the Super-Resolution Generative Adversarial Network (SRGAN) to generate super-resolution images, which were then fed into the improved detection model. Experiments verified that super-resolution reconstruction of images can improve the detection accuracy of small targets. The disadvantage is that super-resolution reconstruction is very time-consuming.
In this paper, the recent YOLOv7 model is used as the baseline for infrared small target detection. To better adapt the model to this task domain, we make targeted improvements to YOLOv7 and propose a new detection model, namely ISTD-YOLOv7. Our main contributions are summarized as follows:
1) An improved YOLOv7 model (namely, ISTD-YOLOv7) is proposed for infrared small target detection.
2) Updating the anchors helps the model converge better and faster. Gather-Excite (GE) attention efficiently exploits feature context and spatial location information. The Normalized Wasserstein Distance (NWD) alleviates the sensitivity to location deviations of small targets.
3) The performance of ISTD-YOLOv7 is compared with existing models. Ablation studies are performed to investigate the impact of each component. Experiments on a public dataset demonstrate the superiority of the proposed model in infrared small target detection.
The remainder of this paper is organized as follows: Section 2 briefly introduces the YOLOv7 model. Section 3 describes the mechanism of the improved components and presents the improved model. Experimental results and analysis are given in Section 4. Section 5 summarizes the work of this paper.
YOLOv7, one of the latest representative models of the YOLO series, was proposed by Wang et al. [15] in 2022. Compared with previous YOLO models, its main contributions include model re-parameterization, model scaling and extended efficient layer aggregation networks (E-ELAN). This series of architectural alterations makes YOLOv7 both more accurate and faster. The concise network structure of YOLOv7 is shown in Figure 1 [15]. More details of the component blocks can be found in [15].
First, the model resizes the input images to (640 × 640) pixels. Then, the images are input to the backbone network for feature extraction. The backbone network of YOLOv7 consists of several CBS blocks, ELAN blocks and MP blocks. The obtained features of different scales are fused by the neck network, which adopts the structure of the path aggregation feature pyramid network (PAFPN). Then, the head (prediction) network adjusts the number of channels of the feature maps based on the RepConv blocks. Finally, the bounding box information, confidence and category probability are output.
The sizes of the anchors are obtained by clustering the widths and heights of the ground-truth boxes of the training samples. Whether the anchors are reasonable greatly affects the detection performance of the model. Generally speaking, the anchors of YOLOv7 are obtained by clustering on the VOC dataset or the COCO dataset. The VOC dataset provides 20 classes of targets, including persons, horses, bicycles, motorbikes and more [28]. The COCO dataset focuses on scene understanding and provides 80 classes of targets, mainly obtained from everyday scenes [29]. The VOC and COCO datasets are common large-scale datasets in target detection. However, the sizes of targets in these datasets are significantly different from those in infrared small target datasets.
In this paper, in order to make YOLOv7 converge better and faster, the K-means method is used to re-cluster the sizes of the targets based on the selected dataset. The number K of clustering centers is set to 9. The selected dataset in this paper is described in Section 4. Figure 2 shows the clustering results of the VOC dataset and the selected dataset. It can be seen that the distribution of cluster centers varies greatly. The target size of the VOC dataset can be several hundred pixels, while the target size of the selected dataset is obviously much smaller. Table 1 gives the results of anchors. The anchor update can provide a reasonable prior for the detection model.
Dataset | Anchors (pixels) |
VOC dataset | (23, 44), (61, 58), (44, 128), (110, 122), (108, 276), (222, 218), (238, 457), (454, 320), (534, 555) |
Selected dataset | (12, 9), (12, 10), (13, 14), (16, 11), (16, 13), (18, 13), (18, 14), (22, 13), (21, 16) |
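The re-clustering step can be sketched as follows. This is a minimal illustration, assuming plain Euclidean K-means on (width, height) pairs; the paper does not state the distance measure, the initialization or the implementation, and the function name `kmeans_anchors` and the toy boxes are hypothetical.

```python
import random


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchors with plain K-means.

    Assumption: Euclidean distance in (w, h) space. Some YOLO tooling
    uses a 1 - IoU distance instead.
    """
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # initialise from k sampled boxes
    for _ in range(iters):
        # Assignment step: attach every box to its nearest center.
        groups = [[] for _ in range(k)]
        for w, h in boxes:
            dists = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centers]
            groups[dists.index(min(dists))].append((w, h))
        # Update step: move each center to the mean of its group.
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                mean_w = sum(w for w, _ in group) / len(group)
                mean_h = sum(h for _, h in group) / len(group)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(center)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    # Sort by area so that anchors run from small to large.
    return sorted(centers, key=lambda c: c[0] * c[1])


# Hypothetical toy data: ground-truth box sizes in pixels.
toy_boxes = [(12, 9), (13, 10), (12, 10), (22, 13), (21, 14), (23, 13)]
anchors = kmeans_anchors(toy_boxes, k=2)
```

With K = 9 and the real ground-truth sizes, the same routine would yield nine (width, height) pairs of the kind listed in Table 1.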
For images or feature maps, spatial context information can improve the representational capability of the network. In 2018, the Gather-Excite (GE) attention mechanism was proposed by Hu et al. [30]. This mechanism defines two operators: the gather operator and the excite operator. Figure 3 shows the operation process of the two operators [30]. The gather operator $\xi_G$ extracts features from local spatial locations, as defined in Eq (1). The excite operator $\xi_E$ maps features to the original scale, as defined in Eq (2).
$$\xi_G: \mathbb{R}^{H \times W \times C} \rightarrow \mathbb{R}^{H' \times W' \times C} \tag{1}$$
where $H$, $W$ and $C$ represent the height, width and number of channels of any input $x$, $e$ represents the extent ratio, $H' = H/e$ and $W' = W/e$. A global extent ratio using global average pooling is used in this paper.
$$\xi_E(x, \hat{x}) = x \odot f(\hat{x}) \tag{2}$$
$$f: \mathbb{R}^{H' \times W' \times C} \rightarrow [0, 1]^{H \times W \times C} \tag{3}$$
where $\hat{x}$ represents the output of $\xi_G$, $\odot$ represents the Hadamard product and $f$ represents a mapping.
In this paper, three GE attention blocks are added at three output branches of the backbone network of YOLOv7 respectively. The diagram is shown in Figure 4. Infrared small targets have the characteristics of small size and dim signal. Therefore, location information is essential for the detection of small targets. By adding GE attention blocks to the backbone feature extraction network of YOLOv7, the model can more efficiently exploit feature context and spatial location information for infrared small targets.
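The gather and excite operators with a global extent can be illustrated with a small sketch. It assumes the parameter-free form of GE (global average pooling as the gather step, a sigmoid gate broadcast back to the input size as the excite step); the paper does not say which GE variant is used, and the channel-first nested-list layout and the function name are choices made only for this example.

```python
import math


def gather_excite_global(x):
    """Parameter-free Gather-Excite with a global extent.

    x is a feature map stored as x[c][i][j] (channels, height, width).
    Gather: global average pooling per channel, so H' = W' = 1.
    Excite: sigmoid gate, broadcast to H x W, Hadamard product with x.
    """
    out = []
    for channel in x:
        values = [v for row in channel for v in row]
        gathered = sum(values) / len(values)       # gather: one value per channel
        gate = 1.0 / (1.0 + math.exp(-gathered))   # f maps into [0, 1]
        out.append([[v * gate for v in row] for row in channel])
    return out


# Hypothetical 2-channel, 2 x 2 feature map.
feature = [
    [[1.0, 1.0], [1.0, 1.0]],   # channel mean 1.0
    [[0.0, 0.0], [0.0, 0.0]],   # channel mean 0.0, gate 0.5
]
attended = gather_excite_global(feature)
```

The output has the same shape as the input, which is why such a block can be placed on a backbone output branch without changing the layers that follow.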
The sensitivity of the IoU metric varies considerably with target scale. For small targets, a slight location change may lead to a significant change in IoU. For targets of normal size, however, the same location deviation changes the IoU only slightly [31]. Figure 5 gives a specific analysis. For a small target, a location deviation leads to an IoU drop from 0.47 to 0.02. For a normal target, the same location deviation only leads to an IoU drop from 0.83 to 0.49.
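The scale dependence can be checked with a few lines of arithmetic. The sketch below uses hypothetical box sizes (6 × 6 and 36 × 36 pixels) and a hypothetical 3-pixel diagonal shift; it is not meant to reproduce the exact values of Figure 5.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def shifted(box, d):
    """Shift a box by d pixels along both axes."""
    return (box[0] + d, box[1] + d, box[2] + d, box[3] + d)


small = (0, 0, 6, 6)      # hypothetical 6 x 6 target
normal = (0, 0, 36, 36)   # hypothetical 36 x 36 target
shift = 3                 # same diagonal deviation for both

iou_small = iou(small, shifted(small, shift))     # 9 / 63
iou_normal = iou(normal, shifted(normal, shift))  # 1089 / 1503
```

The same 3-pixel deviation leaves the small box with an IoU of about 0.14 but the normal box with about 0.72.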
Wang et al. [31] proposed a novel metric based on the Wasserstein distance. Specifically, the bounding box is modeled as a 2D Gaussian distribution, and the similarity between the corresponding Gaussian distributions is then calculated using the proposed metric, namely the Normalized Wasserstein Distance (NWD). Figure 6 [31] shows the deviation curves of IoU and NWD under different target sizes. As the target size becomes smaller, the IoU-deviation curves decrease faster, while the NWD-deviation curves remain overlapped and smooth. Compared with IoU, NWD is insensitive to location deviations of small targets. The theoretical and empirical benefits of NWD have been discussed in the literature [32,33,34].
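Specifically, for a bounding box $(c_x, c_y, w, h)$, the inscribed ellipse of the bounding box can be expressed as:
$$\frac{(x - \mu_x)^2}{\sigma_x^2} + \frac{(y - \mu_y)^2}{\sigma_y^2} = 1 \tag{4}$$
where $(c_x, c_y)$, $w$ and $h$ represent the center coordinates, width and height of the bounding box respectively, $(\mu_x, \mu_y)$ represents the center of the ellipse, and $\sigma_x$ and $\sigma_y$ represent the lengths of its semi-axes along the x-axis and y-axis respectively. Therefore, $\mu_x = c_x$, $\mu_y = c_y$, $\sigma_x = w/2$ and $\sigma_y = h/2$. The probability density function of the 2D Gaussian distribution is as follows:
$$f(x \mid \mu, \Sigma) = \frac{\exp\left(-\frac{1}{2}(x - \mu)^{T} \Sigma^{-1} (x - \mu)\right)}{2\pi |\Sigma|^{\frac{1}{2}}} \tag{5}$$
where $x$, $\mu$ and $\Sigma$ represent the coordinate $(x, y)$, the mean and the covariance of the distribution respectively. When $(x - \mu)^{T} \Sigma^{-1} (x - \mu) = 1$, the ellipse of Eq (4) is a density contour of this distribution, so the bounding box can be modelled as a 2D Gaussian distribution $N(\mu, \Sigma)$ with:
$$\mu = \begin{bmatrix} c_x \\ c_y \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \frac{w^2}{4} & 0 \\ 0 & \frac{h^2}{4} \end{bmatrix} \tag{6}$$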
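For Gaussian distributions $N_a$ and $N_b$ modeled from bounding boxes $(c_{x_a}, c_{y_a}, w_a, h_a)$ and $(c_{x_b}, c_{y_b}, w_b, h_b)$, the squared Wasserstein distance is given in Eq (7), i.e., the squared Euclidean distance between the two vectors. After normalization, the final form of the NWD metric is obtained, namely Eq (8).
$$W_2^2(N_a, N_b) = \left\| \left( \left[c_{x_a}, c_{y_a}, \frac{w_a}{2}, \frac{h_a}{2}\right]^{T}, \left[c_{x_b}, c_{y_b}, \frac{w_b}{2}, \frac{h_b}{2}\right]^{T} \right) \right\|_2^2 \tag{7}$$
$$NWD(N_a, N_b) = \exp\left(-\frac{\sqrt{W_2^2(N_a, N_b)}}{C}\right) \tag{8}$$
where $C$ is a normalization constant (not to be confused with the number of classes $C$ in Eq (14)).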
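A brief sketch of the step from Eq (6) to Eq (7), assuming the standard closed form of the 2-Wasserstein distance between two Gaussians whose covariance matrices commute (which holds here because both are diagonal); the text itself does not spell this step out. With $\Sigma^{1/2} = \mathrm{diag}(w/2, h/2)$:

```latex
\begin{aligned}
W_2^2(N_a, N_b)
  &= \lVert \mu_a - \mu_b \rVert_2^2
   + \lVert \Sigma_a^{1/2} - \Sigma_b^{1/2} \rVert_F^2 \\
  &= (c_{x_a} - c_{x_b})^2 + (c_{y_a} - c_{y_b})^2
   + \left(\frac{w_a - w_b}{2}\right)^2
   + \left(\frac{h_a - h_b}{2}\right)^2
\end{aligned}
```

The distance therefore depends smoothly on the center offset and the size difference, and it stays finite and informative even when the two boxes do not overlap.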
In this paper, NWD is integrated into YOLOv7 to replace IoU. The specific improvement part is the loss function of YOLOv7. NWD-based regression loss can not only solve the issue that YOLOv7 is sensitive to the location deviation of small targets, but also still provide gradient to optimize the network in some cases. The improved loss function of YOLOv7 is as follows:
$$L_{ISTD\text{-}YOLOv7} = 1 - NWD(N_p, N_g) \tag{9}$$
where $N_p$ and $N_g$ represent the Gaussian distribution models of the prediction box $p$ and the ground-truth box $g$ respectively.
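Eqs (7)–(9) translate directly into a few lines of code. The sketch below is illustrative only: the value of the constant `c` is an arbitrary assumption (the text does not report the value used), and the function names are hypothetical.

```python
import math


def wasserstein_sq(box_a, box_b):
    """Squared 2-Wasserstein distance of Eq (7); boxes are (cx, cy, w, h)."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2
            + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)


def nwd(box_a, box_b, c=12.0):
    """Normalized Wasserstein Distance of Eq (8); c is an assumed constant."""
    return math.exp(-math.sqrt(wasserstein_sq(box_a, box_b)) / c)


def nwd_loss(pred, gt, c=12.0):
    """Regression loss of Eq (9)."""
    return 1.0 - nwd(pred, gt, c)


gt = (50.0, 50.0, 12.0, 10.0)
pred_far = (80.0, 90.0, 12.0, 10.0)   # does not overlap gt, so IoU would be 0
loss_same = nwd_loss(gt, gt)          # identical boxes
loss_far = nwd_loss(pred_far, gt)     # centers 50 px apart
```

For the non-overlapping pair the loss is strictly between 0 and 1 and still varies with the distance between the boxes, whereas a plain IoU term would be constant at zero overlap.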
In order to more effectively detect small targets in infrared image data, we propose the ISTD-YOLOv7 model, which can maintain good performance. The diagram of the model is shown in Figure 7. First, the infrared images enter the backbone network consisting of convolution groups to extract features. After that, these features enter designed GE blocks. GE blocks are added at three output branches of the backbone network to exploit feature context and spatial location information. Then, the neck network with PAFPN structure is used for feature fusion, producing better semantic information. Finally, the feature maps of various scales enter the head network to produce the prediction results.
The purpose of the training process is to continuously reduce the difference between the prediction results and ground truth boxes. In this paper, the prediction results are iteratively optimized by the NWD-based loss function. The NWD metric is insensitive to location deviations of small targets. For the testing process, we use the trained model for inference and obtain the prediction results. The size of the small target is re-clustered to obtain anchors. The predicted bounding boxes are adjusted based on updated anchors. Then, the final detection result is obtained after non-maximum suppression (NMS) [35].
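The final NMS step can be sketched as a greedy procedure. This is a generic illustration rather than the exact routine used in the experiments; the IoU threshold of 0.5, the class-agnostic treatment and the sample boxes are all assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Retain only boxes whose overlap with the kept box is small enough.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate detections of one target plus one separate target.
boxes = [(10, 10, 22, 20), (11, 10, 23, 20), (60, 60, 72, 70)]
scores = [0.90, 0.75, 0.80]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.85 and is suppressed, so the indices 0 and 2 are kept.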
All experiments are run on a computer with an Intel(R) Core(TM) i9-12900KF (64 GB DDR5) CPU, one NVIDIA GeForce RTX 3090Ti (24 GB) GPU and the Microsoft Windows 10 system. The deep learning framework is PyTorch 1.7.1. The stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.01, a weight decay of 0.0005 and a momentum of 0.937 is chosen to reduce the loss function. The batch size is 32 and the number of epochs is 300.
The dataset in this paper was published by Fu et al. [36] and has been used in some official competitions. All images in this dataset were taken by a UAV equipped with an infrared camera. The dataset includes 21,750 images, 8 classes and 89,174 targets, where the targets are vehicles against ground backgrounds. More details of the dataset are given in Table 2. We randomly divided the dataset into a training set, a validation set and a testing set in the ratio of 8:1:1. The main challenges of this dataset are complex environmental interference and complex imaging conditions. It provides a basis for research on infrared image characteristics and on infrared small target detection and tracking.
Resolution | Depth | Format | Memory |
(640 × 480) pixels | 8 bit | .bmp | ≈300 k |
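The 8:1:1 random split can be sketched as below. The seed, the function name and the use of integer indices in place of image files are assumptions made for illustration; the paper does not describe how the split was implemented.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # the remainder goes to the test split
    return train, val, test


# Image indices stand in for the 21,750 image files.
train, val, test = split_dataset(range(21750))
```

For 21,750 images this gives 17,400 training images and 2,175 images each for validation and testing.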
In order to evaluate the detection performance of the model, some evaluation indices are selected in this paper, including: Precision, Recall, F1 score, Average Precision and mean Average Precision. These indices are all in the range of [0, 1], and the larger the values are, the better the results will be. Their equations are as follows [37,38]:
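$$P = \frac{TP}{TP + FP} \tag{10}$$
$$R = \frac{TP}{TP + FN} \tag{11}$$
$$F1 = \frac{2 \times P \times R}{P + R} \tag{12}$$
$$AP = \int_0^1 P(R) \, dR \tag{13}$$
$$mAP = \frac{1}{C} \sum_{i=1}^{C} AP_i \tag{14}$$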
where TP represents true positive, FP represents false positive and FN represents false negative. The confusion matrix is given in Table 3. C represents the number of classes. P represents Precision, R represents Recall, F1 represents F1 score, AP represents Average Precision and mAP represents mean Average Precision. mAP is the mean of APs of all classes and enables the evaluation of the overall detection accuracy of the model.
 | Predicted result = Positive | Predicted result = Negative |
Actual result = True | TP (True Positive) | FN (False Negative) |
Actual result = False | FP (False Positive) | TN (True Negative) |
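As a worked illustration of Eqs (10)–(14), the sketch below computes the indices from confusion-matrix counts. The counts and the precision-recall points are hypothetical, and the trapezoidal rule is only one possible way to approximate the integral in Eq (13); the paper does not state which interpolation it uses.

```python
def precision_recall_f1(tp, fp, fn):
    """Eqs (10)-(12) from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1


def average_precision(recalls, precisions):
    """Approximate Eq (13) by the trapezoidal rule over sorted PR points."""
    points = sorted(zip(recalls, precisions))
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2
    return ap


def mean_average_precision(ap_per_class):
    """Eq (14): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical counts, not taken from the paper.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)

# Hypothetical precision-recall curve with three points.
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
```

With these counts, P = 0.90 and R = 0.75, and F1 is their harmonic mean, which lies between the two values and closer to the smaller one.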
The above indices evaluate pixel-level performance. Some research [21,22,23,24] has demonstrated that target-level performance is also important for infrared small targets with limited shape and texture information. The probability of detection and the false-alarm rate are defined as follows:
$$P_d = \frac{T_{correct}}{T_{All}} \tag{15}$$
$$F_a = \frac{P_{false}}{P_{All}} \tag{16}$$
where $P_d$ represents the probability of detection and $F_a$ represents the false-alarm rate. $T_{correct}$ represents the number of correctly predicted targets and $T_{All}$ represents the number of all targets. Targets are correctly predicted if the centroid deviation of the targets is smaller than the threshold $T_{distance}$. In this paper, $T_{distance}$ is set to 3 [21,22,23,24]. $P_{false}$ represents the number of falsely predicted pixels and $P_{All}$ represents the number of all image pixels. Pixels are falsely predicted if the centroid deviation of the targets is larger than the threshold $T_{distance}$.
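A small sketch of how the two target-level indices might be computed. The greedy one-to-one matching of predicted and ground-truth centroids, the function names and the sample coordinates are assumptions made for illustration; the cited works may match targets differently.

```python
import math


def probability_of_detection(pred_centroids, gt_centroids, t_distance=3.0):
    """Eq (15): share of ground-truth targets matched by a prediction whose
    centroid lies closer than t_distance pixels."""
    unused = list(pred_centroids)
    correct = 0
    for gx, gy in gt_centroids:
        for pred in unused:
            if math.hypot(pred[0] - gx, pred[1] - gy) < t_distance:
                correct += 1
                unused.remove(pred)   # each prediction matches one target only
                break
    return correct / len(gt_centroids)


def false_alarm_rate(false_pixels, width, height, num_images=1):
    """Eq (16): falsely predicted pixels over all image pixels."""
    return false_pixels / (width * height * num_images)


gts = [(100.0, 100.0), (300.0, 200.0)]
preds = [(101.0, 101.0), (320.0, 200.0)]     # the second one is 20 px off
pd = probability_of_detection(preds, gts)    # 1 of 2 targets matched
fa = false_alarm_rate(false_pixels=30, width=640, height=480)
```

Here one of the two targets is matched, and 30 false pixels in a single 640 × 480 image correspond to a false-alarm rate of roughly 97.7 × 10⁻⁶.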
In this section, the performance of ISTD-YOLOv7 and YOLOv7 is compared from three aspects: training process, verification process and testing process. Before training the two models, data augmentation technologies are used to enhance the data randomly. Taking two data augmentation methods, Mixup and Mosaic, as examples, Figure 8 shows the infrared image results obtained after processing by the two methods. Mixup uses simple linear interpolation on two random infrared images to construct new training samples, as shown in Figure 8(a)–(d). Mosaic randomly intercepts four infrared images and merges them into one infrared image as new training data, as shown in Figure 8(e)–(h). Data augmentation technology can greatly enrich the training data, improve the generalization capability of the model and make the network more robust.
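The linear interpolation behind Mixup can be sketched in a few lines. The fixed mixing coefficient, the tiny 2 × 2 patches and the function name are hypothetical; in practice the coefficient is usually drawn at random and the annotations of both source images are kept.

```python
def mixup(image_a, image_b, lam):
    """Blend two equally sized grayscale images: lam * a + (1 - lam) * b."""
    return [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


# Hypothetical 2 x 2 infrared patches with 8-bit intensities.
img_a = [[0, 100], [200, 255]]
img_b = [[100, 100], [0, 55]]
mixed = mixup(img_a, img_b, lam=0.5)
```

With `lam=0.5` every output pixel is the average of the two inputs, so targets from both images remain visible at reduced contrast.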
Figure 9 shows the convergence curves of ISTD-YOLOv7 and YOLOv7 on the training set and the verification set respectively. The red line is the original data of ISTD-YOLOv7, the coral line is the original data of YOLOv7, the green line is the smoothed data of ISTD-YOLOv7, and the brown line is the smoothed data of YOLOv7. It can be seen from Figure 9(a) that, in the training process, the convergence curve of ISTD-YOLOv7 is located below the convergence curve of YOLOv7. It shows that the convergence accuracy of ISTD-YOLOv7 is better than that of YOLOv7. In addition, it can be seen that the convergence speed of ISTD-YOLOv7 is better than that of YOLOv7. Specifically, ISTD-YOLOv7 escapes from local optima more quickly, achieving global optima at about 190 iterations, while YOLOv7 needs more than 210 iterations to achieve convergence. Similarly, it can be seen from Figure 9(b) that in the verification process, the convergence curve of ISTD-YOLOv7 is more stable and flatter, and the whole is located below the convergence curve of YOLOv7. It is worth noting that, after 250 iterations, the convergence curve of YOLOv7 shows a significant rise. It means that the YOLOv7 model is overfitting, while the ISTD-YOLOv7 model can better characterize this hard dataset of infrared small targets.
On the basis of comparing the training process and the verification process, the performance of the two models is evaluated on the testing set. The testing set contains 2175 infrared small target images. The number of targets in each class is shown in Figure 10. Table 4 compares the evaluation results of the two models on the testing set. Note that the best result in this paper is marked in bold. From Table 4, it can be found that ISTD-YOLOv7 has improvements compared with YOLOv7 in precision (from 97.52% to 98.80%), recall (from 96.23% to 96.87%), F1 (from 96.87% to 97.83%) and mAP (from 97.44% to 98.43%). These are made possible by the application of improvements enhancing the feature extraction capability of the network for limited information, improving the recall of the model and making ISTD-YOLOv7 detect more precisely.
Model | P (%) | R (%) | F1 (%) | mAP (%) |
YOLOv7 | 97.52 | 96.23 | 96.87 | 97.44 |
ISTD-YOLOv7 | 98.80 | 96.87 | 97.83 | 98.43 |
In this section, ISTD-YOLOv7 is compared with other state-of-the-art detection models. YOLOv3 [12], YOLOv5s [13] and YOLOXs [14] are also from the YOLO family, but they have not previously been tested on the dataset of this paper. SSD [39] is an anchor-based model. CenterNet [40] and FCOS [41] are anchor-free models. DETR [42] is the first detection model based on a transformer.
Figure 11 shows the AP value of each class of different models. The ordinate indicates the class and the abscissa indicates the AP value. The AP values of each model are sorted from large to small and then displayed from top to bottom. The index AP comprehensively considers the balance between precision and recall under different confidence levels. ISTD-YOLOv7 is the only model with AP values over 96% in all classes. It proves that our model has a better overall detection effect on the given dataset. In addition, it is not difficult to find that the AP values of the eighth class of all models except FCOS are all the lowest. This is because the number of the eighth-class targets in the training set is fewer, and the models cannot learn the feature information of this class more fully. Nevertheless, the AP value of our model in the eighth class is more than 96%, while the AP value of SSD model in the eighth class is only more than 75%. mAP is the mean of all classes of AP and cannot reflect the above potential results.
More quantitative results are given in Table 5. In terms of precision, ISTD-YOLOv7 obtains the best result of 98.80%. YOLOv3 obtains the best recall of 97.45%, and ISTD-YOLOv7 ranks second with 96.87%. F1 and mAP are two comprehensive indices, and our model obtains the best values of both (97.83% and 98.43%). Moreover, in terms of target-level performance, Pd is the ratio of correctly predicted targets to all targets, and Fa is the ratio of falsely predicted target pixels to all pixels in the image. Our model achieves 94.66% on Pd and 94.08 × 10⁻⁶ on Fa, the best values among the compared models. The performance of SSD is not satisfactory on the given dataset. These findings show that ISTD-YOLOv7 performs better overall than the comparison models in detecting infrared small targets. This is attributed to YOLOv7's own network structure and our focused improvements to it. For infrared small targets in complex scenes, the updated anchors, GE attention and NWD-based loss in ISTD-YOLOv7 improve the convergence performance and feature extraction capability of the network and alleviate the sensitivity to the location deviation of small targets.
Model | P (%) | R (%) | F1 (%) | mAP (%) | Pd (%) | Fa (×10⁻⁶) |
YOLOv3 | 97.15 | 97.45 | 97.30 | 97.27 | 93.93 | 115.88 |
YOLOv5s | 97.72 | 95.00 | 96.35 | 96.91 | 92.77 | 127.63 |
SSD | 92.78 | 41.81 | 57.65 | 87.48 | 77.02 | 1245.72 |
CenterNet | 96.31 | 92.15 | 94.19 | 94.26 | 93.45 | 112.54 |
FCOS | 98.30 | 80.26 | 88.37 | 97.71 | 92.20 | 130.51 |
YOLOXs | 96.90 | 96.25 | 96.60 | 97.37 | 93.19 | 118.65 |
DETR | 97.35 | 96.83 | 97.09 | 97.98 | 93.15 | 119.75 |
ISTD-YOLOv7 | 98.80 | 96.87 | 97.83 | 98.43 | 94.66 | 94.08 |
The ground truths and the qualitative results of all models are provided in Figures 12–15. The qualitative results show the class and the confidence of each detected target in different colors. Here, "Target 1" to "Target 8" represent eight different infrared small vehicles. Owing to limited space, we only show some typical results of the different methods. Image 1 is selected from the day outfield scene, Image 2 from the day infield scene, Image 3 from the night outfield scene and Image 4 from the night infield scene. In Image 1, YOLOXs has obvious false detections. In Image 2, only CenterNet and ISTD-YOLOv7 detect all targets, while the other models miss targets to different degrees. Further analysis of the missed detections shows that "Target 7" is very weak and almost submerged in the background, which makes it more difficult to detect. In this case, ISTD-YOLOv7 can still detect it with a confidence of 0.78. SSD has the most severe missed detections, only detecting "Target 1". All eight models detect all infrared small vehicles in Image 3, and ISTD-YOLOv7 detects the targets with high confidence levels. In Image 4, SSD and FCOS miss targets. ISTD-YOLOv7 is not affected by white noise in complex scenes during the detection process, and the confidence level of the detection results on "Target 1", "Target 2" and "Target 3" is 1.00. The qualitative results illustrate the superiority of our model more intuitively.
To further discuss the detection results of our model, we crop and enlarge the detected targets in Images 1–4, as shown in Figure 16. Our model detects all the targets in the four images. The cropped targets are potentially helpful for situation analysis and target attack on the battlefield.
In this section, the number of model parameters, the floating-point operations (FLOPs) and the frames per second (FPS) are also reported. The number of parameters determines the spatial complexity of the model, and the time complexity can be measured using FLOPs. FPS is used to evaluate the detection speed, which is tested on one 3090Ti GPU. According to Table 6, YOLOv5s has the fewest parameters, the least computation and the fastest inference speed. YOLOXs ranks second overall. ISTD-YOLOv7 has 37.232 M parameters, 105.234 G FLOPs and 36 FPS. In terms of FPS, YOLOv5s, SSD and YOLOXs have significant advantages. Among the eight models, ISTD-YOLOv7 ranks seventh in parameter count, fourth in FLOPs and sixth in FPS. In summary, our model achieves better detection performance within an acceptable time. However, our model is not lightweight enough and does not have an advantage in complexity, which is a limitation of the current work.
Model | Params | FLOPs | FPS |
YOLOv3 | 61.561 M | 155.380 G | 48 |
YOLOv5s | 7.082 M | 16.537 G | 79 |
SSD | 24.547 M | 276.251 G | 72 |
CenterNet | 32.665 M | 109.714 G | 49 |
FCOS | 32.127 M | 161.410 G | 25 |
YOLOXs | 8.968 M | 26.927 G | 77 |
DETR | 36.762 M | 73.642 G | 24 |
ISTD-YOLOv7 | 37.232 M | 105.234 G | 36 |
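To make the speed figures in Table 6 easier to read, FPS can be converted into an average per-frame latency, and each model can be ranked on every column. The following is a minimal sketch with the table values copied in; `TABLE6`, `latency_ms` and `rank` are illustrative names of ours, not part of the authors' code.

```python
# Table 6 rows: model -> (parameters in millions, FLOPs in units of 10^9, FPS).
TABLE6 = {
    "YOLOv3": (61.561, 155.380, 48),
    "YOLOv5s": (7.082, 16.537, 79),
    "SSD": (24.547, 276.251, 72),
    "CenterNet": (32.665, 109.714, 49),
    "FCOS": (32.127, 161.410, 25),
    "YOLOXs": (8.968, 26.927, 77),
    "DETR": (36.762, 73.642, 24),
    "ISTD-YOLOv7": (37.232, 105.234, 36),
}


def latency_ms(fps):
    """Average time per frame in milliseconds implied by a frames-per-second figure."""
    return 1000.0 / fps


def rank(model, column, higher_is_better=False):
    """1-based rank of `model` on one column (0 = params, 1 = FLOPs, 2 = FPS)."""
    ordered = sorted(TABLE6, key=lambda m: TABLE6[m][column], reverse=higher_is_better)
    return ordered.index(model) + 1


if __name__ == "__main__":
    for name, (params, flops, fps) in TABLE6.items():
        print(f"{name:12s} {latency_ms(fps):6.1f} ms/frame")
    # Rank of ISTD-YOLOv7 by parameters, FLOPs (lower is better) and FPS (higher is better).
    print(rank("ISTD-YOLOv7", 0), rank("ISTD-YOLOv7", 1), rank("ISTD-YOLOv7", 2, True))
```

For ISTD-YOLOv7, 36 FPS corresponds to roughly 27.8 ms per frame on the test GPU.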
In this section, ablation studies are carried out to verify the effectiveness of the improved components. Table 7 shows the results. Compared with the baseline model, all four improved models achieve better detection performance, and ISTD-YOLOv7 obtains the best results on all indices. This indicates that the three components improve small target detection from different aspects, and that combining them yields the largest gain. Specifically, resetting the anchors for the small target dataset helps the model adapt to the given task, since bounding-box regression then only has to fine-tune a high-quality anchor to obtain the detection result. Figure 17 shows heat maps before and after adding the GE attention blocks: Figure 17(a)–(c) are heat maps without attention, and Figure 17(d)–(f) are heat maps with attention. The darker the color, the more significant the target area. Adding the attention mechanism makes the model focus more on the local characteristics of infrared small targets and ignore irrelevant background information. The NWD-based loss better eliminates the performance gap between training and testing and is suitable for small target detectors. Because the IoU metric is sensitive to location deviations of small targets, such targets are easily mispredicted; the NWD metric alleviates this problem.
Model | P (%) | R (%) | F1 (%) | mAP (%) |
YOLOv7 | 97.52 | 96.23 | 96.87 | 97.44 |
YOLOv7+Anchor update | 98.42 | 96.34 | 97.37 | 97.85 |
YOLOv7+GE attention | 98.05 | 96.72 | 97.38 | 98.14 |
YOLOv7+NWD | 97.65 | 96.80 | 97.22 | 98.26 |
ISTD-YOLOv7 | 98.80 | 96.87 | 97.83 | 98.43 |
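The sensitivity of IoU to small location deviations, and the way an NWD-style metric avoids it, can be illustrated numerically. The sketch below assumes the commonly used formulation in which each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), and the similarity is exp(−W₂/C). The constant `c=12.8` is a placeholder of ours, not a value taken from this paper, and the exact loss used in ISTD-YOLOv7 may differ.

```python
import math


def iou(a, b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(a, b, c=12.8):
    """Normalized Wasserstein distance between boxes modeled as 2D Gaussians.

    `c` is a dataset-dependent normalizing constant (assumed value here).
    """
    w2 = math.sqrt(
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2) ** 2
        + ((a[3] - b[3]) / 2) ** 2
    )
    return math.exp(-w2 / c)


if __name__ == "__main__":
    # The same 3-pixel horizontal shift applied to a 6x6 box and to a 60x60 box.
    small = ((0, 0, 6, 6), (3, 0, 6, 6))
    large = ((0, 0, 60, 60), (3, 0, 60, 60))
    print("IoU small:", iou(*small), "IoU large:", iou(*large))
    print("NWD small:", nwd(*small), "NWD large:", nwd(*large))
```

In this example the 3-pixel shift drops the IoU of the 6 × 6 box to one third while the 60 × 60 box stays above 0.9, whereas the NWD value is the same for both because it depends only on the center offset and the size difference. A loss of the form 1 − NWD is one plausible way to use this metric in training.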
Infrared small targets are dim and have a low signal-to-noise ratio. In complex weather and terrain scenes, infrared vehicles are easily overlooked, and most current models cannot detect them effectively. In this paper, ISTD-YOLOv7, based on YOLOv7, is proposed for infrared small target detection. To adapt YOLOv7 to this task, we adopt a series of targeted improvements.
ISTD-YOLOv7 includes an anchor update, GE attention and the NWD loss function. On a public infrared small target dataset, a series of experimental results reveal that ISTD-YOLOv7 is superior to the comparison models (YOLOv3, YOLOv5s, SSD, CenterNet, FCOS, YOLOXs, DETR and YOLOv7) and that the improvements are effective. Compared with the baseline model, the mAP of ISTD-YOLOv7 improves from 97.44% to 98.43%. The major causes of the high detection performance are as follows. First, the anchor update provides a more reasonable prior. Second, spatial location is more important for the detection of small targets, so GE attention is chosen to make the model exploit feature context information more efficiently. Third, the NWD loss function helps to overcome the sensitivity of the IoU metric to location deviations of small targets.
It should be mentioned that this work still has limitations. First, the dataset used suffers from class imbalance. Second, our model is still not lightweight enough. In future research, we will use a Generative Adversarial Network (GAN) [43] to generate additional training samples. In addition, we will reduce the parameters and computations of the model as much as possible for deployment applications.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported by the National Natural Science Foundation of China (Grant No. 61473100).
The authors declare there is no conflict of interest.
[1] | Q. Tang, F. R. Yu, R. Xie, A. Boukerche, T. Huang, Y. Liu, Internet of intelligence: A survey on the enabling technologies, applications, and challenges, IEEE Commun. Surveys Tutor., 24 (2022), 1394–1434. https://doi.org/10.1109/COMST.2017.2691349 |
[2] | I. Abunadi, A. Rehman, K. Haseeb, T. Alam, G. Jeon, A multi-parametric machine learning approach using authentication trees for the healthcare industry, Expert Systems, (2022), e13202. https://doi.org/10.1111/exsy.13202 |
[3] | I. Sarrigiannis, K. Ramantas, E. Kartsakli, P.-V. Mekikis, A. Antonopoulos, C. Verikoukis, Online VNF lifecycle management in an MEC-enabled 5G IoT architecture, IEEE Int. Things J., 7 (2019), 4183–4194. https://doi.org/10.1109/JIOT.2019.2944695 |
[4] | S. H. Alsamhi, F. Afghah, R. Sahal, A. Hawbani, M. A. Al-qaness, B. Lee, et al., Green internet of things using UAVs in B5G networks: A review of applications and strategies, Ad Hoc Networks, 117 (2021), 102505. https://doi.org/10.1016/j.adhoc.2021.102505 |
[5] | L. Qiao, Y. Li, D. Chen, S. Serikawa, M. Guizani, Z. Lv, A survey on 5G/6G, AI, and Robotics, Comput. Electr. Eng., 95 (2021), 107372. https://doi.org/10.1016/j.compeleceng.2021.107372 |
[6] | M. A. Matheen, S. Sundar, IoT multimedia sensors for energy efficiency and security: A review of QoS aware and methods in wireless multimedia sensor networks, Int. J. Wireless Inform. Networks, 29 (2022), 407–418. https://doi.org/10.1007/s10776-022-00567-6 |
[7] | M. K. Gupta, P. Chandra, Effects of similarity/distance metrics on k-means algorithm with respect to its applications in IoT and multimedia: A review, Multi. Tools Appl., 81 (2022), 37007–37032. https://doi.org/10.1007/s11042-021-11255-7 |
[8] | L. A. Tawalbeh, F. Muheidat, M. Tawalbeh, M. Quwaider, IoT Privacy and security: Challenges and solutions, Appl. Sci., 10 (2020), 4102. https://doi.org/10.3390/app10124102 |
[9] | J. Lloret, M. García, F. Boronat, IPTV: la televisión por Internet, Editorial Vértice, Málaga, España, (2008), 230. |
[10] | A. Rego, A. Canovas, J. M. Jiménez, J. Lloret, An intelligent system for video surveillance in IoT environments, IEEE Access, 6 (2018), 31580–31598. https://doi.org/10.1109/ACCESS.2018.2842034 |
[11] | I. H. Sarker, Machine learning: Algorithms, real-world applications and research directions, SN Computer Sci., 2 (2021), 160. https://doi.org/10.1007/s42979-021-00592-x |
[12] | K. Haseeb, T. Saba, A. Rehman, I. Ahmed, J. Lloret, Efficient data uncertainty management for health industrial internet of things using machine learning, Int. J. Commun. Syst., 34 (2021), e4948. https://doi.org/10.1002/dac.4948 |
[13] | J. Serra, L. Sanabria-Russo, D. Pubill, C. Verikoukis, Scalable and flexible IoT data analytics: When machine learning meets SDN and virtualization, in 2018 IEEE 23rd International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), 2018, IEEE. https://doi.org/10.1109/CAMAD.2018.8514997 |
[14] | W. Chen, Intelligent manufacturing production line data monitoring system for industrial internet of things, Computer Commun., 151 (2020), 31–41. https://doi.org/10.1016/j.comcom.2019.12.035 |
[15] | A. Rehman, T. Saba, K. Haseeb, R. Singh, G. Jeon, Smart health analysis system using regression analysis with iterative hashing for IoT communication networks, Computers Electr. Eng., 104 (2022), 108456. https://doi.org/10.1016/j.compeleceng.2022.108456 |
[16] | L. Sanabria-Russo, J. Alonso-Zarate, C. Verikoukis, SDN-based pro-active flow installation mechanism for delay reduction in IoT, in 2018 IEEE Global Communications Conference (GLOBECOM), 2018, IEEE. https://doi.org/10.1109/GLOCOM.2018.8647382 |
[17] | B. Zong, C. Fan, X. Wang, X. Duan, B. Wang, J. Wang, 6G technologies: Key drivers, core requirements, system architectures, and enabling technologies, IEEE Vehicular Technol. Mag., 14 (2019), 18–27. https://doi.org/10.1109/MVT.2019.2921398 |
[18] | L. Mucchi, S. Jayousi, S. Caputo, E. Paoletti, P. Zoppi, S. Geli, et al., How 6G technology can change the future wireless healthcare, in 2020 2nd 6G wireless summit (6G SUMMIT), 2020, IEEE. https://doi.org/10.1109/6GSUMMIT49458.2020.9083916 |
[19] | S. A. Dehkordi, K. Farajzadeh, J. Rezazadeh, R. Farahbakhsh, K. Sandrasegaran, M. A. Dehkordi, A survey on data aggregation techniques in IoT sensor networks, Wireless Networks, 26 (2020), 1243–1263. https://doi.org/10.1007/s11276-019-02142-z |
[20] | M. Alam, A. A. Aziz, S. Latif, A. Awang, Error-aware data clustering for in-network data reduction in wireless sensor networks, Sensors, 20 (2020), 1011. https://doi.org/10.3390/s20041011 |
[21] | X. Duan, N. Song, F. Mo, An edge intelligence-enhanced quantitative assessment model for implicit working gain under mobile internet of things, Math. Biosci. Eng., 20 (2023), 7548–7564. https://doi.org/10.3934/mbe.2023326 |
[22] | L. P. Verma, V. K. Sharma, M. Kumar, A. Mahanti, An adaptive multi-path data transfer approach for MP-TCP, Wireless Networks, (2022), 1–28. https://doi.org/10.1007/s11276-022-02958-2 |
[23] | H.-S. Kim, J. Paek, D. E. Culler, S. Bahk, PC-RPL: Joint control of routing topology and transmission power in real low-power and lossy networks, ACM Transact. Sensor Networks (TOSN), 16 (2020), 1–32. https://doi.org/10.1145/3372026 |
[24] | N. A. Zardari, R. Ngah, O. Hayat, A. H. Sodhro, Adaptive mobility-aware and reliable routing protocols for healthcare vehicular network, Math. Biosci. Eng., 19 (2022), 7156–7177. https://doi.org/10.1007/s11036-022-02042-1 |
[25] | S. Ksibi, F. Jaidi, A. Bouhoula, A comprehensive study of security and cyber-security risk management within e-Health systems: Synthesis, analysis and a novel quantified approach, Mobile Networks Appl., (2022), 1–21. https://doi.org/10.3934/mbe.2022338 |
[26] | J. Li, D. Greenwood, M. Kassem, Blockchain in the built environment and construction industry: A systematic review, conceptual models and practical use cases, Autom. Construct., 102 (2019), 288–307. https://doi.org/10.1016/j.autcon.2019.02.005 |
[27] | G. Fortino, A. Guerrieri, P. Pace, C. Savaglio, G. Spezzano, IoT platforms and security: An analysis of the leading industrial/commercial solutions, Sensors, 22 (2022), 2196. https://doi.org/10.3390/s22062196 |
[28] | D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, D. Niyato, et al., 6G Internet of Things: A comprehensive survey, IEEE Int. Things J., 9 (2021), 359–383. https://doi.org/10.1109/JIOT.2021.3103320 |
[29] | M. Banafaa, I. Shayea, J. Din, M. H. Azmi, A. Alashbi, Y. I. Daradkeh, et al., 6G mobile communication technology: Requirements, targets, applications, challenges, advantages, and opportunities, Alexandr. Eng. J., (2022). https://doi.org/10.1016/j.aej.2022.08.017 |
[30] | H. Lu, L. Wu, G. Fortino, S. Dustdar, Introduction to the special section on cognitive robotics on 5G/6G networks, ACM Transactions on Internet Technology (TOIT), 21 (2021), 1–3. https://doi.org/10.1145/3476466 |
[31] | W. Shi, W. Xu, X. You, C. Zhao, K. Wei, Intelligent reflection enabling technologies for integrated and green Internet-of-Everything beyond 5G: Communication, sensing, and security, IEEE Wireless Commun., 2022. https://doi.org/10.1109/MWC.018.2100717 |
[32] | H. H. H. Mahmoud, A. A. Amer, T. Ismail, 6G: A comprehensive survey on technologies, applications, challenges, and research problems, Transact. Emerg. Telecommun. Technol., 32 (2021), e4233. https://doi.org/10.1002/ett.4233 |
[33] | G. Rathee, A. Sharma, H. Saini, R. Kumar, R. Iqbal, A hybrid framework for multimedia data processing in IoT-healthcare using blockchain technology, Multi. Tools Appl., 79 (2020), 9711–9733. https://doi.org/10.1007/s11042-019-07835-3 |
[34] | D. Singh, A. K. Maurya, R. K. Dewang, N. Keshari, A review on Internet of Multimedia Things (IoMT) routing protocols and quality of service, Int. Multi. Things (IoMT), (2022), 1–29. https://doi.org/10.1016/B978-0-32-385845-8.00006- |
[35] | A. A. Khan, A. A. Laghari, Z. A. Shaikh, Z. Dacko-Pikiewicz, S. Kot, Internet of Things (IoT) security with blockchain technology: A state-of-the-art review, IEEE Access, (2022). https://doi.org/10.1109/ACCESS.2022.3223370 |
[36] | M. A. Jan, J. Cai, X.-C. Gao, F. Khan, S. Mastorakis, M. Usman, et al., Security and blockchain convergence with Internet of Multimedia Things: Current trends, research challenges and future directions, J. Network Computer Appl., 175 (2021), 102918. https://doi.org/10.1016/j.jnca.2020.102918 |
[37] | N. Moussa, D. Benhaddou, A. El Belrhiti El Alaoui, EARP: An Enhanced ACO-Based Routing Protocol for Wireless Sensor Networks with Multiple Mobile Sinks, Int. J. Wireless Inform. Networks, 29 (2022), 118–129. https://doi.org/10.1007/s10776-021-00545-4 |
[38] | N. Hu, Z. Tian, X. Du, M. Guizani, An energy-efficient in-network computing paradigm for 6G, IEEE Transact. Green Commun. Network., 5 (2021), 1722–1733. https://doi.org/10.1109/TGCN.2021.3099804 |
[39] | A. Kumar, S. Sharma, N. Goyal, S. K. Gupta, S. Kumari, S. Kumar, Energy-efficient fog computing in Internet of Things based on routing protocol for low-power and lossy network with Contiki, Int. J. Commun. Syst., 35 (2022), e5049. https://doi.org/10.1002/dac.5049 |
[40] | Z. Liao, J. Peng, J. Huang, J. Wang, J. Wang, P. K. Sharma, et al., Distributed probabilistic offloading in edge computing for 6G-enabled massive Internet of Things, IEEE Int. Things J., 8 (2020), 5298–5308. https://doi.org/10.1109/JIOT.2020.3033298 |
[41] | K. Thangaramya, K. Kulothungan, S. I. Gandhi, M. Selvi, S. S. Kumar, K. Arputharaj, Intelligent fuzzy rule-based approach with outlier detection for secured routing in WSN, Soft Comput., 24 (2020), 16483–16497. https://doi.org/10.1007/s00500-020-04955-z |
[42] | A. Singh, A. Nagaraju, Low latency and energy efficient routing-aware network coding-based data transmission in multi-hop and multi-sink WSN, Ad Hoc Networks, 107 (2020), 102182. https://doi.org/10.1016/j.adhoc.2020.102182 |
[43] | Z. Ming, J. Chen, L. Cui, S. Yang, Y. Pan, W. Xiao, et al., Edge-based video surveillance with graph-assisted reinforcement learning in smart construction, IEEE Int. Things J., 9 (2021), 9249–9265. https://doi.org/10.1109/JIOT.2021.3090513 |
[44] | B. Kizilkaya, E. Ever, H. Y. Yatbaz, A. Yazici, An effective forest fire detection framework using heterogeneous wireless multimedia sensor networks, ACM Transact. Multi. Comput., Commun. Appl. (TOMM), 18 (2022), 1–21. https://doi.org/10.1145/3473037 |
[45] | O. Ibrihich, S.-d. Krit, J. Laassiri, S. El Hajji, Study and simulation of protocols of WSN using NS2, Transact. Eng. Technol., 2015, Springer, 467–480. https://doi.org/10.1007/978-94-017-9804-4_32 |
Dataset | Anchor (pixels) |
VOC dataset | (23, 44), (61, 58), (44, 128), (110, 122), (108, 276), (222, 218), (238, 457), (454, 320), (534, 555) |
Selected dataset | (12, 9), (12, 10), (13, 14), (16, 11), (16, 13), (18, 13), (18, 14), (22, 13), (21, 16) |
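To illustrate why anchors fitted to the selected dataset give a better prior than the VOC anchors, one can compare how well each set matches the shape of a small target. The sketch below is illustrative only: it assumes the anchor tuples are (width, height) in pixels, uses a hypothetical 16 × 12 pixel target, and measures the shape overlap as the IoU of two boxes aligned at a common corner, which is the usual matching criterion in anchor clustering but is not confirmed here as the authors' procedure.

```python
# Anchor sets copied from the table above; tuples are assumed to be (width, height).
VOC_ANCHORS = [(23, 44), (61, 58), (44, 128), (110, 122), (108, 276),
               (222, 218), (238, 457), (454, 320), (534, 555)]
SELECTED_ANCHORS = [(12, 9), (12, 10), (13, 14), (16, 11), (16, 13),
                    (18, 13), (18, 14), (22, 13), (21, 16)]


def shape_iou(box, anchor):
    """IoU of two (w, h) shapes whose top-left corners coincide."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def best_anchor(box, anchors):
    """Anchor with the highest shape IoU for the given box, plus that IoU."""
    best = max(anchors, key=lambda a: shape_iou(box, a))
    return best, shape_iou(box, best)


if __name__ == "__main__":
    target = (16, 12)  # hypothetical small infrared target, not taken from the dataset
    print("VOC:     ", best_anchor(target, VOC_ANCHORS))
    print("Selected:", best_anchor(target, SELECTED_ANCHORS))
```

For this hypothetical target, the closest VOC anchor overlaps by less than 0.2, while the closest anchor of the selected dataset overlaps by more than 0.9, so the regression head has far less to correct.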
Resolution | Depth | Format | Memory |
(640 × 480) pixels | 8 bit | .bmp | ≈300 k |
Predicted result = Positive | Predicted result = Negative | |
Actual result = True | TP (True Positive) | FN (False Negative) |
Actual result = False | FP (False Positive) | TN (True Negative) |
Model | P (%) | R (%) | F1 (%) | mAP (%) |
YOLOv7 | 97.52 | 96.23 | 96.87 | 97.44 |
ISTD-YOLOv7 | 98.80 | 96.87 | 97.83 | 98.43 |