Research article

Enhancing Co3O4 nanoparticles: Investigating the impact of nickel doping and high-temperature annealing on NiCo2O4/CoO heterostructures

  • In this study, we investigated the phase transition of cobalt spinel (Co3O4) nanoparticles into Co3-xNixO4/CoO heterostructures by introducing varying amounts of nickel (x = 0.0–0.16) and annealing the particles at a high temperature of 1000 ℃. X-ray diffraction (XRD) analysis confirmed the Co3-xNixO4/CoO structure for all samples. Transmission electron microscopy (TEM) provided further insight into the heterostructure of the samples after annealing, revealing the arrangement of the two phases. Fourier-transform infrared spectroscopy measurements showed a band shift around 537 cm-1 with increasing Ni content, while ultraviolet-visible (UV-Vis) measurements indicated the band gap energy (Eg). Scanning electron microscope (SEM) measurements showed significant morphological changes at 0.16 Ni, with irregular agglomerates. Our findings suggest that introducing Ni into the Co3O4 structure and increasing the annealing temperature to 1000 ℃ can lead to the formation of a heterostructured system. Furthermore, the significance of this study lies in the streamlined synthesis of NiCo2O4/CoO using the sol-gel method followed by calcination. This departure from more complex techniques provides an efficient route to the NiCo2O4/CoO system, a promising material for advancing supercapacitor research.

    Citation: Leydi J. Cardenas F., Josep Ma. Chimenos, Luis C. Moreno A., Elaine C. Paris, Miryam R. Joya. Enhancing Co3O4 nanoparticles: Investigating the impact of nickel doping and high-temperature annealing on NiCo2O4/CoO heterostructures[J]. AIMS Materials Science, 2023, 10(6): 1090-1104. doi: 10.3934/matersci.2023058




    Infrared detection technology is one of the main means of acquiring information in modern applications. Compared with visible-light detection systems, infrared detection systems offer strong penetration, long detection range and all-weather operation. Infrared detection technology has therefore attracted growing research interest and is widely used in military [1], medical [2], meteorological [3] and other fields. With the gradual opening of low-altitude airspace, unmanned aerial vehicles (UAVs) carrying infrared equipment can be used to acquire and track ground targets. Effectively detecting small targets from the aerial view is therefore of theoretical interest and of practical engineering, social and economic value.

    In recent years, with the rapid development of deep learning, target detection has shifted from traditional methods based on manually designed features to deep neural network (DNN) methods based on automatically learned features [4,5]. Deep learning-based target detection methods are generally divided into two-stage methods and one-stage methods [6]. Two-stage methods generate region proposals and then classify them. The classic models are the region-convolutional neural network (R-CNN) series [7], including Fast R-CNN [8], Faster R-CNN [9], Mask R-CNN [10] and so on. They achieve high detection accuracy, but their detection speed is slow, which makes them difficult to apply in real-time detection scenarios. One-stage methods skip the region proposal stage and generate the final detection results directly, so they are faster. The classic models are the YOLO series [11], including YOLOv3 [12], YOLOv5 [13], YOLOX [14] and so on.

    YOLOv7 [15] is a recent model of the YOLO series, which surpasses most known target detectors in terms of accuracy and speed. Since 2022, YOLOv7 has been applied to several real-world detection tasks. Soeb et al. [16] created a leaf image dataset from Bangladesh and used YOLOv7 for disease diagnosis, providing a solution for precision agriculture applications. Li et al. [17] improved YOLOv7 by embedding gamma correction, an improved convolutional block attention module and Alpha GIoU. The improved model was used for damage detection of aeroengine blades. Abnormal driver behavior is a serious threat to public safety. Liu et al. proposed the CEAM-YOLOv7 model for distraction behavior recognition. The global attention mechanism (GAM) was introduced into YOLOv7 to enhance the network's capability to extract key features, and the channel expansion (CE) method was proposed for data augmentation. Moreover, lightweight processing made the model easier to deploy. More projects based on YOLOv7 are still being explored [18].

    Although the above models show impressive performance in related works, infrared small target detection remains a challenge. On the one hand, because of the long observation distance, infrared small targets carry little shape and texture information. On the other hand, because of the complex background, infrared small targets may be obscured or overlapped [19,20]. Researchers have developed some pioneering works to detect infrared small targets. Zhang et al. [21] incorporated target shape reconstruction into the detection of infrared small targets and proposed the ISNet model. Based on the Taylor finite difference (TFD)-inspired edge block and the two-orientation attention aggregation (TOAA) block, the model can effectively extract edge features and aggregate cross-level features. Additionally, the authors established a new large-scale benchmark, IRSTD-1k, to validate the effectiveness of the proposed idea. To handle the loss of targets in deep layers, Li et al. [22] proposed a dense nested attention network (DNA-Net). Specifically, the dense nested interactive module (DNIM) and the cascaded channel and spatial attention module (CSAM) were designed to achieve repetitive fusion and enhancement between feature layers. Additionally, an infrared small target dataset, namely NUDT-SIRST, was developed. Results on a set of proposed evaluation metrics showed that the proposed method achieved better performance. A multi-level TransUNet (MTU-Net) was proposed in [23] to detect space-based infrared tiny ships. Its hybrid of Vision Transformer (ViT) and convolutional neural network (CNN) can extract multi-level features. Wu et al. also proposed a copy-rotate-resize-paste (CRRP) data augmentation method that alleviates the problem of sample imbalance. Additionally, the authors designed a FocalIoU loss to achieve target localization and shape description. Establishing the largest space-based infrared tiny ship detection dataset, NUDT-SIRST-Sea, was a significant contribution.

    In 2022, Lin et al. [24] jointly considered detection performance and practical deployment, and proposed a lightweight infrared small target detection network, LIRDNet. This model combined a cross-scale feature fusion module (CFM) and a bottleneck attention module (BAM). The experimental results demonstrated that the CFM and BAM modules further improved the detection performance with a small number of parameters and computations. Liu et al. [25] proposed a lightweight model for ship detection in SAR images. The authors added coordinate attention to the backbone of YOLOv7-tiny, and improved the SPP block and the loss function. Compared with the original model, the precision of the proposed model increased by 4.6%. This work had not yet been deployed on edge devices. Similarly, Guo et al. [26] proposed a lightweight SAR ship target detection model based on YOLO, namely LMSD-YOLO. This model has better multi-scale adaptation capabilities and has been successfully deployed on mobile platforms. However, there are still difficulties in implementing target detection directly on large-scale SAR images. Zhou et al. [27] improved YOLOv5 so that the model can perform the small target detection task. It is worth noting that the authors used the Super-Resolution Generative Adversarial Network (SRGAN) to generate super-resolution images, which were then input into the improved detection model. Experiments verified that super-resolution reconstruction of images can improve the detection accuracy of small targets. The disadvantage is that super-resolution reconstruction is very time-consuming.

    In this paper, the recent YOLOv7 model is used as the baseline for infrared small target detection. To make the model better adapt to this task domain, we make targeted improvements to YOLOv7 and propose a new detection model, namely ISTD-YOLOv7. Our main contributions are summarized as follows:

    1) An improved YOLOv7 model (namely, ISTD-YOLOv7) is proposed for infrared small target detection.

    2) Updating the anchors helps the model converge better and faster. GE attention efficiently exploits feature context and spatial location information. NWD alleviates the sensitivity to location deviations of small targets.

    3) The performance of ISTD-YOLOv7 is compared with existing models. Ablation studies are performed to investigate the impact of each component. Experiments on a public dataset demonstrate the superiority of the proposed model in infrared small target detection.

    The remainder of this paper is organized as follows: Section 2 briefly introduces the YOLOv7 model. Section 3 describes the mechanism of the improved components and presents the improved model. Experimental results and analysis are given in Section 4. Section 5 summarizes the work of this paper.

    YOLOv7, one of the latest representative models of the YOLO series, was proposed by Wang et al. [15] in 2022. Compared with earlier models in the YOLO series, the main contributions of YOLOv7 include model re-parameterization, model scaling and extended efficient layer aggregation networks (E-ELAN). These architectural changes make YOLOv7 both more accurate and faster. A concise view of the YOLOv7 network structure is shown in Figure 1 [15]. More details of the component blocks can be found in [15].

    Figure 1.  YOLOv7 model [15].

    First, the model resizes the input images to (640 × 640) pixels. Then, the images are input to the backbone network for feature extraction. The backbone network of YOLOv7 consists of several CBS blocks, ELAN blocks and MP blocks. The obtained features of different scales are fused by the neck network, which adopts the structure of the path aggregation feature pyramid network (PAFPN). Then, the head (prediction) network adjusts the number of channels of the feature maps based on the RepConv blocks. Finally, the bounding box information, confidence and category probability are output.

    The sizes of the anchors are obtained by clustering the widths and heights of the ground-truth boxes of the training samples. Whether the anchors are reasonable greatly affects the detection performance of the model. Generally, the anchors of YOLOv7 are obtained by clustering on the VOC dataset or the COCO dataset during training. The VOC dataset provides 20 classes of targets, including persons, horses, bicycles, motorbikes and more [28]. The COCO dataset focuses on scene understanding and provides 80 classes of targets, mainly obtained from everyday scenes [29]. The VOC and COCO datasets are common large-scale datasets in target detection. However, the sizes of targets in these datasets are significantly different from those in infrared small target datasets.

    In this paper, in order to make YOLOv7 converge better and faster, the K-means method is used to re-cluster the sizes of the targets based on the selected dataset. The number K of clustering centers is set to 9. The selected dataset in this paper is described in Section 4. Figure 2 shows the clustering results of the VOC dataset and the selected dataset. It can be seen that the distribution of cluster centers varies greatly. The target size of the VOC dataset can be several hundred pixels, while the target size of the selected dataset is obviously much smaller. Table 1 gives the results of anchors. The anchor update can provide a reasonable prior for the detection model.

    Figure 2.  Results of clustering.
    Table 1.  Results of anchors.
    Dataset            Anchor (pixels)
    VOC dataset        (23, 44), (61, 58), (44, 128), (110, 122), (108, 276), (222, 218), (238, 457), (454, 320), (534, 555)
    Selected dataset   (12, 9), (12, 10), (13, 14), (16, 11), (16, 13), (18, 13), (18, 14), (22, 13), (21, 16)
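
    The anchor re-clustering described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `iou_wh` and `kmeans_anchors` are hypothetical, and the use of 1 - IoU as the clustering distance is an assumption borrowed from common YOLO practice (the text only states that K-means with K = 9 is used).

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given only as (w, h), assuming a shared top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors; distance = 1 - IoU (an assumption)."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # initial centers drawn from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the center with the highest IoU.
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            groups[best].append(b)
        new_centers = []
        for i, g in enumerate(groups):
            if g:
                mean_w = sum(w for w, _ in g) / len(g)
                mean_h = sum(h for _, h in g) / len(g)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    # Sort by area so small anchors come first, as in Table 1.
    return sorted(centers, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    toy = [(10, 10), (12, 12), (11, 11), (100, 100), (102, 98), (98, 102)]
    print(kmeans_anchors(toy, k=2))
```

    On a real dataset, `boxes` would hold the ground-truth widths and heights of the training set, and the nine resulting centers would be rounded to integer pixel sizes.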


    For images or feature maps, the spatial context information can improve the representational capability of the network. In 2018, the Gather-Excite (GE) attention mechanism was proposed by Hu et al. [30]. This mechanism defines two operators: the gather operator and the excite operator. Figure 3 shows the operation process of the two operators [30]. The gather operator $\xi_G$ aggregates features over local spatial extents, as defined in Eq (1). The excite operator $\xi_E$ maps the aggregated features back to the original scale, as defined in Eq (2).

    $$\xi_G: \mathbb{R}^{H \times W \times C} \rightarrow \mathbb{R}^{H' \times W' \times C} \tag{1}$$

    where $H$, $W$ and $C$ represent the height, width and number of channels of any input $x$, $e$ represents the extent ratio, $H' = H/e$ and $W' = W/e$. A global extent ratio using global average pooling is used in this paper.

    $$\xi_E(x, \hat{x}) = x \odot f(\hat{x}) \tag{2}$$
    $$f: \mathbb{R}^{H' \times W' \times C} \rightarrow [0, 1]^{H \times W \times C} \tag{3}$$

    where $\hat{x}$ represents the output of $\xi_G$, $\odot$ represents the Hadamard product and $f$ represents a mapping.
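
    As a rough illustration of Eqs (1)-(3) with a global extent, the sketch below gathers one mean per channel and excites the input with a gate in [0, 1]. It is a parameter-free toy version in plain Python; the choice of a sigmoid for f, the channel-first nested-list layout and the function names are assumptions made for this example, not details given in the text.

```python
import math


def gather_global(x):
    """Gather operator with a global extent: one average per channel.

    x is a list of C channels, each an H x W nested list (channel-first layout,
    chosen here only for convenience).
    """
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def excite(x, gathered):
    """Excite operator: x * f(x_hat), with f taken to be a sigmoid (assumption).

    Each per-channel gate is broadcast over all H x W positions, which plays the
    role of mapping the gathered features back to the original scale.
    """
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gathered]
    return [[[v * gate for v in row] for row in ch] for ch, gate in zip(x, gates)]


if __name__ == "__main__":
    feature_map = [
        [[0.0, 0.0], [0.0, 0.0]],   # channel 0: mean 0 -> gate 0.5
        [[1.0, 3.0], [5.0, 7.0]],   # channel 1: mean 4 -> gate close to 1
    ]
    x_hat = gather_global(feature_map)
    print(x_hat)
    print(excite(feature_map, x_hat))
```

    A learned variant would replace the plain average and sigmoid with trainable layers; the gating structure stays the same.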

    Figure 3.  GE block [30].

    In this paper, three GE attention blocks are added at three output branches of the backbone network of YOLOv7 respectively. The diagram is shown in Figure 4. Infrared small targets have the characteristics of small size and dim signal. Therefore, location information is essential for the detection of small targets. By adding GE attention blocks to the backbone feature extraction network of YOLOv7, the model can more efficiently exploit feature context and spatial location information for infrared small targets.

    Figure 4.  Diagram of adding location.

    The sensitivity of the IoU metric varies considerably with target scale. For small targets, a slight location change may lead to a significant change in IoU. For normal-sized targets, however, the same location deviation changes the IoU only slightly [31]. Figure 5 gives a specific analysis. For a small target, a location deviation leads to an IoU drop from 0.47 to 0.02. For a normal target, the same location deviation only leads to an IoU drop from 0.83 to 0.49.

    Figure 5.  Sensitivity analysis of IoU.
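
    The scale sensitivity of IoU can be reproduced with a few lines of arithmetic. The sketch below shifts a square box diagonally and compares the IoU for a small and a normal-sized target. The box sizes (6 and 36 pixels) and shifts (1 and 4 pixels) are illustrative assumptions and are not the boxes drawn in Figure 5, so the numbers differ from those quoted above.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def shifted_iou(size, shift):
    """IoU between a size x size box and a copy shifted diagonally by `shift` pixels."""
    box = (0.0, 0.0, float(size), float(size))
    moved = (float(shift), float(shift), float(size + shift), float(size + shift))
    return iou(box, moved)


if __name__ == "__main__":
    for size in (6, 36):
        print(size, round(shifted_iou(size, 1), 3), round(shifted_iou(size, 4), 3))
```

    With these assumed sizes, the 6-pixel box falls from about 0.53 to about 0.06 when the shift grows from 1 to 4 pixels, while the 36-pixel box only falls from about 0.90 to about 0.65.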

    Wang et al. [31] proposed a novel metric based on the Wasserstein distance. Specifically, the bounding box is modeled as a 2D Gaussian distribution, and the similarity between the corresponding Gaussian distributions is then calculated using the proposed metric, namely the Normalized Wasserstein Distance (NWD). Figure 6 [31] shows the deviation curves of IoU and NWD under different target sizes. As the target size becomes smaller, the IoU-deviation curves decrease faster, while the NWD-deviation curves remain overlapped and smooth. Compared with IoU, NWD is insensitive to location deviations of small targets. The theoretical and empirical benefits of using NWD have been discussed in the literature [32,33,34].

    Figure 6.  Deviation curves of IoU and NWD [31].

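    Specifically, for a bounding box (cx, cy, w, h), the inscribed ellipse of the bounding box can be expressed as:

    $$\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} = 1 \tag{4}$$

    where (cx, cy), w and h represent the center coordinates, width and height of the bounding box respectively, and $(\mu_x, \mu_y)$, $\sigma_x$ and $\sigma_y$ represent the center coordinates of the ellipse and the lengths of its semi-axes along the x-axis and y-axis respectively. Therefore, $\mu_x = cx$, $\mu_y = cy$, $\sigma_x = w/2$ and $\sigma_y = h/2$. The probability density function of the 2D Gaussian distribution is as follows:

    $$f(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)}{2\pi|\boldsymbol{\Sigma}|^{\frac{1}{2}}} \tag{5}$$

    where $\mathbf{x}$, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ represent the coordinate (x, y), the mean and the covariance of the distribution respectively. When $(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = 1$, the ellipse in Eq (4) is a density contour of the Gaussian distribution, so the bounding box can be modeled as a 2D Gaussian distribution $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with:

    $$\boldsymbol{\mu} = \begin{bmatrix} cx \\ cy \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \frac{w^2}{4} & 0 \\ 0 & \frac{h^2}{4} \end{bmatrix} \tag{6}$$

    For Gaussian distributions $N_a$ and $N_b$, which are modeled from bounding boxes $(cx_a, cy_a, w_a, h_a)$ and $(cx_b, cy_b, w_b, h_b)$, the Wasserstein distance is shown in Eq (7). After normalization, the final form of the NWD metric is obtained, namely Eq (8), where the constant $C$ is a normalization constant related to the dataset [31].

    $$W_2^2(N_a, N_b) = \left\| \left[cx_a, cy_a, \frac{w_a}{2}, \frac{h_a}{2}\right]^{T} - \left[cx_b, cy_b, \frac{w_b}{2}, \frac{h_b}{2}\right]^{T} \right\|_2^2 \tag{7}$$
    $$NWD(N_a, N_b) = \exp\left(-\frac{\sqrt{W_2^2(N_a, N_b)}}{C}\right) \tag{8}$$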

    In this paper, NWD is integrated into YOLOv7 to replace IoU. The specific improvement part is the loss function of YOLOv7. NWD-based regression loss can not only solve the issue that YOLOv7 is sensitive to the location deviation of small targets, but also still provide gradient to optimize the network in some cases. The improved loss function of YOLOv7 is as follows:

    $$L_{ISTD\text{-}YOLOv7} = 1 - NWD(N_p, N_g) \tag{9}$$

    where Np and Ng represent the Gaussian distribution model of prediction box p and ground-truth box g respectively.
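
    The chain from bounding box to Gaussian, Wasserstein distance, NWD and loss can be sketched in plain Python as below. This is an illustrative reading of Eqs (7)-(9), assuming that the NWD takes the square root of the squared Wasserstein distance before dividing by C. The default value c = 12.8 is only a placeholder for this example; the text does not state the value used for the selected dataset.

```python
import math


def wasserstein2_sq(a, b):
    """Squared 2-Wasserstein distance between the Gaussians of boxes (cx, cy, w, h)."""
    return (
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2.0) ** 2
        + ((a[3] - b[3]) / 2.0) ** 2
    )


def nwd(a, b, c=12.8):
    """Normalized Wasserstein Distance in (0, 1]; c is a dataset constant (placeholder)."""
    return math.exp(-math.sqrt(wasserstein2_sq(a, b)) / c)


def nwd_loss(pred, gt, c=12.8):
    """Regression loss of the form 1 - NWD(N_p, N_g)."""
    return 1.0 - nwd(pred, gt, c)


if __name__ == "__main__":
    gt = (0.0, 0.0, 6.0, 6.0)
    near = (1.0, 1.0, 6.0, 6.0)     # small offset, boxes overlap
    far = (10.0, 0.0, 6.0, 6.0)     # no overlap, IoU would be 0
    print(nwd_loss(gt, gt), nwd_loss(near, gt), nwd_loss(far, gt))
```

    Note that the loss for the non-overlapping pair is still strictly between 0 and 1 and changes smoothly with the offset, which is why an NWD-based loss can still provide a gradient where an IoU-based loss would be flat.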

    In order to more effectively detect small targets in infrared image data, we propose the ISTD-YOLOv7 model, which can maintain good performance. The diagram of the model is shown in Figure 7. First, the infrared images enter the backbone network consisting of convolution groups to extract features. After that, these features enter designed GE blocks. GE blocks are added at three output branches of the backbone network to exploit feature context and spatial location information. Then, the neck network with PAFPN structure is used for feature fusion, producing better semantic information. Finally, the feature maps of various scales enter the head network to produce the prediction results.

    Figure 7.  ISTD-YOLOv7 model.

    The purpose of the training process is to continuously reduce the difference between the prediction results and ground truth boxes. In this paper, the prediction results are iteratively optimized by the NWD-based loss function. The NWD metric is insensitive to location deviations of small targets. For the testing process, we use the trained model for inference and obtain the prediction results. The size of the small target is re-clustered to obtain anchors. The predicted bounding boxes are adjusted based on updated anchors. Then, the final detection result is obtained after non-maximum suppression (NMS) [35].
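
    The final non-maximum suppression step can be sketched as a greedy procedure. This is a generic illustration rather than the exact post-processing of YOLOv7: the IoU overlap test and the 0.5 threshold are assumptions for this example, and the function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep  # indices of retained boxes, highest score first


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

    In the toy call, the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box is kept.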

    All experiments are run on a computer with an Intel(R) Core(TM) i9-12900KF (64 GB DDR5) CPU, one NVIDIA GeForce RTX 3090Ti (24 GB) GPU and the Microsoft Windows 10 system. The deep learning framework is PyTorch 1.7.1. The stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.01, a weight decay of 0.0005 and a momentum of 0.937 is chosen to reduce the loss function. The batch size is 32 and the number of epochs is 300.

    The dataset in this paper was published by Fu et al. [36] and has been used in some official competitions. All images in this dataset were taken by a UAV equipped with an infrared camera. The dataset includes 21,750 images, 8 classes and 89,174 targets, where the targets are vehicles against ground backgrounds. More details of the dataset are given in Table 2. We randomly divided the dataset into a training set, a validation set and a testing set in the ratio of 8:1:1. The main challenges of this dataset lie in the complex environmental interference and complex imaging conditions. It can provide a data basis for research on infrared image characteristics and on infrared small target detection and tracking.

    Table 2.  Details of dataset.
    Resolution           Depth   Format   Memory
    (640 × 480) pixels   8 bit   .bmp     ≈300 KB
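
    The random 8:1:1 split can be sketched as follows. The function name, the fixed seed and the use of integer indices in place of image files are assumptions for illustration; the text only states the ratio and that the split is random.

```python
import random


def split_indices(n, ratios=(8, 1, 1), seed=0):
    """Randomly split range(n) into train/val/test index lists by the given ratios."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # remainder goes to the testing set
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(21750)
    print(len(train), len(val), len(test))
```

    For the 21,750 images of this dataset, the split gives 17,400 training images and 2175 images each for validation and testing.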


    In order to evaluate the detection performance of the model, some evaluation indices are selected in this paper, including: Precision, Recall, F1 score, Average Precision and mean Average Precision. These indices are all in the range of [0, 1], and the larger the values are, the better the results will be. Their equations are as follows [37,38]:

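    $$P = \frac{TP}{TP + FP} \tag{10}$$
    $$R = \frac{TP}{TP + FN} \tag{11}$$
    $$F1 = \frac{2 \times P \times R}{P + R} \tag{12}$$
    $$AP = \int_{0}^{1} P(R)\,dR \tag{13}$$
    $$mAP = \frac{1}{C}\sum_{i=1}^{C} AP_i \tag{14}$$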

    where TP represents true positive, FP represents false positive and FN represents false negative. The confusion matrix is given in Table 3. C represents the number of classes. P represents Precision, R represents Recall, F1 represents F1 score, AP represents Average Precision and mAP represents mean Average Precision. mAP is the mean of APs of all classes and enables the evaluation of the overall detection accuracy of the model.

    Table 3.  Confusion matrix.
                            Predicted result = Positive   Predicted result = Negative
    Actual result = True    TP (True Positive)            FN (False Negative)
    Actual result = False   FP (False Positive)           TN (True Negative)
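
    Eqs (10)-(14) translate directly into code. The sketch below is illustrative: the function names are hypothetical, and the trapezoidal rule used for AP is a simplification (detection toolkits usually interpolate the precision-recall curve first). As a consistency check, the F1 values can be recomputed from the P and R values reported in Table 4.

```python
def precision(tp, fp):
    """P = TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """R = TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2.0 * p * r / (p + r)


def average_precision(recalls, precisions):
    """Area under the P(R) curve by the trapezoidal rule (a simplification)."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def mean_average_precision(aps):
    """Mean of the per-class AP values."""
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # F1 recomputed from the P and R columns of Table 4 (in percent).
    print(round(f1_score(97.52, 96.23), 2))  # YOLOv7
    print(round(f1_score(98.80, 96.87), 2))  # ISTD-YOLOv7
```

    The two printed values round to 96.87 and 97.83, matching the F1 column of Table 4.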


    The above indices can evaluate the pixel-level performance. Some research [21,22,23,24] has demonstrated that target-level performance is also important for infrared small target with limited shape and texture information. The probability of detection and the false-alarm rate are defined as follows:

    $$P_d = \frac{T_{correct}}{T_{All}} \tag{15}$$
    $$F_a = \frac{P_{false}}{P_{All}} \tag{16}$$

    where $P_d$ represents the probability of detection and $F_a$ represents the false-alarm rate. $T_{correct}$ represents the number of correctly predicted targets and $T_{All}$ represents the total number of targets. A target is correctly predicted if its centroid deviation is smaller than the threshold $T_{distance}$. In this paper, $T_{distance}$ is set to 3 [21,22,23,24]. $P_{false}$ represents the number of falsely predicted pixels and $P_{All}$ represents the total number of image pixels. Pixels are falsely predicted if the centroid deviation of the target is larger than the threshold $T_{distance}$.
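
    A minimal sketch of the target-level metrics in Eqs (15) and (16) is given below. The greedy one-to-one matching of predictions to ground-truth centroids and the function names are assumptions for illustration; the text only specifies the centroid-deviation threshold of 3 pixels.

```python
import math


def detection_probability(pred_centroids, gt_centroids, t_distance=3.0):
    """P_d: fraction of ground-truth targets matched by a prediction within t_distance."""
    used = set()
    correct = 0
    for gx, gy in gt_centroids:
        for i, (px, py) in enumerate(pred_centroids):
            # Each prediction may match at most one target (greedy matching).
            if i not in used and math.hypot(px - gx, py - gy) < t_distance:
                used.add(i)
                correct += 1
                break
    return correct / len(gt_centroids)


def false_alarm_rate(false_pixels, height, width):
    """F_a: falsely predicted pixels divided by all image pixels."""
    return false_pixels / (height * width)


if __name__ == "__main__":
    gt = [(10.0, 10.0), (50.0, 50.0)]
    pred = [(11.0, 11.0), (60.0, 60.0)]
    print(detection_probability(pred, gt))
    print(false_alarm_rate(30, 480, 640))
```

    In the toy call, only the first target has a prediction within 3 pixels, so P_d is 0.5. The pixel count of 30 is a made-up value used only to show the scale of F_a for a (640 × 480) image.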

    In this section, the performance of ISTD-YOLOv7 and YOLOv7 is compared from three aspects: training process, verification process and testing process. Before training the two models, data augmentation techniques are used to enhance the data randomly. Taking two data augmentation methods, Mixup and Mosaic, as examples, Figure 8 shows the infrared images obtained after processing by the two methods. Mixup applies simple linear interpolation to two random infrared images to construct new training samples, as shown in Figure 8(a)–(d). Mosaic randomly crops four infrared images and merges them into one infrared image as new training data, as shown in Figure 8(e)–(h). Data augmentation can greatly enrich the training data, improve the generalization capability of the model and make the network more robust.

    Figure 8.  Results of data augmentation.
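
    The two augmentation operations can be illustrated on toy grayscale images stored as nested lists. This is a simplified sketch: real Mosaic implementations also crop, rescale and shift the four images around a random center and remap the box labels, which is omitted here, and the function names are hypothetical.

```python
def mixup(img_a, img_b, lam):
    """Pixel-wise linear interpolation: lam * img_a + (1 - lam) * img_b."""
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def mosaic(top_left, top_right, bottom_left, bottom_right):
    """Tile four equally sized images into one 2 x 2 grid (simplified Mosaic)."""
    top = [ra + rb for ra, rb in zip(top_left, top_right)]
    bottom = [ra + rb for ra, rb in zip(bottom_left, bottom_right)]
    return top + bottom


if __name__ == "__main__":
    a = [[0, 100], [0, 100]]
    b = [[100, 0], [100, 0]]
    print(mixup(a, b, 0.25))
    print(mosaic([[1]], [[2]], [[3]], [[4]]))
```

    For detection, the labels of both source images are normally kept for a Mixup sample, and the boxes of all four images are kept for a Mosaic sample.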

    Figure 9 shows the convergence curves of ISTD-YOLOv7 and YOLOv7 on the training set and the verification set respectively. The red line is the original data of ISTD-YOLOv7, the coral line is the original data of YOLOv7, the green line is the smoothed data of ISTD-YOLOv7, and the brown line is the smoothed data of YOLOv7. It can be seen from Figure 9(a) that, in the training process, the convergence curve of ISTD-YOLOv7 is located below the convergence curve of YOLOv7. It shows that the convergence accuracy of ISTD-YOLOv7 is better than that of YOLOv7. In addition, it can be seen that the convergence speed of ISTD-YOLOv7 is better than that of YOLOv7. Specifically, ISTD-YOLOv7 escapes from local optima more quickly, achieving global optima at about 190 iterations, while YOLOv7 needs more than 210 iterations to achieve convergence. Similarly, it can be seen from Figure 9(b) that in the verification process, the convergence curve of ISTD-YOLOv7 is more stable and flatter, and the whole is located below the convergence curve of YOLOv7. It is worth noting that, after 250 iterations, the convergence curve of YOLOv7 shows a significant rise. It means that the YOLOv7 model is overfitting, while the ISTD-YOLOv7 model can better characterize this hard dataset of infrared small targets.

    Figure 9.  Loss curve of YOLOv7 and ISTD-YOLOv7.

    On the basis of comparing the training process and the verification process, the performance of the two models is evaluated on the testing set. The testing set contains 2175 infrared small target images. The number of targets in each class is shown in Figure 10. Table 4 compares the evaluation results of the two models on the testing set. Note that the best result in this paper is marked in bold. From Table 4, it can be found that ISTD-YOLOv7 has improvements compared with YOLOv7 in precision (from 97.52% to 98.80%), recall (from 96.23% to 96.87%), F1 (from 96.87% to 97.83%) and mAP (from 97.44% to 98.43%). These are made possible by the application of improvements enhancing the feature extraction capability of the network for limited information, improving the recall of the model and making ISTD-YOLOv7 detect more precisely.

    Figure 10.  Information about testing set.
    Table 4.  Evaluation results of YOLOv7 and ISTD-YOLOv7.
    Model         P (%)   R (%)   F1 (%)   mAP (%)
    YOLOv7        97.52   96.23   96.87    97.44
    ISTD-YOLOv7   98.80   96.87   97.83    98.43


    In this section, ISTD-YOLOv7 is compared with other state-of-the-art detection models. YOLOv3 [12], YOLOv5s [13] and YOLOXs [14] are also from the YOLO family, but they have not previously been tested on the dataset of this paper. SSD [39] is an anchor-based model. CenterNet [40] and FCOS [41] are anchor-free models. DETR [42] is the first detection model based on a transformer.

    Figure 11 shows the AP value of each class for the different models. The ordinate indicates the class and the abscissa indicates the AP value. The AP values of each model are sorted from large to small and displayed from top to bottom. The AP index comprehensively considers the balance between precision and recall under different confidence levels. ISTD-YOLOv7 is the only model with AP values over 96% in all classes, which indicates that our model has a better overall detection effect on the given dataset. In addition, the AP value of the eighth class is the lowest for all models except FCOS. This is because the number of eighth-class targets in the training set is smaller, so the models cannot learn the features of this class as fully. Nevertheless, the AP value of our model in the eighth class is more than 96%, while the AP value of the SSD model in the eighth class is only slightly more than 75%. mAP is the mean of the AP values over all classes and therefore cannot reflect these per-class differences.

    Figure 11.  AP of each class of different models.

    More quantitative results are given in Table 5. In terms of precision, ISTD-YOLOv7 obtains the best result of 98.80%. YOLOv3 obtains the best recall of 97.45%, and ISTD-YOLOv7 ranks second. F1 and mAP are two comprehensive indices, and our model achieves the best values on both. Moreover, in terms of target-level performance, Pd is the ratio of correctly predicted targets to all targets, and Fa is the ratio of falsely predicted target pixels to all pixels in the image. Our model achieves 94.66% on Pd and 94.08 × 10^-6 on Fa, the best values among the compared models. The performance of SSD is not satisfactory on the given dataset. These findings show that ISTD-YOLOv7 performs better overall than the comparison models in detecting infrared small targets. This is attributed to YOLOv7's own network structure and our focused improvements to it. Facing infrared small targets in complex scenes, the updated anchors, GE attention and NWD-based loss in ISTD-YOLOv7 improve the convergence performance and feature extraction capability of the network and alleviate the sensitivity to the location deviation of small targets.

    Table 5.  Evaluation results of different models.
    Model P (%) R (%) F1 (%) mAP (%) Pd (%) Fa (10-6)
    YOLOv3 97.15 97.45 97.30 97.27 93.93 115.88
    YOLOv5s 97.72 95.00 96.35 96.91 92.77 127.63
    SSD 92.78 41.81 57.65 87.48 77.02 1245.72
    CenterNet 96.31 92.15 94.19 94.26 93.45 112.54
    FCOS 98.30 80.26 88.37 97.71 92.20 130.51
    YOLOXs 96.90 96.25 96.60 97.37 93.19 118.65
    DETR 97.35 96.83 97.09 97.98 93.15 119.75
    ISTD-YOLOv7 98.80 96.87 97.83 98.43 94.66 94.08
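
    The indices in Table 5 can be related to one another with a short sketch. F1 is the harmonic mean of precision and recall, and Pd and Fa follow the ratios defined in the text. The function names below are hypothetical, and the counting of detected targets and false pixels is assumed to be done upstream.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_probability(num_detected_targets: int, num_targets: int) -> float:
    """Pd: correctly predicted targets divided by all targets."""
    return num_detected_targets / num_targets


def false_alarm_rate(num_false_pixels: int, num_pixels: int) -> float:
    """Fa: falsely predicted target pixels divided by all image pixels."""
    return num_false_pixels / num_pixels


# Reproduce two F1 entries of Table 5 from their P and R columns (in %).
print(round(f1_score(98.80, 96.87), 2))  # ISTD-YOLOv7 row: 97.83
print(round(f1_score(97.15, 97.45), 2))  # YOLOv3 row: 97.3
```

    Recomputing F1 from the rounded P and R columns can differ from the tabulated value in the last digit, since the table was presumably computed from unrounded quantities.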


    The ground truths and the qualitative results of all models are provided in Figures 12–15. The qualitative results show the class and the confidence of each detected target in different colors. Here, "Target 1" to "Target 8" represent eight different infrared small vehicles. Owing to space limitations, we show only some typical results of the different methods. Image 1 is selected from the day outfield scene, Image 2 from the day infield scene, Image 3 from the night outfield scene and Image 4 from the night infield scene. In Image 1, YOLOXs has obvious false detections. In Image 2, only CenterNet and ISTD-YOLOv7 detect all targets, while the other models miss targets to varying degrees. Further analysis of the missed detections shows that "Target 7" is very weak and almost submerged in the background, which makes it more difficult to detect. Even in this case, ISTD-YOLOv7 detects it with a confidence of 0.78. SSD has the most severe missed detections and detects only "Target 1". All eight models detect all infrared small vehicles in Image 3, and ISTD-YOLOv7 detects the targets with notably high confidence levels. In Image 4, SSD and FCOS miss targets. ISTD-YOLOv7 is not affected by white noise in complex scenes during the detection process, and the confidence level of its detection results on "Target 1", "Target 2" and "Target 3" is 1.00. The qualitative results illustrate the superiority of our model more intuitively.

    Figure 12.  Visual results of Image 1.
    Figure 13.  Visual results of Image 2.
    Figure 14.  Visual results of Image 3.
    Figure 15.  Visual results of Image 4.

    To further discuss the detection results of our model, we crop and enlarge the detected targets in Images 1–4, as shown in Figure 16. Our model detects all the targets in the four images. The cropped targets are potentially helpful for situation analysis and target attack on the battlefield.

    Figure 16.  Results of obtained targets by ISTD-YOLOv7.

    In this section, the number of model parameters, the floating-point operations (FLOPs) and the frames per second (FPS) are also calculated. The number of parameters reflects the spatial complexity of the model, and the time complexity can be measured using FLOPs. FPS is used to evaluate the detection speed, which is tested on one 3090Ti GPU. According to Table 6, YOLOv5s has the fewest parameters, the least computation, and the fastest inference speed. YOLOXs ranks second overall. ISTD-YOLOv7 has 37.232 M parameters, 105.234 G FLOPs and 36 FPS. In terms of FPS, YOLOv5s, SSD and YOLOXs have significant advantages. ISTD-YOLOv7 ranks in the middle on the various evaluation indices. In summary, our model achieves better detection performance within an acceptable time. However, our model is not lightweight enough and does not have an advantage in complexity, which is a limitation of the current work.

    Table 6.  Params, FLOPs, and FPS of different models.
    Model Params FLOPs FPS
    YOLOv3 61.561 M 155.380 G 48
    YOLOv5s 7.082 M 16.537 G 79
    SSD 24.547 M 276.251 G 72
    CenterNet 32.665 M 109.714 G 49
    FCOS 32.127 M 161.410 G 25
    YOLOXs 8.968 M 26.927 G 77
    DETR 36.762 M 73.642 G 24
    ISTD-YOLOv7 37.232 M 105.234 G 36
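
    To make the three quantities in Table 6 concrete, the sketch below counts the parameters and FLOPs of a single convolution layer and times an arbitrary inference callable. It is only an illustration under stated assumptions: one multiply-accumulate is counted as two FLOPs (conventions differ between tools), the helper names are hypothetical, and the profiling setup used for Table 6 is not described in this section.

```python
import time
from typing import Callable


def conv2d_params(kernel_size: int, in_channels: int, out_channels: int,
                  bias: bool = True) -> int:
    """Learnable parameters of a standard (non-grouped) 2D convolution."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def conv2d_flops(kernel_size: int, in_channels: int, out_channels: int,
                 out_height: int, out_width: int) -> int:
    """FLOPs of one forward pass, counting a multiply-accumulate as 2 FLOPs."""
    macs = (kernel_size * kernel_size * in_channels
            * out_channels * out_height * out_width)
    return 2 * macs


def measure_fps(infer: Callable[[], object],
                num_runs: int = 100, warmup: int = 10) -> float:
    """Average frames per second of `infer`, after a few warm-up calls."""
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(num_runs):
        infer()
    elapsed = time.perf_counter() - start
    return num_runs / elapsed if elapsed > 0 else float("inf")


# A 3x3 convolution from 3 to 64 channels on a 640x640 output map.
print(conv2d_params(3, 3, 64))           # 1792
print(conv2d_flops(3, 3, 64, 640, 640))  # 1415577600
```

    Summing such per-layer counts over a whole network gives the totals of the kind reported in the Params and FLOPs columns. When timing on a GPU, the device usually has to be synchronized before the clock is read, which this CPU-only sketch does not do.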


    In this section, ablation studies are carried out to verify the effectiveness of the improved components. Table 7 shows the results. Compared with the baseline model, all four improved models achieve better detection performance. Moreover, ISTD-YOLOv7 obtains the best results on all indices. This indicates that the three components improve the performance of the model in small target detection from different aspects, and that the combined model yields the largest gain. Specifically, resetting the anchors for the small target dataset makes the model better adapted to the given task. In this way, bounding box regression can fine-tune the high-quality anchors to obtain the detection results. Figure 17 shows heat maps before and after adding the GE attention blocks. Figure 17(a)–(c) are heat maps without attention, and Figure 17(d)–(f) are heat maps with attention. The darker the color, the more significant the target area is. Adding the attention mechanism makes the model focus more on the local characteristics of infrared small targets and ignore irrelevant background information. The NWD-based loss better eliminates the performance gap between training and testing and is suitable for small target detectors. The IoU metric is sensitive to the location deviation of small targets, so small targets are easily predicted falsely; the NWD metric alleviates this problem.

    Table 7.  Results of ablation studies.
    Model P (%) R (%) F1 (%) mAP (%)
    YOLOv7 97.52 96.23 96.87 97.44
    YOLOv7+Anchor update 98.42 96.34 97.37 97.85
    YOLOv7+GE attention 98.05 96.72 97.38 98.14
    YOLOv7+NWD 97.65 96.80 97.22 98.26
    ISTD-YOLOv7 98.80 96.87 97.83 98.43

    Figure 17.  Comparison of heat maps without and with attention.
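
    The sensitivity of IoU to location deviation, and the way a Wasserstein-based metric avoids it, can be illustrated with a small sketch. It follows the commonly used formulation in which a box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), and the normalized distance is exp(−W₂/C). The constant `c` is dataset dependent; the value 8.0 below is an arbitrary assumption, and the exact loss used in ISTD-YOLOv7 is not given in this section.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(a: Box, b: Box, c: float = 8.0) -> float:
    """Normalized Gaussian Wasserstein distance; `c` is an assumed constant."""
    w2_squared = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                  + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_squared) / c)


# The same 3-pixel shift applied to a small and to a large box.
small_a, small_b = (10.0, 10.0, 6.0, 6.0), (13.0, 10.0, 6.0, 6.0)
large_a, large_b = (100.0, 100.0, 48.0, 48.0), (103.0, 100.0, 48.0, 48.0)
print(iou(small_a, small_b), iou(large_a, large_b))  # about 0.33 vs about 0.88
print(nwd(small_a, small_b), nwd(large_a, large_b))  # identical, about 0.69
```

    A 3-pixel shift cuts the IoU of the 6 × 6 box to one third while barely affecting the 48 × 48 box, whereas the NWD value is the same in both cases and stays positive even when the boxes no longer overlap. A loss of the form 1 − NWD is one common way to use the metric in training.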

    Infrared small targets are dim and have a low signal-to-noise ratio. In complex weather and terrain scenes, infrared vehicles are easily overlooked, and most current models cannot detect them effectively. In this paper, ISTD-YOLOv7, based on YOLOv7, is proposed for infrared small target detection. To adapt YOLOv7 to this task, we have adopted a series of targeted improvements.

    ISTD-YOLOv7 includes the anchor update, GE attention and the NWD loss function. On a public infrared small target dataset, a series of experimental results reveal that ISTD-YOLOv7 is superior to the comparison models (YOLOv3, YOLOv5s, SSD, CenterNet, FCOS, YOLOXs, DETR and YOLOv7) and that the improvements are effective. Compared with the baseline model, the mAP of ISTD-YOLOv7 improves from 97.44% to 98.43%. The major causes of the high detection performance are as follows. First, the anchor update provides a more reasonable prior. Second, spatial location is more important for the detection of small targets, so GE attention is chosen to make the model exploit feature context information more efficiently. Third, the NWD loss function helps to resolve the sensitivity of the IoU metric to small target location deviation.

    It should be mentioned that this work still has limitations. First, the dataset used suffers from class imbalance. Second, our model is still not lightweight enough. In future research, we will use a Generative Adversarial Network (GAN) [43] to generate additional training samples. In addition, we will reduce the parameters and computations of the model as much as possible for deployment applications.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported by the National Natural Science Foundation of China (Grant No. 61473100).

    The authors declare there is no conflict of interest.



    [1] Wu Z, Zhu Y, Ji X (2014) NiCo2O4-based materials for electrochemical supercapacitors. J Mater Chem A 2: 14759–14772. https://doi.org/10.1039/C4TA02390K doi: 10.1039/C4TA02390K
    [2] Cheng P, Dang F, Wang Y, et al. (2021) Gas sensor towards n-butanol at low temperature detection: Hierarchical flower-like Ni-doped Co3O4 based on solvent-dependent synthesis. Sens Actuators B Chem 328: 129028. https://doi.org/10.1016/j.snb.2020.129028 doi: 10.1016/j.snb.2020.129028
    [3] Shepit M, Paidi VK, Roberts CA, et al. (2021) Unusual magnetism in CuxCo3xO nanoparticles. Phys Rev B 103: 024448. https://doi.org/10.1103/PhysRevB.103.024448 doi: 10.1103/PhysRevB.103.024448
    [4] Wu B, Shan C, Zhang X, et al. (2021) CeO2/Co3O4 porous nanosheet prepared using rose petal as biotemplate for photo-catalytic degradation of organic contaminants. Appl Surf Sci 543: 148677. https://doi.org/10.1016/j.apsusc.2020.148677 doi: 10.1016/j.apsusc.2020.148677
    [5] Li QP, Liu FQ, Mu XL, et al. (2021) Co3O4/CdS energy-storing nanocomposite: A promising photoanode for photoelectrochemical cathodic protection in the dark. J Alloys Compd 870: 159340. https://doi.org/10.1016/j.jallcom.2021.159340 doi: 10.1016/j.jallcom.2021.159340
    [6] V-Niño ED, Díaz Lantada A, Lonne Q, et al. (2018) Manufacturing of polymeric substrates with copper nanofillers through laser stereolithography technique. Polymers 10: 1325. https://doi.org/10.3390/polym10121325 doi: 10.3390/polym10121325
    [7] Keerthana SP, Yuvakkumar R, Senthil Kumar P, et al. (2021) Influence of tin (Sn) doping on Co3O4 for enhanced photocatalytic dye degradation. Chemosphere 277: 130325. https://doi.org/10.1016/j.chemosphere.2021.130325 doi: 10.1016/j.chemosphere.2021.130325
    [8] Abdallah AM, Awad R (2021) Sm and Er partial alternatives of Co in Co3O4 nanoparticles: Probing the physical properties. Physica B 608: 412898. https://doi.org/10.1016/j.physb.2021.412898 doi: 10.1016/j.physb.2021.412898
    [9] Li Q, Zhang Q, Zhou Z, et al. (2021) Boosting Zn-ion storage capability of self-standing Zn-doped Co3O4 nanowire array as advanced cathodes for high-performance wearable aqueous rechargeable Co//Zn batteries. Nano Res 14: 91–99. https://doi.org/10.1007/s12274-020-3046-8 doi: 10.1007/s12274-020-3046-8
    [10] Bao W, Li Y, Zhang J, et al. (2023) Interface engineering of the NiCo2O4@MoS2/TM heterostructure to realize the efficient alkaline oxygen evolution reaction. Int J Hydrogen Energy 48: 12176–12184. https://doi.org/10.1016/j.ijhydene.2022.12.184 doi: 10.1016/j.ijhydene.2022.12.184
    [11] Zhou X, Li Y, Zhao J, et al. (2023) Tailoring the electronic structure of NiMoO4 nanowires with NiCo2O4 nanosheets by constructing heterostructure interfaces for improving oxygen evolution reaction. Ionics 29: 1983–1990. https://doi.org/10.1007/s11581-023-04965-5 doi: 10.1007/s11581-023-04965-5
    [12] Bao W, Xiao L, Zhang J, et al. (2021) Electronic and structural engineering of NiCo2O4/Ti electrocatalysts for efficient oxygen evolution reaction. Int J Hydrogen Energy 46: 10259–10267. https://doi.org/10.1016/j.ijhydene.2020.12.126 doi: 10.1016/j.ijhydene.2020.12.126
    [13] Wu X, Zhou X, Hu L, et al. (2021) Porous NiCo2O4–FeCo2O4 nanowire arrays as advanced electrodes for high-performance flexible asymmetric supercapacitors. Energ Fuel 35: 12680–12687. https://doi.org/10.1021/acs.energyfuels.1c01517 doi: 10.1021/acs.energyfuels.1c01517
    [14] Rashti A, Lu X, Dobson A, et al. (2021) Tuning MOF-derived Co3O4/NiCo2O4 nanostructures for high-performance energy storage. ACS Appl Energy Mater 4: 1537–1547. https://doi.org/10.1021/acsaem.0c02736 doi: 10.1021/acsaem.0c02736
    [15] Chang Q, Liang H, Shi B, et al. (2021) Ethylenediamine-assisted hydrothermal synthesis of NiCo2O4 absorber with controlled morphology and excellent absorbing performance. J Colloid Interf Sci 588: 336–345. https://doi.org/10.1016/j.jcis.2020.12.099 doi: 10.1016/j.jcis.2020.12.099
    [16] Chen C, Su H, Lu L, et al. (2021) Interfacing spinel NiCo2O4 and NiCo alloy derived N-doped carbon nanotubes for enhanced oxygen electrocatalysis. Chem Eng J 408: 127814. https://doi.org/10.1016/j.cej.2020.127814 doi: 10.1016/j.cej.2020.127814
    [17] Cardenas-Flechas LJ, Barba-Ortega J, Joya MR (2020) Copper and iron oxide films deposited in titanium nanotubes. Rev UIS Ing 19: 171–178. https://doi.org/10.18273/revuin.v19n1-2020016 doi: 10.18273/revuin.v19n1-2020016
    [18] Sivakumar P, Vikraman D, Raj CJ, et al. (2021) Hierarchical NiCo/NiO/NiCo2O4 composite formation by solvothermal reaction as a potential electrode material for hydrogen evolutions and asymmetric supercapacitors. Int J Energy Res 45: 19947–19961. https://doi.org/10.1002/er.7065 doi: 10.1002/er.7065
    [19] Wu Z, Zhu Y, Ji X (2019) Study on charge storage mechanism in working electrodes fabricated by sol-gel derived spinel NiMn2O4 nanoparticles for supercapacitor application. Appl Surf Sci 463: 513–525. https://doi.org/10.1016/j.apsusc.2018.08.259 doi: 10.1016/j.apsusc.2018.08.259
    [20] Srinivasa N, Shreenivasa L, Adarakatti PS, et al. (2019) In situ addition of graphitic carbon into a NiCo2O4/CoO composite: Enhanced catalysis toward the oxygen evolution reaction. RSC Adv 9: 24995–25002. http://dx.doi.org/10.1039/C9RA05195C doi: 10.1039/C9RA05195C
    [21] Li Y, Han X, Yi T, et al. (2019) Review and prospect of NiCo2O4-based composite materials for supercapacitor electrodes. J Energy Chem 31: 54–78. https://doi.org/10.1016/j.jechem.2018.05.010 doi: 10.1016/j.jechem.2018.05.010
    [22] Manalu A, Tarigan K, Humaidi S, et al. (2022) Synthesis, microstructure and electrical properties of NiCo2O4/rGO composites as pseudocapacitive electrode for supercapacitors. Int J Electrochem Sci 17: 22036. https://doi.org/10.20964/2022.03.11 doi: 10.20964/2022.03.11
    [23] Peres APS, Lima AC, Barros BS, et al. (2012) Synthesis and characterization of NiCCo2O4 spinel using gelatin as an organic precursor. Mater Lett 89: 36–39. https://doi.org/10.1016/j.matlet.2012.08.044 doi: 10.1016/j.matlet.2012.08.044
    [24] Zhao N, Yang F, Zhao C, et al. (2021) Construction of pH-dependent nanozymes with oxygen vacancies as the high-efficient reactive oxygen species scavenger for oral-administrated antiinflammatory therapy. Adv Healthc Mater 10: e2101618. https://doi.org/10.1002/adhm.202101618 doi: 10.1002/adhm.202101618
    [25] Cardenas-Flechas LJ, Freire PTC, Paris EC, et al. (2021) Temperature-induced structural phase transformation in samples of Co3O4 and Co3-xNixO4 for CoO. Materialia 18: 101155. https://doi.org/10.1016/j.mtla.2021.101155 doi: 10.1016/j.mtla.2021.101155
    [26] Marco JF, Gancedo JR, Gracia M, et al. (2001) Cation distribution and magnetic structure of the ferrimagnetic spinel NiCo2O4. J Mater Chem 11: 3087–3093. https://doi.org/10.1039/B103135J doi: 10.1039/B103135J
    [27] Wang P, Jia C, Huang Y, et al. (2021) Van der waals heterostructures by design: From 1D and 2D to 3D. Matter 4: 552–581. https://doi.org/10.1016/j.matt.2020.12.015 doi: 10.1016/j.matt.2020.12.015
    [28] Liu Y, Fang Y, Yang D, et al. (2022) Recent progress of heterostructures based on two dimensional materials and wide bandgap semiconductors. J Phys: Condens Matter 34: 183001. https://doi.org/10.1088/1361-648x/ac5310 doi: 10.1088/1361-648x/ac5310
    [29] Lakehal A, Bedhiaf B, Bouaza A, et al. (2018) Structural optical and electrical properties of Ni-doped Co3O4 prepared via sol-gel technique. Mat Res 21: 1–7. https://doi.org/10.1590/1980-5373-MR-2017-0545 doi: 10.1590/1980-5373-MR-2017-0545
    [30] Cardenas-Flechas LJ, Xuriguera Martín E, Padilla Sanchez JA, et al. (2021) Experimental comparison of the effect of temperature on the vibrational and morphological properties of NixCo3-xO4 nanostructures. Mater Lett 303: 130477. https://doi.org/10.1016/j.matlet.2021.130477 doi: 10.1016/j.matlet.2021.130477
    [31] Shen G, Chen PC, Ryu K, et al. (2009) Devices and chemical sensing applications of metal oxide nanowires. J Mater Chem 19: 828–839. https://doi.org/10.1039/B816543B doi: 10.1039/B816543B
    [32] Thota S, Kumar A, Kumar J (2009) Optical electrical and magnetic properties of Co3O4 nano crystallites obtained by thermal decomposition of sol-gel derived oxalates. Mater Sci Eng B 164: 30–37. https://doi.org/10.1016/j.mseb.2009.06.002 doi: 10.1016/j.mseb.2009.06.002
    [33] Liu MC, Kong LB, Lu C, et al. (2012) A sol–gel process for fabrication of NiO/NiCo2O4/Co3O4 composite with improved electrochemical behavior for electrochemical capacitors. ACS Appl Mater Interfaces 4: 4631–4636. https://doi.org/10.1021/am301010u doi: 10.1021/am301010u
    [34] Cardenas-Flechas LJ, Raba Paes AM, Joya MR (2020) Synthesis and evaluation of nickel doped Co3O4 produced through hydrothermal technique. DYNA 87: 184–191. https://doi.org/10.15446/dyna.v87n213.84410 doi: 10.15446/dyna.v87n213.84410
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)