Research article

Improved graph neural network-based green anaconda optimization for segmenting and classifying the lung cancer


  • Normal lung cells incur genetic damage over time, which causes unchecked cell growth and ultimately leads to lung cancer. Nearly 85% of lung cancer cases are caused by smoking, but there is evidence that beta-carotene supplements and arsenic in drinking water may also raise the risk of developing the disease. Asbestos, polycyclic aromatic hydrocarbons, arsenic, radon gas, nickel, chromium and hereditary factors are further lung cancer-causing agents. Deep learning approaches are therefore employed to accelerate the crucial procedure of diagnosing lung cancer, and their effectiveness has increased when applied to cancer histopathology slides. Initially, the data are gathered from a standard benchmark dataset. Next, the collected images are pre-processed using the Gabor filter method, and the pre-processed images are segmented with the modified expectation maximization (MEM) algorithm. Features are then extracted from the segmented images using the histogram of oriented gradients (HOG) scheme. Finally, lung cancer is classified by the improved graph neural network (IGNN), in which the parameters of the graph neural network (GNN) are optimized by the green anaconda optimization (GAO) algorithm with accuracy maximization as the main objective function. The IGNN classifies lung cancer into normal, adenocarcinoma and squamous cell carcinoma as the final output. Compared with existing methods on distinct performance measures, the simulation findings demonstrate the superiority of the introduced method.

    Citation: S. Dinesh Krishnan, Danilo Pelusi, A. Daniel, V. Suresh, Balamurugan Balusamy. Improved graph neural network-based green anaconda optimization for segmenting and classifying the lung cancer[J]. Mathematical Biosciences and Engineering, 2023, 20(9): 17138-17157. doi: 10.3934/mbe.2023764




    Traffic accidents are the third leading cause of death in the United States according to the Centers for Disease Control and Prevention [1]. An often-overlooked fact is that tire quality has caused many of these accidents: the National Highway Traffic Safety Administration reports an average of nearly 11,000 accidents related to tire failures each year [2]. Tire quality therefore affects not only the safety of a vehicle's occupants but also the lives and property of every road user. In the tire production process, quality inspection prevents consumers from receiving unqualified products. Currently, the main market share is held by radial tires. This type of tire not only has multiple belt layers attached to the crown but also a tightly wound wire coil, which significantly improves the strength and stability of the ring [3,4]. Radial tires require a complex production process, which increases the possibility of various quality defects [5,6]. For this type of tire, the most popular nondestructive detection scheme is to place the tire in an X-ray room for irradiation. The resulting 2-D grayscale image is then inspected according to enterprise standards. Although many enterprises still rely on manual visual inspection, such a process has significant disadvantages in efficiency and accuracy. Unattended, computer-vision-based defect detection is now the trend.

    To meet the needs of intelligent detection, many researchers have studied tire X-ray images. Based on traditional image processing methods, researchers have proposed results from multiple perspectives. Guo et al. [7] first evaluated pixel-level texture similarity; impurities in the sidewall and crown were then located by thresholding. However, pixel-level operations are extremely demanding in terms of computational speed, especially considering that X-ray images are quite large. Zhang et al. [8,9,10] systematically studied methods for detecting impurities. Combining total variation with the Curvelet transform, the original image was split into a texture component and a cartoon component, and impurities were ultimately located in the cartoon [8]. A method combining the Curvelet transform with the Canny operator was designed to extract defect features effectively [9]. Wavelet multiscale analysis was proposed in [10]; this method included local analysis and scale feature extraction to separate defects from the background. There are obvious differences between the local inverse difference moment characteristics of normal areas and impurity areas, so [11] used this principle to detect significant impurities. Different tire types differ significantly in the amount of raw material, resulting in differences in the brightness of the grayscale image after X-ray irradiation. The traditional detection methods above need their algorithm parameters set according to the inspected tire model and are therefore particularly sensitive to image brightness. The first motivation of this paper is thus to design a tire defect detection method that reduces the impact of image brightness on the detection results while accommodating the defect detection of various tire types.

    In recent years, deep learning technology has been widely used in image classification [12,13], target recognition [14,15], semantic segmentation [16,17] and other fields. Some researchers have also applied deep learning to a variety of visual inspection tasks [18], such as fabric, welding and tire internal defect detection. In [19], a convolutional neural network based on ResNet [20] learned the texture features of fabric, making it possible to accurately locate small fabric defects. Aiming at the automatic localization of welding defects, [21] proposed an improved U-Net network that combines random cropping and preprocessing methods to effectively expand the data set. In [22], an end-to-end tire X-ray image defect classification network called TireNet was proposed, in which defect feature representation forms part of the downstream classification module; TireNet achieved a recall rate of 94.7% on a data set composed of more than 10 defect types. A convolutional network based on supervised feature embedding was proposed in [23]; this method effectively improved the accuracy of tire X-ray image classification based on AlexNet. As the above survey shows, the application of deep learning models in visual inspection tasks mainly focuses on distinguishing defect types, and few studies have addressed determining the defect level. The task of defect detection in a tire factory is to judge the defect level on the basis of defect identification: unqualified products are usually classified as defective products or scraps according to the severity of the defect. Thus, the second research motivation of this paper is to apply a deep learning network for the accurate judgment of defective products and scraps.

    Error defects tend to occur at the bead toe. Whether an error defect occurs is determined by the maximum and minimum pixel widths of the bead toe. Moreover, in the actual detection of toe error defects, defective products and scraps need to be distinguished. Therefore, the current manual visual inspection requires an accurate measurement to judge the defect level after the initial assessment, which seriously slows the defect detection of each tire. Fortunately, a semantic segmentation network can realize pixel-level classification, which is sufficient to extract the bead toe and measure its pixel width. Naturally, a semantic segmentation network is proposed to replace the manual determination of the defect level. The idea of semantic segmentation can be traced back to FCN [24]. Later, U-Net [25] achieved very high accuracy by using a specific skip-connection structure, followed by many improved forms, such as [26,27,28,29]. With the optimization of semantic segmentation speed, the concept of real-time semantic segmentation was proposed. E-Net [30] designed a relatively small encoding layer and a relatively large decoding network, which reduces the number of parameters and greatly increases speed. BiSeNet [31] proposed a dual-path network structure, in which the two paths obtain high-resolution features and a sufficient receptive field. STDC-Net [32] abandoned extracting spatial and context information in separate streams; instead, a single-stream network is used and the learning of spatial information is integrated into the bottom layers, which further speeds up the network. However, STDC-Net still leaves room to improve the accuracy of pixel-level segmentation. Naturally, the third research motivation of this paper is to design a lightweight semantic segmentation network that trades off segmentation accuracy against inference speed; specifically, the detection of bead toe error should be superior to manual visual inspection in both speed and accuracy.

    Aiming at the above three research motivations, this paper proposes a lightweight semantic segmentation network to realize the detection of bead toe error. The shallow and deep texture information is stored in the feature maps of each stage, and multi-scale feature information is fused in the decoding module. Finally, after size expansion, the output is decided by sufficient spatial and contextual information. In addition, an auxiliary supervision structure is added to improve the precision of class-boundary segmentation without increasing the number of model parameters.

    The main contributions of this paper are summarized as follows:

    1). A semantic segmentation network based on an encoder-decoder structure is proposed. In the encoder, a weighted dense connection (WSTDC) module is proposed. In the decoder, a feature fusion structure using chained residual pooling (CRP) is proposed; this structure uses a pooling operation and small convolution kernels in place of large convolution kernels, while still expressing spatial and contextual information.

    2). We design an auxiliary boundary supervision method. Labels composed of border and background are derived from the original three-category labels. The auxiliary loss function measures the difference between the feature maps in the coding stage and the boundary labels, correcting the attention the coding operation pays to the boundary. The main loss and the auxiliary loss are superimposed to complete the network training.

    3). Based on the calculation of mIoU, a more accurate index for measuring class-boundary accuracy, called L-mIoU, is proposed. On a self-made tire X-ray dataset, the lightweight semantic segmentation network delivers impressive results: 92.4% L-mIoU and 97.1% mIoU within 1.0 s.

    4). A deep learning method is applied to bead toe defect detection in this paper. It is the first method to identify whether an error defect occurs by calculating the boundary of the bead toe. The effect of on-line detection was simulated on a data set composed of defect samples and normal samples.

    Inspired by STDC-Net, in this work we propose a lightweight semantic segmentation network. The height of a complete tire X-ray image is usually several thousand pixels and the width is 2469 pixels. If the original image were fed into the network directly, the processing speed would be poor and such a large image would also be difficult to label. To complete bead toe defect detection while reducing the computational cost, the key is to crop the areas containing the left and right bead toes. Considering that the width of the bead toe varies between tire types, a 512 × 512 image resolution is finally adopted to ensure that the cropped subgraph includes the bead ring, the bead toe and the sidewall, as shown in Figure 1. Each input to the network is therefore a subgraph. The lightweight semantic segmentation network learns the texture features of each area of the tire X-ray subgraph; as a result, the input X-ray image is divided into three parts, as shown in Figure 2, representing the bead ring, bead toe and sidewall, respectively. The architecture is mainly composed of an encoder, a decoder and an auxiliary boundary supervision module, as shown in Figure 3. The encoder is the basis of the network and includes multiple WSTDCs for feature representation. Each WSTDC consists of a stack of 3 × 3 convolution layers, each followed by a nonlinear activation unit (ReLU) and a batch normalization unit (BN). It is worth noting that the output of each convolution operation in a WSTDC is recorded, and the output of the module is the combination of all recorded convolution outputs. After passing through multiple WSTDCs, the image is eventually reduced to 1/32 of its original size. Each time the size is reduced, a new feature map is generated, so the encoder outputs feature maps of several sizes that together carry complete spatial and semantic information. In the decoder, an expansion mechanism solves the problem of fusing the different-sized feature maps from the encoder: the deep feature maps are merged with the shallow feature maps after passing through 5 × 5 pooling layers and 1 × 1 convolution layers. Crucially, the representation ability of deep spatial features affects the segmentation of boundary details. Based on this observation, we propose an auxiliary boundary supervision module that guides the deep module in the encoding stage to learn boundary details. The details of each module are described below.

    Figure 1.  From the original tire X-ray image to the input image of a lightweight semantic segmentation network.
    Figure 2.  Tire images with resolution 512 × 512. (a) A part of the X-ray image on the left side of the crown. (b) A part of the X-ray image on the right side of the crown.
    Figure 3.  A lightweight semantic segmentation network designed in this paper.

    According to the steps described above, the extraction of texture features requires advanced down-sampling modules. The recently proposed STDC module attracted our attention due to its strong extraction ability and small number of training parameters, so several improved STDC modules are stacked to form a forward feature-learning path. Two WSTDC modules are involved in this path, referred to as WSTDC1 and WSTDC2, as shown in Figure 4. Both WSTDC1 and WSTDC2 include a convolutional layer with 1 × 1 filters and four layers with 3 × 3 filters. The output size of WSTDC1 is consistent with its input size, while the output size of WSTDC2 is half of the input size. In this paper, WSTDC1 and WSTDC2 are combined into a standard down-sampling module called the double feature extraction block (DFEB). The key to this semantic segmentation network is that shallow blocks contain a large amount of spatial information, while deep feature maps mainly represent contextual information. To speed up the convergence of the training parameters in the DFEB and reduce redundant parameters, the output of each convolutional layer is fused to the output layer through skip connections. The weights of the intermediate feature maps are learnable, that is, the weight parameters change dynamically during training. The source of the output is shown in the following equations:

    $$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} w_1 & 0 & 0 & 0 \\ 0 & w_2 & 0 & 0 \\ 0 & 0 & w_3 & 0 \\ 0 & 0 & 0 & w_4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \tag{2.1}$$
    $$y_{out} = f(y_1, y_2, y_3, y_4) \tag{2.2}$$
    Figure 4.  Structure of the WSTDC modules. (a) The structure of WSTDC1 (the output size is the same as the input size). (b) The structure of WSTDC2 (the output size is reduced to half of the input size).

    where $w_1, w_2, w_3, w_4$ denote the weights of the intermediate feature maps (i.e., $x_1, x_2, x_3, x_4$) from left to right, $y_{out}$ denotes the WSTDC module output and $f$ indicates that the fusion operation is performed by concatenation.
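    A minimal PyTorch sketch of the weighted fusion in Eqs (2.1) and (2.2) is given below. It is an illustration rather than the authors' released code: the four recorded branches, the per-branch channel split and the pooling of the first branch in the down-sampling case follow the STDC convention and are assumptions.

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, k, stride=1):
    # Conv -> BN -> ReLU unit used throughout the encoder
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

class WSTDC(nn.Module):
    """Weighted STDC block: each conv output x1..x4 is recorded, scaled by a
    learnable weight w1..w4 (Eq 2.1) and the results are concatenated (Eq 2.2)."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.b1 = conv_bn_relu(c_in, c_out // 2, 1)             # 1 x 1 branch
        self.b2 = conv_bn_relu(c_out // 2, c_out // 4, 3, stride)
        self.b3 = conv_bn_relu(c_out // 4, c_out // 8, 3)
        self.b4 = conv_bn_relu(c_out // 8, c_out // 8, 3)
        self.w = nn.Parameter(torch.ones(4))                    # learnable fusion weights
        # match x1 to the reduced size when the block downsamples
        self.pool = nn.AvgPool2d(3, stride, padding=1) if stride > 1 else nn.Identity()

    def forward(self, x):
        x1 = self.b1(x)
        x2 = self.b2(x1)
        x3 = self.b3(x2)
        x4 = self.b4(x3)
        y = [self.w[0] * self.pool(x1), self.w[1] * x2,
             self.w[2] * x3, self.w[3] * x4]                    # y_i = w_i * x_i
        return torch.cat(y, dim=1)                              # f(y1, y2, y3, y4)
```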

    Therefore, the advantages of the WSTDC structure lie mainly in its small number of parameters and its expression of multi-scale features. The concatenation of multiple intermediate feature maps constitutes the output channels of the WSTDC module; compared with a conventional channel transform, less convolution computation is required for the same number of output channels. Convolution layers of different depths extract texture information at different sizes, so the texture contrast of each region becomes more distinct.

    Considering the richness of the texture features, the number of DFEBs is set to 3. The encoder structure is shown in Table 1. Overall, processing an X-ray image with a resolution of 1 × 512 × 512 consists of six stages. Stages 1 and 2 use a structure that contains only a convolutional layer followed by batch normalization and a nonlinear activation unit; this simple design shows favorable feature extraction capability in the following experiments. Stage 3 includes two convolutional layers that act as channel expansions without size changes. From stages 4 to 6, three DFEBs refine the input feature maps, so that the image size is reduced to 1/8, 1/16 and 1/32 in turn; the number of channels eventually expands to 1024 in stage 6. In the experiments, we found that the combination of WSTDC1 and WSTDC2 meets the needs of texture expression and, more importantly, has advantages in controlling the number of parameters.

    Table 1.  The structure of each stage in the encoder.
    Stage Operation Output size Kernel size Stride
    stage1 Conv2d 64 × 256 × 256 3 2
    stage2 Conv2d 128 × 128 × 128 3 2
    stage3 Double Conv2d 256 × 128 × 128 3 1
    stage4 WSTDC1/WSTDC2 256 × 64 × 64 3 2/1
    stage5 WSTDC1/WSTDC2 512 × 32 × 32 3 2/1
    stage6 WSTDC1/WSTDC2 1024 × 16 × 16 3 2/1
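    Read as code, Table 1 corresponds to the following constructor sketch. It reuses the hypothetical conv_bn_relu and WSTDC helpers from the block above; the ordering of the stride-2 and stride-1 WSTDC modules inside each DFEB is an assumption.

```python
class Encoder(nn.Module):
    """Six-stage encoder following Table 1 for a 1 x 512 x 512 input."""
    def __init__(self):
        super().__init__()
        self.stage1 = conv_bn_relu(1, 64, 3, stride=2)              # 64  x 256 x 256
        self.stage2 = conv_bn_relu(64, 128, 3, stride=2)            # 128 x 128 x 128
        self.stage3 = nn.Sequential(conv_bn_relu(128, 256, 3),
                                    conv_bn_relu(256, 256, 3))      # 256 x 128 x 128
        # each DFEB: a stride-2 WSTDC followed by a stride-1 WSTDC
        self.stage4 = nn.Sequential(WSTDC(256, 256, 2), WSTDC(256, 256, 1))     # 256  x 64 x 64
        self.stage5 = nn.Sequential(WSTDC(256, 512, 2), WSTDC(512, 512, 1))     # 512  x 32 x 32
        self.stage6 = nn.Sequential(WSTDC(512, 1024, 2), WSTDC(1024, 1024, 1))  # 1024 x 16 x 16

    def forward(self, x):
        feats = []
        for stage in (self.stage1, self.stage2, self.stage3,
                      self.stage4, self.stage5, self.stage6):
            x = stage(x)
            feats.append(x)     # every stage output is kept for the decoder
        return feats
```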


    Inspired by [33], the optimization of the chained residual pooling module lies in the reduction of the convolution kernel size. Specifically, the 3 × 3 convolution kernel is replaced in consideration of computational speed; experiments show that the 1 × 1 convolution does not significantly lose the texture information contained in the feature maps. The CRP module is thus a stack of 5 × 5 pooling operations and 1 × 1 convolution operations, as shown in Figure 5; the number of stacked layers is set to 3 in this paper. Using the 1 × 1 convolution kernel instead of the 3 × 3 kernel loses some of the local receptive field, but the large pooling kernel and the skip connections weaken this loss.
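    A sketch of the modified CRP module described above is shown next. The three-stage chain of 5 × 5 pooling and 1 × 1 convolution with residual additions follows the text; the use of max pooling (rather than average pooling) is an assumption.

```python
class CRP(nn.Module):
    """Chained residual pooling: n_stages of 5 x 5 pooling + 1 x 1 convolution,
    each result added back to the running feature map (the jump connection)."""
    def __init__(self, channels, n_stages=3):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False)
            for _ in range(n_stages))

    def forward(self, x):
        out, path = x, x
        for conv in self.convs:
            path = conv(self.pool(path))   # 5 x 5 pooling followed by 1 x 1 conv
            out = out + path               # residual accumulation
        return out
```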

    Figure 5.  A new CRP module.

    The output feature maps of down-sampling encoding stages 1 to 6 are the input source of the up-sampling fusion operation. The decoder details are recorded in Table 2. In summary, the output feature maps of the current stage are fused with the feature maps of the previous stage, that is, the up-sampling decoding operation is realized by constructing a U-shaped structure. The merged feature maps of the two stages are then transformed by the CRP module. In our structure, only the output feature maps of stage 1 are not fused by the pixel-level addition operation; a pixel-level concatenation operation is used instead (the Add and Cat operations represent pixel-level addition and concatenation, respectively). A double-layer 3 × 3 convolution is also placed behind this feature fusion layer. The direct benefit of the concatenation is that the number of channels can be rapidly expanded, and the higher-dimensional space is better suited to separating subtle texture differences. Finally, the pixel-level classifier reduces the number of channels to 3.

    Table 2.  Decoding structure embedded in the lightweight network. The two input branches come from the output feature maps of two adjacent encoding stages.
    Input size_1 | Operation_1 | Input size_2 | Operation_2 | Fusion | Output size
    1024 × 16 × 16 | 1 × 1 Conv2d, CRP, Upsample | 512 × 32 × 32 | 1 × 1 Conv2d, 1 × 1 Conv2d, Upsample | Add/CRP | 256 × 64 × 64
    256 × 64 × 64 | 1 × 1 Conv2d, 1 × 1 Conv2d, Upsample | 256 × 64 × 64 | * | Add/CRP | 128 × 128 × 128
    256 × 128 × 128 | 1 × 1 Conv2d, 1 × 1 Conv2d, Upsample | 128 × 128 × 128 | * | Add/CRP | 64 × 256 × 256
    64 × 256 × 256 | * | 64 × 256 × 256 | * | Cat, 3 × 3 Conv2d, Upsample, Pixel-level classifier | 32 × 512 × 512
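    One fusion step of Table 2 can be sketched as follows, reusing the hypothetical CRP module from above. The exact placement of the 1 × 1 convolutions and upsampling relative to the Add operation is an assumption drawn from the table and the surrounding text; channel numbers are passed in as arguments.

```python
import torch.nn.functional as F

class FusionStep(nn.Module):
    """One decoder fusion step: both branches are projected by 1 x 1 convolutions,
    merged by pixel-wise addition, refined by CRP and upsampled by a factor of 2."""
    def __init__(self, c_deep, c_shallow, c_out):
        super().__init__()
        self.proj_deep = nn.Conv2d(c_deep, c_out, 1, bias=False)
        self.proj_shallow = nn.Conv2d(c_shallow, c_out, 1, bias=False)
        self.crp = CRP(c_out)

    def forward(self, deep, shallow):
        # bring the deeper (smaller) map to the shallow map's resolution
        deep = F.interpolate(self.proj_deep(deep), size=shallow.shape[2:],
                             mode='bilinear', align_corners=False)
        fused = self.crp(deep + self.proj_shallow(shallow))    # Add, then CRP
        return F.interpolate(fused, scale_factor=2,
                             mode='bilinear', align_corners=False)
```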


    To improve the accuracy of class-boundary segmentation, many contributors have put forward their own ideas; these methods generally train a separate network for boundary supervision. Inspired by MSFNet [34], we design a module that supervises the background and boundary of the encoder. In detail, an auxiliary supervision module is placed behind stage 5, where the feature maps have compressed spatial information. Before using the auxiliary boundary supervision module, the Sobel operator is applied to the region markers of the three textures to generate labels composed of boundary and background, as shown in Figure 6. Only the vertical boundaries are of interest, so the horizontal Sobel response is ignored; in this way all boundary distribution maps are obtained. Finally, the value 0.9 is used as the threshold for transforming a boundary distribution map into a new label, that is, a label containing boundary and background can be constructed. When the 64-channel feature map is converted into a single-channel boundary detail map, a two-layer 3 × 3 convolution in the auxiliary boundary supervision module is used. To optimize the boundary prediction capability of the encoder, the cross-entropy loss and the DICE loss are calculated between the true boundary and the predicted boundary. It should be emphasized that the auxiliary boundary supervision greatly improves the attention of the deep network to boundary differences.
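    The boundary-label generation can be sketched as below: a Sobel kernel responding to vertical boundaries is applied to the three-class region map and the response is thresholded at 0.9. The normalization of the Sobel response to [0, 1] before thresholding is an assumption.

```python
import torch
import torch.nn.functional as F

def boundary_label(region_label: torch.Tensor, thresh: float = 0.9) -> torch.Tensor:
    """Turn an (H, W) three-class region label into a binary boundary/background label."""
    sobel_x = torch.tensor([[-1., 0., 1.],
                            [-2., 0., 2.],
                            [-1., 0., 1.]]).view(1, 1, 3, 3)   # responds to vertical boundaries
    x = region_label.float().view(1, 1, *region_label.shape)
    grad = F.conv2d(x, sobel_x, padding=1).abs()
    grad = grad / (grad.max() + 1e-8)          # scale responses into [0, 1] (assumed)
    return (grad > thresh).squeeze().long()    # 1 = boundary, 0 = background
```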

    Figure 6.  (a) Position marker for three textures of 512 × 512 tire image. (b) A label consisting of a boundary and a background that contains two categories.

    The output of the encoder at stage 5 is supervised by the auxiliary loss function to enhance the spatial boundary information in the deep network, while the main loss function supervises the output of the decoder. In detail, the cross entropy is adopted in the main loss function, while the auxiliary loss includes the DICE loss and the cross-entropy loss. As shown in Eq (2.3), a weight coefficient balances the main loss and the auxiliary loss; a good semantic segmentation effect is observed when $\beta = 0.96$.

    $$loss = L_m(y_1, \hat{y}_1) + \beta L_a(y_2, \hat{y}_2) \tag{2.3}$$

    where $y_1$ and $\hat{y}_1$ represent the three-category labels and the output of the decoder, respectively, and $y_2$ and $\hat{y}_2$ represent the boundary label and the output of the auxiliary supervision module.
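    A hedged sketch of Eq (2.3) follows. Using binary cross-entropy with logits as the cross-entropy term on the single-channel boundary map, and the soft DICE formulation, are assumptions about details the text does not spell out.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred_logits, target, eps=1e-6):
    # soft DICE loss on the predicted boundary probability map
    pred = torch.sigmoid(pred_logits)
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def total_loss(dec_logits, seg_label, aux_logits, bnd_label, beta=0.96):
    """Eq (2.3): cross-entropy on the three-class decoder output plus a weighted
    auxiliary term (cross-entropy + DICE) on the predicted boundary map."""
    main = F.cross_entropy(dec_logits, seg_label)
    aux = (F.binary_cross_entropy_with_logits(aux_logits, bnd_label.float())
           + dice_loss(aux_logits, bnd_label.float()))
    return main + beta * aux
```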

    In this section, we conduct experiments on real tire X-ray image data. The main work is to compare the performance of this lightweight network with several classic networks in terms of speed and accuracy. Since the accuracy of the bead toe boundary measurement depends entirely on how well the toe is located, the semantic segmentation indices indirectly reflect the detection of toe error defects. Essentially, the output of the semantic segmentation network is a pixel-level markup of the bead ring, toe and sidewall. As shown in Figure 7, the output subgraphs belonging to the same original tire X-ray image are joined together (the number of subgraphs is denoted by N), and the coordinates of the left and right toe boundaries are then calculated. When a toe error defect exists, the coordinate difference changes abruptly, as shown in Figure 8. A toe error defect is considered to occur when the coordinate difference between position A and position B is greater than the set standard. It should be noted that the defect standard is expressed in millimeters, so pixel coordinates are converted to millimeters according to the scale bar. According to the standard, it can further be determined whether the defect level is a defective product or a scrap. Bead toe error defects may occur on both the left and the right side of the tire.
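    The decision rule described above can be sketched as follows. The function, its thresholds and the example numbers are purely illustrative; the enterprise standards for defective products and scraps are not given in the text.

```python
import numpy as np

def judge_toe_error(boundary_px, mm_per_px, defective_mm, scrap_mm):
    """boundary_px: per-row pixel coordinate of the bead toe boundary on one side.
    The spread between the widest and narrowest positions is converted to
    millimetres via the scale bar and compared against the two defect thresholds."""
    spread_mm = (boundary_px.max() - boundary_px.min()) * mm_per_px
    if spread_mm >= scrap_mm:
        return "scrap"
    if spread_mm >= defective_mm:
        return "defective"
    return "normal"

# illustrative call with made-up numbers
coords = np.array([412, 415, 413, 431, 414])   # boundary coordinates per row
print(judge_toe_error(coords, mm_per_px=0.2, defective_mm=2.0, scrap_mm=4.0))
```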

    Figure 7.  Method for constructing bead toe boundary coordinates.
    Figure 8.  Calculation principle of toe error defect.

    In this work, the hardware environment is an Intel(R) Core(TM) i7-8700K CPU @ 3.70 GHz with 16.0 GB RAM and an NVIDIA GeForce RTX 2080 GPU. Each network participating in training uses stochastic gradient descent (SGD) with a momentum of 0.9, a batch size of 4 and a weight decay of 1e-5; the initial learning rate is 1e-4. Since the goal of the semantic segmentation task is to label the input image into three categories, a total of 30 training epochs is sufficient.
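    The training configuration can be summarized in the following sketch; total_loss is the loss sketch given earlier, and the assumption that the model returns both the decoder logits and the auxiliary boundary logits is ours.

```python
import torch

def train(model, train_loader, epochs=30):
    """SGD with momentum 0.9, weight decay 1e-5 and initial learning rate 1e-4;
    the batch size of 4 is set in the DataLoader."""
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                                momentum=0.9, weight_decay=1e-5)
    model.train()
    for _ in range(epochs):
        for images, seg_labels, bnd_labels in train_loader:
            optimizer.zero_grad()
            dec_logits, aux_logits = model(images)          # assumed two-headed output
            loss = total_loss(dec_logits, seg_labels, aux_logits, bnd_labels)
            loss.backward()
            optimizer.step()
```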

    The data set is divided into a training set and a test set, containing 1218 and 854 images, respectively. In the test set, 554 images are used for the evaluation of the semantic segmentation metrics and the remaining 300 images are used to measure the accuracy of defect diagnosis; the left bead toe and the right bead toe each account for half of the images. The ground truth of an input image contains three parts, in which the bead ring, the bead toe and the sidewall are represented by class labels 0, 1 and 2, respectively. Because the subgraph height is 512 pixels, labeling the top and bottom of an original 512 × 512 image directly is difficult; therefore, 200 redundant pixels were padded at the top and bottom of the original image during sample preparation, as shown in Figure 9. Once the marking is complete, eliminating the redundant pixels is easy.

    Figure 9.  (a) Input image to be marked on the left. (b) Input image to be marked on the right.

    mIoU evaluates the prediction accuracy of a semantic segmentation network over all output pixels, and it works well for conventional pixel-level classification tasks. However, the lightweight semantic segmentation network proposed here is intended to locate the boundaries of the bead toe; in other words, we are more concerned with the classification of pixels around the toe boundary. The bead ring is particularly different from the toe and sidewall and can be distinguished by gray value alone, so pixel-level segmentation of the bead ring is not challenging for most semantic segmentation networks. The difference in gray level between the sidewall and the bead toe, however, is no longer obvious, that is, gray-level information alone cannot be relied upon. In this case, the segmentation accuracy at the toe-sidewall boundary is the key to evaluating the semantic segmentation network. Our local mIoU (L-mIoU) is introduced for this purpose: it only considers local ground-truth labels within a width of 60 pixels. The local ground-truth labels come from the two boundary regions, as does the network prediction output. Figure 10 shows the recording rules for the local prediction output.
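    A possible implementation of L-mIoU is sketched below: the standard per-class IoU is evaluated only inside vertical bands of width 60 pixels centred on the two toe boundaries. The exact banding scheme is an assumption based on the description above.

```python
import numpy as np

def l_miou(pred, label, boundary_cols, band=60, n_classes=3):
    """mIoU restricted to vertical bands of width `band` centred on the toe
    boundary columns given in `boundary_cols`; pred and label are (H, W) arrays."""
    mask = np.zeros_like(label, dtype=bool)
    for col in boundary_cols:
        lo = max(col - band // 2, 0)
        mask[:, lo:col + band // 2] = True
    ious = []
    for c in range(n_classes):
        p, t = (pred == c) & mask, (label == c) & mask
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```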

    Figure 10.  (a) The prediction of the left bead toe by the semantic segmentation network. (b) The prediction of the right bead toe.

    U-Net is a high-precision semantic segmentation network, but its training parameters are numerous. STDC-Net is a network structure oriented toward real-time semantic segmentation scenes; it compresses the parameters, which adversely affects the segmentation of target boundaries. We compared the pixel-level classification accuracy of U-Net, STDC-Net, RefineNet and our lightweight network on the test set. Table 3 shows that the mIoU value is significantly higher than the L-mIoU value for each model; for example, the mIoU of the U-Net model reaches 96.8%, while the L-mIoU is 91.9%. This shows that for the evaluation of class boundaries, L-mIoU is closer to the true capability of a network. As a real-time semantic segmentation network, STDC-Net has certain deficiencies in boundary segmentation. Without auxiliary boundary supervision, our model is lower than U-Net in both mIoU and L-mIoU. RefineNet-LW-50 is a lightweight semantic segmentation network designed on the basis of RefineNet and expresses spatial information well; in the network design of this paper, we draw on the feature fusion module of RefineNet-LW-50.

    Table 3.  Comparisons with other popular networks on the test set.
    Network mIoU (%) L-mIoU (%) Parameters (MB) Running time (s)
    U-Net[25] 96.8 91.9 65.87 1.8
    STDC-Net[32] 92.5 81.9 52.96 0.8
    RefineNet-LW-50[33] 96.9 91.8 104.18 1.2
    Our method (without supervision) 96.1 90.6 55.58 1.0


    In addition, the size and running speed of the lightweight network are also evaluated. The running time of each network is averaged over 554 X-ray images with a resolution of 8000 × 2469. For the running-time measurement, the original image is first cropped into subgraphs containing the bead toe, the subgraphs are then predicted at the pixel level and, finally, the boundary coordinates of the bead toe are recorded. As shown in Table 3, U-Net performs excellently on the accuracy metrics mentioned above, but its large number of parameters leads to a longer running time. Although STDC-Net is very fast, its pixel-level prediction accuracy does not meet our requirements. RefineNet-LW-50 is inferior to the lightweight network proposed in this paper in both speed and accuracy. With 55.58 MB of training parameters, our lightweight semantic segmentation network is 40% faster than U-Net. In summary, speed and accuracy can be traded off with the proposed lightweight semantic segmentation network.

    To further enhance the performance of the new method, the auxiliary boundary supervision module is added, as shown in Table 4. Adding auxiliary supervision modules to the three classic semantic segmentation networks also improves their pixel-level segmentation, but because these networks have no module specifically designed for boundary extraction, the improvement is weaker than for the network structure proposed in this paper. Our method (S5) means that the auxiliary boundary supervision is placed after the 1/16 down-sampling stage; in this case L-mIoU reaches 92.4%, which exceeds the pixel-level segmentation accuracy of U-Net. In the experiments, we test the effect of placing the auxiliary boundary supervision at different positions: adding the auxiliary supervision module after stages 1 to 5 helps improve the accuracy of boundary segmentation. For the encoder, the resolution of the shallow feature maps is relatively large, so the loss of spatial information is small, whereas the spatial information in the deep feature maps suffers severe loss. It is necessary to capture shallow spatial information and deep semantic information at the same time; therefore, this paper designs a decoding structure with a preference for boundary information. The experimental results show that the two branches of feature fusion (shallow feature maps and deep feature maps) guided by deep auxiliary supervision carry rich spatial information.

    Table 4.  Position test of auxiliary boundary supervision module. S1, S2, S3, S4 and S5 respectively indicate that the auxiliary supervision module is placed after the output of the encoder from stage 1 to stage 5. In the three classical networks, the auxiliary supervision module is added to the encoder after the feature map with a resolution of 32 × 32.
    Network Input resolution mIoU(%) L-mIoU(%)
    UNet + Boundary loss 1 × 512 × 512 97.0 92.2
    STDC-Net + Boundary loss 1 × 512 × 512 92.7 82.3
    RefineNet-LW-50 + Boundary loss 1 × 512 × 512 97.1 92.0
    Our method + Boundary loss(S1) 1 × 512 × 512 96.6 91.6
    Our method + Boundary loss(S2) 1 × 512 × 512 96.5 91.1
    Our method + Boundary loss(S3) 1 × 512 × 512 96.5 91.4
    Our method + Boundary loss(S4) 1 × 512 × 512 96.9 91.6
    Our method + Boundary loss(S5) 1 × 512 × 512 97.1 92.4


    Of course, a visual comparison after adding the auxiliary boundary supervision module is indispensable. As shown in Figure 11, the results show that after adding the auxiliary supervision module, the segmentation of the boundary area by the various algorithms improves considerably. Encouragingly, the segmentation of our method near the boundary is almost consistent with the labeled map.

    Figure 11.  Comparison of visual effects between our method and popular methods.

    The artificial judgment results first distinguish only between normal and defect types; the artificial evaluation results then further distinguish the defect level on the basis of the identified defects. For a data set of 300 tire X-ray images, normal samples account for exactly two-thirds; scraps and defective products account for 20 and 80 images, respectively.

    $$CR = \frac{HJ \cap AJ}{S} \tag{3.1}$$
    $$PR = \frac{HLE \cap ALE}{S} \tag{3.2}$$
    $$RMR = \frac{N}{S_d} \tag{3.3}$$

    In Eq (3.1), CR denotes the correct rate. HJ and AJ represent the artificial judgement result and the algorithm judgement result for the input images, respectively, and S stands for the total number of input images. In Eq (3.2), PR denotes the precision ratio; HLE represents the artificial evaluation result for the two defect levels and ALE stands for the algorithm's evaluation of the defect levels. In Eq (3.3), RMR denotes the rate of missing report, where N represents the number of defect samples that were not identified and $S_d$ stands for the number of defect samples.
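    A worked reading of Eqs (3.1)-(3.3) is sketched below. Interpreting the numerators as counts of images on which the human and algorithm judgements agree is our assumption, and the example counts are illustrative rather than the paper's results.

```python
def detection_indices(judge_agree, level_agree, missed, total, defect_total):
    """CR, PR and RMR computed from agreement counts between the human and
    algorithm judgements."""
    cr = judge_agree / total        # agreement on defect vs. normal, Eq (3.1)
    pr = level_agree / total        # agreement including the defect level, Eq (3.2)
    rmr = missed / defect_total     # defect samples missed entirely, Eq (3.3)
    return cr, pr, rmr

# illustrative counts only, not the paper's results
print(detection_indices(judge_agree=274, level_agree=271, missed=2,
                        total=300, defect_total=100))
```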

    After the position of each region is predicted by the semantic segmentation network, the coordinates of the toe boundary can be obtained. Then, using the ratio of distance on the image to actual distance, the defect level is judged against the defect standard. Since the error defect distribution of the tire X-ray images is known, the advantages of the proposed algorithm in automatic detection can be verified directly. As shown in Table 5, on the three evaluation indices for bead toe error detection the algorithm proposed in this paper is at an advanced level, reaching 91.3%, 90.4% and 1.4%, respectively.

    Table 5.  Comparison of toe error detection results.
    Network CR (%) PR (%) RMR(%)
    UNet + Boundary loss 89.7 88.3 1.8
    STDC-Net + Boundary loss 75.4 71.6 3.5
    RefineNet-LW-50 + Boundary loss 88.5 87.9 2.2
    Our method + Boundary loss(S5) 91.3 90.4 1.4


    In this paper, we have revisited the classical semantic segmentation network U-Net and the real-time semantic segmentation network STDC-Net. On this basis, the shortcomings of U-Net in speed and of STDC-Net in accuracy have been addressed, and a new lightweight semantic segmentation network has been proposed that approaches U-Net in precision while approaching STDC-Net in speed. Our encoder is a fine-tuning of STDC-Net. The idea of the decoder is to fuse the deep feature maps with the shallow feature maps so that the feature maps contain multi-scale spatial information. Locating the boundary of the bead toe is the key to error defect detection, so pixel-level boundary segmentation capability is an inevitable requirement for the network. An auxiliary supervision method in the vertical direction has been proposed to enhance the pixel-level classification performance at the class boundary. Ultimately, the bead toe boundary coordinates are located by running our lightweight semantic segmentation network, and by comparing the difference between the edge coordinates and the standard, the presence of a toe error defect and its defect level can be judged.

    The authors declare there is no conflict of interest.



    [1] A. Masood, P. Yang, B. Sheng, H. Li, P. Li, J. Qin, et al., Cloud-based automated clinical decision support system for detection and diagnosis of lung cancer in chest CT, IEEE J. Transl. Eng. Health Med., 8 (2020), 1–13. https://doi.org/10.1109/JTEHM.2019.2955458 doi: 10.1109/JTEHM.2019.2955458
    [2] E. H. Houssein, D. A. Abdelkareem, M. M. Emam, M. A. Hameed, M. Younan, An efficient image segmentation method for skin cancer imaging using improved golden jackal optimization algorithm, Comput. Biol. Med., 149 (2022), 106075. https://doi.org/10.1016/j.compbiomed.2022.106075 doi: 10.1016/j.compbiomed.2022.106075
    [3] O. Ayyildiz, Z. Aydin, B. Yilmaz, S. Karaçavu, K. Senkaya, S. Içer, et al., Lung cancer subtype differentiation from positron emission tomography images, Turk. J. Electr. Eng. Comput. Sci., 28 (2020), 262–274. https://doi.org/10.3906/elk-1810-154 doi: 10.3906/elk-1810-154
    [4] L. Ren, D. Zhao, X. Zhao, W. Chen, L. Li, T. Wu, et al., Multi-level thresholding segmentation for pathological images: Optimal performance design of a new modified differential evolution, Comput. Biol. Med., 148 (2022), 105910. https://doi.org/10.1016/j.compbiomed.2022.105910 doi: 10.1016/j.compbiomed.2022.105910
    [5] C. Zappa, S. A. Mousa, Non-small cell lung cancer: Current treatment and future advances, Transl. Lung Cancer Res., 5 (2016), 288–300. https://doi.org/10.21037/tlcr.2016.06.07 doi: 10.21037/tlcr.2016.06.07
    [6] M. M. Emam, E. H. Houssein, R. M. Ghoniem, A modified reptile search algorithm for global optimization and image segmentation: Case study brain MRI images, Comput. Biol. Med., 152 (2023), 106404. https://doi.org/10.1016/j.compbiomed.2022.106404 doi: 10.1016/j.compbiomed.2022.106404
    [7] V. K. Anagnostou, A. T. Dimou, T. Botsis, E. J. Killiam, M. D. Gustavson, R. J. Homer, et al., Molecular classification of nonsmall cell lung cancer using a 4-protein quantitative assay, Cancer, 118 (2012), 1607–1618. https://doi.org/10.1002/cncr.26450 doi: 10.1002/cncr.26450
    [8] K. M. Hosny, A. M. Khalid, H. M. Hamza, S. Mirjalili, Multilevel segmentation of 2D and volumetric medical images using hybrid Coronavirus Optimization Algorithm, Comput. Biol. Med., 150 (2022), 106003. https://doi.org/10.1016/j.compbiomed.2022.106003 doi: 10.1016/j.compbiomed.2022.106003
    [9] F. Ciompi, K. Chung, S. J. van Riel, A. A. A. Setio, P. K. Gerke, C. Jacobs, et al., Towards automatic pulmonary nodule management in lung cancer screening with deep learning, Sci. Rep., 7 (2017), 1–11. https://doi.org/10.1038/srep46479 doi: 10.1038/srep46479
    [10] W. Zhu, L. Liu, F. Kuang, L. Li, S. Xu, Y. Liang, An efficient multi-threshold image segmentation for skin cancer using boosting whale optimizer, Comput. Biol. Med., 151 (2022), 106227. https://doi.org/10.1016/j.compbiomed.2022.106227 doi: 10.1016/j.compbiomed.2022.106227
    [11] J. J. Chabon, E. G. Hamilton, D. M. Kurtz, M. S. Esfahani, E. J. Moding, H. Stehr, et al., Integrating genomic features for non-invasive early lung cancer detection, Nature, 580 (2020), 245–251. https://doi.org/10.1038/s41586-020-2140-0 doi: 10.1038/s41586-020-2140-0
    [12] A. Masood, B. Sheng, P. Yang, P. Li, D. D. Feng, Automated decision support system for lung cancer detection and classification via enhanced RFCN with multilayer fusion RPN, IEEE Trans. Ind. Inf., 16 (2020), 7791–7801. https://doi.org/10.1109/TII.2020.2972918 doi: 10.1109/TII.2020.2972918
    [13] M. Bicakci, O. Ayyildiz, Z. Aydin, A. Basturk, S. Karacavus, B. Yilmaz, Metabolic imaging based sub-classification of lung cancer, IEEE Access, 8 (2020), 218470–218476. https://doi.org/10.1109/ACCESS.2020.3040155 doi: 10.1109/ACCESS.2020.3040155
    [14] Y. Chen, Y. Wang, F. Hu, L. Feng, T. Zhou, C. Zheng, LDNNET: Towards robust classification of lung nodule and cancer using lung dense neural network, IEEE Access, 9 (2021), 50301–50320. http://doi.org/10.1109/ACCESS.2021.3068896 doi: 10.1109/ACCESS.2021.3068896
    [15] M. Li, X. Ma, C. Chen, Y. Yuan, S. Zhang, Z. Yan, et al., Research on the auxiliary classification and diagnosis of lung cancer subtypes based on histopathological images, IEEE Access, 9 (2021), 53687–53707. https://doi.org/10.1109/ACCESS.2021.3071057 doi: 10.1109/ACCESS.2021.3071057
    [16] E. A. Siddiqui, V. Chaurasia, M. Shandilya, Detection and classification of lung cancer computed tomography images using a novel improved deep belief network with Gabor filters, Chemom. Intell. Lab. Syst., 235 (2023), 104763. https://doi.org/10.1016/j.chemolab.2023.104763 doi: 10.1016/j.chemolab.2023.104763
    [17] A. K. Ajai, A. Anitha, Clustering based lung lobe segmentation and optimization-based lung cancer classification using CT images, Biomed. Signal Process. Control, 78 (2022), 103986. https://doi.org/10.1016/j.bspc.2022.103986 doi: 10.1016/j.bspc.2022.103986
    [18] A. R. Bushara, R. S. Vinod Kumar, S. S. Kumar, LCD-capsule network for the detection and classification of lung cancer on computed tomography images, Multimedia Tools Appl., 2023 (2023), 1–20. https://doi.org/10.1007/s11042-023-14893-1 doi: 10.1007/s11042-023-14893-1
    [19] D. S. Manoharan, A. Sathesh, Improved version of graph-cut algorithm for CT images of lung cancer with clinical property condition, J. Artif. Intell., 2 (2020), 201–206. https://doi.org/10.36548/jaicn.2020.4.002 doi: 10.36548/jaicn.2020.4.002
    [20] A. Alsadoon, G. Al-Naymat, A. H. Osman, B. Alsinglawi, M. Maabreh, M. R. Islam, DFCV: A framework for evaluation deep learning in early detection and classification of lung cancer, Multimedia Tools Appl., 93 (2023), 1–44. https://doi.org/10.1007/s11042-023-15238-8 doi: 10.1007/s11042-023-15238-8
    [21] M. Braveen, S. Nachiyappan, R. Seetha, K. Anusha, A. Ahilan, A. Prasanth, et al., ALBAE feature extraction-based lung pneumonia and cancer classification, Soft Comput., 155 (2023), 1–14. https://doi.org/10.1007/s00500-023-08453-w doi: 10.1007/s00500-023-08453-w
    [22] Y. Chen, C. Liu, W. Huang, S. Cheng, R. Arcucci, Z. Xiong, Generative text-guided 3D vision-language pretraining for unified medical image segmentation, arXiv preprint, (2023), arXiv: 2306.04811. https://doi.org/10.48550/arXiv.2306.04811
    [23] Z. Qin, H. Yi, Q. Lao, K. Li, Medical image understanding with pre-trained vision language models: A comprehensive study, arXiv preprint, (2022), arXiv: 2209.15517. https://doi.org/10.48550/arXiv.2209.15517
    [24] Z. Wan, C. Liu, M. Zhang, J. Fu, B. Wang, S. Cheng, et al., Med-UniC: Unifying cross-lingual medical vision-language pre-training by diminishing bias, arXiv preprint, (2023), arXiv: 2305.19894. https://doi.org/10.48550/arXiv.2305.19894
    [25] M. Lavanya, P. Muthu Kannan, Lung cancer segmentation and diagnosis of lung cancer staging using MEM (modified expectation maximization) algorithm and artificial neural network fuzzy inference system (ANFIS), Biomed. Res., 29 (2018), 2919–2924. https://doi.org/10.4066/biomedicalresearch.29-18-740 doi: 10.4066/biomedicalresearch.29-18-740
    [26] F. Mirzakhani, Detection of lung cancer using multilayer perceptron neural network, Med. Technol. J., 1 (2017), 109. http://doi.org/10.26415/2572-004X-vol1iss4p109 doi: 10.26415/2572-004X-vol1iss4p109
    [27] N. Shukla, A. Narayane, A. Nigade, K. Yadav, H. Mhaske, Lung cancer detection and classification using Support Vector Machine, Int. J. Adv. Trends Comput. Sci. Eng., 4 (2015), 14983–14986. http://doi.org/10.18535/Ijecs/v4i11.20 doi: 10.18535/Ijecs/v4i11.20
    [28] M. Grace John, S. Baskar, Extreme learning machine algorithm-based model for lung cancer classification from histopathological real-time images, Comput. Intell., 2023 (2023). https://doi.org/10.1111/coin.12576 doi: 10.1111/coin.12576
    [29] F. Zhu, Z. Gao, C. Zhao, Z. Zhu, J. Tang, Y. Liu, et al., Semantic segmentation using deep learning to extract total extraocular muscles and optic nerve from orbital computed tomography images, Optik, 244 (2021), 167551. https://doi.org/10.1016/j.ijleo.2021.167551 doi: 10.1016/j.ijleo.2021.167551
    [30] Y. Song, J. Liu, X. Liu, J. Tang, COVID-19 infection segmentation and severity assessment using a self-supervised learning approach, Diagnostics, 12 (2022), 1805. https://doi.org/10.3390/diagnostics12081805 doi: 10.3390/diagnostics12081805
    [31] M. A. Heuvelmans, P. M. A. van Ooijen, S. Ather, C. F. Silva, D. Han, C. P. Heussel, et al., Lung cancer prediction by deep learning to identify benign lung nodules, Lung Cancer, 154 (2021), 1–4. https://doi.org/10.1016/j.lungcan.2021.01.027 doi: 10.1016/j.lungcan.2021.01.027
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
