Research article

Red and white blood cell classification using Artificial Neural Networks

• Blood cell classification is an active research topic for scientists working on the diagnosis of blood-cell-related illnesses. As the number of computer vision (CV) applications grows to improve the quality of human life, the field has spread into areas such as autonomous driving, surveillance, robotics and telecommunications. CV applications are also multiplying in the medical sector due to the decreasing doctor-per-patient ratio (DPPR) in urban and suburban areas. A doctor working in such areas may have to interpret thousands of patients' test results in a day, a workload that can lead to misdiagnoses and erode doctors' motivation. Some of these tests could instead be interpreted by an application built on Artificial Neural Networks (ANN). Blood cell tests are performed as a starting point of diagnosis, and the information obtained about cell abnormalities gives doctors a preliminary idea of the illness. This article addresses the development of a CV application intended to assist doctors who have domain expertise. It covers segmentation of blood cells, classification of blood cells into six types (erythrocytes, lymphocytes, platelets, neutrophils, monocytes and eosinophils) using the segmentation results, and a method for detecting abnormalities in red blood cells (erythrocytes).

    Citation: Simge Çelebi, Mert Burkay Çöteli. Red and white blood cell classification using Artificial Neural Networks[J]. AIMS Bioengineering, 2018, 5(3): 179-191. doi: 10.3934/bioeng.2018.3.179



Traffic accidents are the third leading cause of death in the United States according to the Centers for Disease Control and Prevention [1]. A less noticed fact is that tire quality causes many of these accidents: the National Highway Traffic Safety Administration reports an average of nearly 11,000 accidents related to tire failures each year [2]. Undoubtedly, tire quality affects not only the safety of riders but also the lives and property of every road traveler. In the tire production process, quality inspection prevents consumers from receiving unqualified products. Currently, the main market share is held by radial tires. This type of tire not only has multiple belt layers attached to the crown but also a tightly wound wire coil, which significantly improves the strength and stability of the ring [3,4]. Radial tires require a complex production process, which increases the possibility of various quality defects [5,6]. For this type of tire, the most popular nondestructive detection scheme is to place the product in an X-ray room for irradiation; the resulting 2-D grayscale image is then measured against enterprise standards. Although many enterprises still rely on manual visual inspection, that process has significant disadvantages in efficiency and accuracy. Unattended defect detection based on computer vision is now the trend.

To meet the needs of intelligent detection, many researchers have studied tire X-ray images. Based on traditional image processing methods, researchers have proposed results from multiple perspectives. Guo et al. [7] first evaluated pixel-level texture similarity; impurities in the sidewall and crown were then located by thresholding. However, pixel-level operations are extremely demanding in computational speed, especially considering that X-ray images are quite large. Zhang et al. [8,9,10] systematically studied impurity detection methods. Combining total variation with the Curvelet transform, the original image was split into texture and cartoon components, and impurities were ultimately located in the cartoon component [8]. A method combining the Curvelet transform with the Canny operator was designed to extract defect features effectively [9]. Wavelet multiscale analysis was proposed in [10]; this method included local analysis and scale feature extraction to separate defects from the background. Since the local inverse difference moment characteristics of normal and impurity areas differ markedly, [11] used this principle to detect significant impurities. Different tire types differ significantly in the amount of raw material, resulting in differences in the brightness of the grayscale image after X-ray irradiation. The traditional detection methods above need their algorithm parameters tuned to the detection model, so they are particularly sensitive to the brightness of the grayscale images. The first motivation of this paper is therefore to design a tire defect detection method that reduces the impact of image brightness on the detection results while accommodating various types of tires.

In recent years, deep learning has been widely used in image classification [12,13], target recognition [14,15], semantic segmentation [16,17] and other fields. Some researchers have also applied deep learning to visual inspection tasks [18] such as fabric, welding, and internal tire defect detection. In [19], a convolutional neural network based on ResNet [20] learned the texture features of fabric, making it possible to accurately locate small fabric defects. Aiming at automatic localization of welding defects, [21] proposed an improved U-Net that combines random cropping and preprocessing methods to effectively expand the data set. In [22], an end-to-end tire X-ray defect classification network called TireNet was proposed, in which the representation of defect features is realized as part of the downstream classification module; TireNet achieved a recall of 94.7% on a data set covering more than 10 defect types. A convolutional network based on supervised feature embedding was proposed in [23], effectively improving the accuracy of tire X-ray image classification based on AlexNet. As this overview shows, applications of deep learning in visual inspection mainly focus on distinguishing defect types; few studies determine the defect level. The task of defect detection in a tire factory is to judge the defect level on the basis of defect identification: unqualified products are classified as defective products or scraps according to the degree of the defect. Thus, the second motivation of this paper is to apply a deep learning network for accurate judgment of defective products and scraps.

Error defects tend to occur in the bead toe. Whether an error defect has occurred is determined by the maximum and minimum pixel widths of the bead toe, and in practice, defective products and scraps must also be distinguished. The current manual visual inspection therefore requires accurate measurement after the initial assessment to judge the defect level, which seriously slows the inspection of each tire. Fortunately, a semantic segmentation network can perform pixel-level classification and is thus well suited to extracting the bead toe and measuring its pixel width, so we propose it as a replacement for the manual determination of defect level. The idea of semantic segmentation can be traced back to FCN [24]. Later, U-Net [25] achieved very high accuracy with a specific skip-connection structure, followed by many improved variants [26,27,28,29]. As the speed of semantic segmentation networks was optimized, the concept of real-time semantic segmentation emerged. ENet [30] paired a relatively small encoder with a relatively large decoder, reducing the number of parameters and greatly increasing speed. BiSeNet [31] proposed a dual-path structure in which the two paths obtain high-resolution features and a sufficient receptive field. STDC-Net [32] abandoned extracting spatial and contextual information separately; instead, a single-stream network integrates the learning of spatial information into the bottom layers, further speeding up inference. However, STDC-Net still leaves room to improve the accuracy of pixel-level segmentation. The third motivation of this paper is therefore to design a lightweight semantic segmentation network that trades off segmentation accuracy against inference speed; specifically, its detection of bead toe error should surpass manual visual inspection in both speed and accuracy.

Addressing these three motivations, this paper proposes a lightweight semantic segmentation network for the detection of bead toe error. Shallow and deep texture information is stored in the feature map of each stage, and multi-scale feature information is fused in the decoding module. Finally, through size expansion, the output is determined by sufficient spatial and contextual information. In addition, an auxiliary supervision structure is added to improve the precision of class-boundary segmentation without increasing the number of model parameters.

    The main contributions of this paper are summarized as follows:

1). A semantic segmentation network based on an encoder-decoder structure is proposed. In the encoder, a weighted dense connection (WSTDC) module is proposed. In the decoder, a feature fusion structure using chained residual pooling (CRP) is proposed; this structure replaces large convolution kernels with a pooling operation and small convolution kernels while still expressing spatial and contextual information.

2). We design an auxiliary boundary supervision method. Labels composed of border and background are derived from the original three-category labels. The auxiliary loss function measures the difference between the feature maps of the coding stage and the boundary labels, correcting the encoder's attention to boundaries. The main loss and auxiliary loss are superimposed to train the network.

3). Based on the calculation of mIoU, a more accurate index for measuring class-boundary accuracy is proposed: L-mIoU. On a self-made tire X-ray dataset, the lightweight semantic segmentation network achieves impressive results: 92.4% L-mIoU and 97.1% mIoU with a running time of 1.0 s.

4). A deep learning method is applied to bead toe defect detection. It is the first method to identify whether an error defect occurs by calculating the boundary of the bead toe. The effect of on-line detection was simulated on a data set composed of defect samples and normal samples.

Inspired by STDC-Net, in this work we propose a lightweight semantic segmentation network. The height of a complete tire X-ray image is usually several thousand pixels and the width is 2469 pixels. If the original image were fed directly into the network, inference would be slow, and such a large image would also be harder to label. To detect bead toe defects while reducing the computational cost, the key is to crop the areas containing the left and right bead toes. Considering that the width of the bead toe varies across tire types, a 512 $ \times $ 512 image resolution was chosen to ensure that each cropped subgraph includes the bead ring, bead toe and sidewall, as shown in Figure 1. Each input to the network is thus a subgraph.

The lightweight network learns the texture features of each area of the tire X-ray subgraph, dividing the input into three parts representing the bead ring, bead toe and sidewall, as shown in Figure 2. The architecture consists of an encoder, a decoder and an auxiliary boundary supervision module, as shown in Figure 3. The encoder is the basis of the structure and comprises multiple WSTDC modules for feature representation. Each WSTDC consists of a stack of 3 $ \times $ 3 convolution layers, each followed by a nonlinear activation unit (ReLU) and a batch normalization unit (BN). Notably, the output of each convolution operation in a WSTDC is recorded, and the module's output is the combination of all these records. After passing through several WSTDCs, the image is eventually reduced to 1/32 of its original size. Each size reduction generates a new feature map, so the encoder outputs feature maps of several sizes that together carry complete spatial and semantic information.

In the decoder, an expansion mechanism solves the problem of fusing the different-sized feature maps from the encoder: the deep feature maps are merged with the shallow feature maps after passing through 5 $ \times $ 5 pooling layers and 1 $ \times $ 1 convolution layers. Crucially, the representation ability of deep spatial features affects the segmentation of boundary details. Based on this observation, we propose an auxiliary boundary supervision module that guides the deep encoder stages to learn boundary details. The details of each module are described below.

    Figure 1.  From the original tire X-ray image to the input image of a lightweight semantic segmentation network.
    Figure 2.  Tire images with resolution 512 $ \times $ 512. (a) A part of the X-ray image on the left side of the crown. (b) A part of the X-ray image on the right side of the crown.
    Figure 3.  A lightweight semantic segmentation network designed in this paper.
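To make the cropping step concrete, here is a minimal sketch. The window stride, the function name and the assumption that the bead regions sit flush against the left and right image edges are ours, not taken from the paper; a production system might locate the crop window adaptively.

```python
import numpy as np

def extract_subgraphs(xray: np.ndarray, side: str = "left", size: int = 512):
    """Slide a size x size window down one edge of the full X-ray image
    (width 2469 px, height several thousand px) to collect the subgraphs
    containing the bead ring, bead toe and sidewall of that side."""
    h, w = xray.shape
    x0 = 0 if side == "left" else w - size   # assume bead regions sit at the edges
    return [xray[y:y + size, x0:x0 + size]
            for y in range(0, h - size + 1, size)]
```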

As described above, the extraction of texture features requires advanced down-sampling modules. The recently proposed STDC module attracted our attention due to its strong extraction ability and small number of training parameters, so multiple improved STDC modules are stacked to form a forward feature-learning path. Two WSTDC modules appear in this path, referred to as WSTDC1 and WSTDC2, as shown in Figure 4. Both include one convolutional layer with 1 $ \times $ 1 filters and four layers with 3 $ \times $ 3 filters. The output size of WSTDC1 matches its input size, while the output size of WSTDC2 is half of its input size. In this paper, WSTDC1 and WSTDC2 are combined into a standard down-sampling module called the double feature extraction block (DFEB). The key to this semantic segmentation network is that shallow blocks contain abundant spatial information, while deep feature maps mainly represent contextual information. To speed up the convergence of the training parameters in the DFEB and reduce redundant parameters, the output of each convolutional layer is fused into the output layer through skip connections. The weights of the intermediate feature maps are learnable; that is, the weight parameters change dynamically during training. The output is derived as follows:

$ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} w_1 & 0 & 0 & 0 \\ 0 & w_2 & 0 & 0 \\ 0 & 0 & w_3 & 0 \\ 0 & 0 & 0 & w_4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} $ (2.1)

$ y_{out} = f(y_1, y_2, y_3, y_4) $ (2.2)
Figure 4.  Structure of the WSTDC modules. (a) represents the structure of WSTDC1 (the output size is the same as the input size). (b) represents the structure of WSTDC2 (the output size is reduced to half of the input size).

where $ w_1, w_2, w_3, w_4 $ denote the weights of the intermediate feature maps (i.e., $ x_1, x_2, x_3, x_4 $) from left to right, $ y_{out} $ denotes the WSTDC module output, and $ f $ indicates the fusion operation, completed by concatenation.

The advantages of the WSTDC structure are therefore mainly its fewer parameters and its expression of multi-scale features. The concatenation of multiple intermediate feature maps constitutes the output channels of the WSTDC module; compared with a conventional channel transform, less convolution computation is required for the same number of output channels. Convolution layers of different depths extract texture information of different sizes, so the texture contrast of each region is more distinct.
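To make Eqs (2.1)-(2.2) concrete, here is a minimal PyTorch sketch of a WSTDC block. The channel split across the four intermediate maps and the placement of the stride follow the common STDC convention and are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConvBNReLU(nn.Module):
    """Convolution followed by batch normalization (BN) and ReLU."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, stride=stride, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class WSTDC(nn.Module):
    """One 1x1 layer followed by four 3x3 layers; the four 3x3 outputs
    x1..x4 are scaled by learnable weights w1..w4 (Eq 2.1) and fused by
    concatenation (Eq 2.2)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv0 = ConvBNReLU(in_ch, out_ch // 2, k=1)           # 1x1 entry conv
        self.conv1 = ConvBNReLU(out_ch // 2, out_ch // 2, stride=stride)
        self.conv2 = ConvBNReLU(out_ch // 2, out_ch // 4)
        self.conv3 = ConvBNReLU(out_ch // 4, out_ch // 8)
        self.conv4 = ConvBNReLU(out_ch // 8, out_ch // 8)
        self.w = nn.Parameter(torch.ones(4))   # learnable fusion weights w1..w4

    def forward(self, x):
        x1 = self.conv1(self.conv0(x))   # stride 2 here halves the size (WSTDC2)
        x2 = self.conv2(x1)
        x3 = self.conv3(x2)
        x4 = self.conv4(x3)
        ys = [self.w[i] * xi for i, xi in enumerate((x1, x2, x3, x4))]
        return torch.cat(ys, dim=1)      # channels: out/2 + out/4 + out/8 + out/8
```

`WSTDC(in_ch, out_ch, stride=1)` corresponds to WSTDC1 and `stride=2` to WSTDC2; stacking one of each yields a DFEB.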

Considering the relationship between texture-feature richness and depth, the number of DFEBs is set to 3. The encoder structure is shown in Table 1. Overall, processing an X-ray image with a resolution of 1 $ \times $ 512 $ \times $ 512 consists of 6 stages. Stages 1 and 2 use a structure containing only a convolutional layer followed by batch normalization and a nonlinear activation unit; this simple design shows favorable feature extraction capability in the experiments below. Stage 3 includes 2 convolutional layers that act as channel expansions without size changes. From stages 4 to 6, three DFEBs refine the feature maps so that the image size is reduced to 1/8, 1/16 and 1/32 in turn, and the number of channels eventually expands to 1024 in stage 6. In our experiments, we found that the combination of WSTDC1 and WSTDC2 meets the needs of texture expression while offering a clear advantage in controlling the number of parameters.

    Table 1.  The structure of each stage in the encoder.
    Stage Operation Output size Kernel size Stride
    stage1 Conv2d 64 $ \times $ 256 $ \times $ 256 3 2
    stage2 Conv2d 128 $ \times $ 128 $ \times $ 128 3 2
    stage3 Double Conv2d 256 $ \times $ 128 $ \times $ 128 3 1
    stage4 WSTDC1/WSTDC2 256 $ \times $ 64 $ \times $ 64 3 2/1
    stage5 WSTDC1/WSTDC2 512 $ \times $ 32 $ \times $ 32 3 2/1
    stage6 WSTDC1/WSTDC2 1024 $ \times $ 16 $ \times $ 16 3 2/1


Inspired by [33], the optimization of the chained residual pooling (CRP) module lies in reducing the convolution kernel size: specifically, the 3 $ \times $ 3 convolution kernel is replaced for the sake of computational speed. Experiments show that the 1 $ \times $ 1 convolution does not significantly lose the texture information contained in the feature maps. The CRP module is in fact a stack of 5 $ \times $ 5 pooling operations and 1 $ \times $ 1 convolution operations, as shown in Figure 5; the number of stacked layers is set to 3 in this paper. Replacing the 3 $ \times $ 3 kernel with a 1 $ \times $ 1 kernel sacrifices some of the local receptive field, but the large pooling kernel and the skip connection weaken this loss.

    Figure 5.  A new CRP module.
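A minimal sketch of the modified CRP module follows. The pooling type (max) and its stride/padding are assumptions borrowed from RefineNet-style CRP blocks; the paper only states the 5 $ \times $ 5 pooling, 1 $ \times $ 1 convolution and three stacked stages.

```python
import torch.nn as nn

class CRP(nn.Module):
    """Chained residual pooling: a chain of {5x5 pooling -> 1x1 conv} stages
    whose outputs are accumulated onto the input via residual connections."""
    def __init__(self, channels, n_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.MaxPool2d(kernel_size=5, stride=1, padding=2),  # keeps size
                nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            )
            for _ in range(n_stages)
        )

    def forward(self, x):
        out, path = x, x
        for stage in self.stages:
            path = stage(path)   # pool, then project with a 1x1 convolution
            out = out + path     # residual accumulation offsets the smaller kernel
        return out
```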

The output feature maps of down-sampling stages 1 to 6 are the input of the up-sampling fusion operation; the decoder details are recorded in Table 2. In summary, the output feature maps of the current stage are fused with those of the previous stage, realizing up-sampling decoding through a U-shaped structure. The merged feature maps of the two stages are then transformed through a CRP module. In our structure, only the output feature maps of stage 1 are fused by pixel-level concatenation rather than pixel-level addition (the Add and Cat operations in Table 2 denote pixel-level addition and concatenation, respectively). A double-layer 3 $ \times $ 3 convolution is placed behind the feature fusion layer. The direct advantage of concatenation is that the number of channels can be expanded rapidly, and the higher-dimensional space is better suited to separating fine texture differences. Finally, the pixel-level classifier reduces the number of channels to 3.

Table 2.  Decoding structure embedded in the lightweight network. The two input branches come from the output feature maps of two adjacent encoding stages; "*" indicates that no operation is applied to that branch.

| Input size_1 | Operation_1 | Input size_2 | Operation_2 | Fusion | Output size |
| 1024 $ \times $ 16 $ \times $ 16 | 1 $ \times $ 1 Conv2d, CRP, Upsample | 512 $ \times $ 32 $ \times $ 32 | 1 $ \times $ 1 Conv2d, 1 $ \times $ 1 Conv2d, Upsample | Add/CRP | 256 $ \times $ 64 $ \times $ 64 |
| 256 $ \times $ 64 $ \times $ 64 | 1 $ \times $ 1 Conv2d, 1 $ \times $ 1 Conv2d, Upsample | 256 $ \times $ 64 $ \times $ 64 | * | Add/CRP | 128 $ \times $ 128 $ \times $ 128 |
| 256 $ \times $ 128 $ \times $ 128 | 1 $ \times $ 1 Conv2d, 1 $ \times $ 1 Conv2d, Upsample | 128 $ \times $ 128 $ \times $ 128 | * | Add/CRP | 64 $ \times $ 256 $ \times $ 256 |
| 64 $ \times $ 256 $ \times $ 256 | * | 64 $ \times $ 256 $ \times $ 256 | * | Cat, 3 $ \times $ 3 Conv2d, Upsample, Pixel-level classifier | 32 $ \times $ 512 $ \times $ 512 |

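As one plausible reading of the first row of Table 2, the sketch below projects both branches with 1 $ \times $ 1 convolutions, passes the deeper branch through a CRP, upsamples both to the target size, adds them pixel-wise and refines the sum with another CRP (reusing the `CRP` sketch above). The exact op ordering and which branch carries the CRP are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class FusionStep(nn.Module):
    """One decoder fusion step: 1x1 projections, CRP on the deeper branch,
    bilinear upsampling, pixel-level Add, then a refining CRP
    (cf. the Add/CRP column of Table 2)."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.proj_deep = nn.Conv2d(deep_ch, out_ch, kernel_size=1, bias=False)
        self.proj_shallow = nn.Conv2d(shallow_ch, out_ch, kernel_size=1, bias=False)
        self.crp_deep = CRP(out_ch)   # CRP module from the sketch above
        self.crp_out = CRP(out_ch)

    def forward(self, deep, shallow, out_size):
        d = F.interpolate(self.crp_deep(self.proj_deep(deep)),
                          size=out_size, mode="bilinear", align_corners=False)
        s = F.interpolate(self.proj_shallow(shallow),
                          size=out_size, mode="bilinear", align_corners=False)
        return self.crp_out(d + s)    # pixel-level Add followed by CRP

# e.g., row 1 of Table 2: FusionStep(1024, 512, 256) applied to the 16x16
# and 32x32 maps with out_size=(64, 64).
```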

To improve the accuracy of class-boundary supervision in semantic segmentation tasks, many contributors have put forward their own ideas, generally training a separate network for boundary supervision. Inspired by MSFNet [34], we design a module to supervise the background and boundary of the encoder. In detail, an auxiliary supervision module is placed behind stage 5, where the feature maps have compressed spatial information. Before using this module, we apply the Sobel operator to the region markers of the three textures to generate labels composed of boundary and background, as shown in Figure 6. Only the vertical boundaries are of interest, so the horizontal Sobel operator is ignored. All boundary distribution maps are then obtained, and the value 0.9 is used as the threshold for transforming a boundary distribution map into a new label; in this way, a label containing boundaries and backgrounds is constructed. A two-layer 3 × 3 convolution in the auxiliary boundary supervision module converts a 64-channel feature map into a single-channel boundary detail map. To optimize the boundary prediction capability of the encoder, the cross-entropy loss and DICE loss are calculated between the true and predicted boundaries. It should be emphasized that the auxiliary boundary supervision greatly increases the deep network's sensitivity to boundary differences.

    Figure 6.  (a) Position marker for three textures of 512 × 512 tire image. (b) A label consisting of a boundary and a background that contains two categories.
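A minimal sketch of this label construction is given below, assuming the region-marker map holds the class indices 0/1/2 and that the gradient map is normalized to [0, 1] before thresholding; both details are our assumptions.

```python
import cv2
import numpy as np

def make_boundary_label(region_marker: np.ndarray, threshold: float = 0.9):
    """Build the two-class boundary/background label from the three-texture
    region marker (0 = bead ring, 1 = bead toe, 2 = sidewall)."""
    # dx=1, dy=0 responds to vertical boundaries; the horizontal Sobel
    # operator is deliberately ignored, as described above.
    grad = np.abs(cv2.Sobel(region_marker.astype(np.float32),
                            cv2.CV_32F, dx=1, dy=0, ksize=3))
    if grad.max() > 0:
        grad /= grad.max()                            # distribution map in [0, 1]
    return (grad >= threshold).astype(np.uint8)       # 1 = boundary, 0 = background
```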

The output of the encoder at stage 5 is supervised by the auxiliary loss function to enhance the spatial boundary information in the deep network, while the main loss function supervises the output of the decoder. In detail, cross entropy is adopted for the main loss, and the auxiliary loss combines DICE loss and cross-entropy loss. As shown in Eq (2.3), a weight coefficient balances the main loss and the auxiliary loss; a good semantic segmentation effect is observed when $ \beta = 0.96 $.

$ loss = L_m(y_1, y_1^{'}) + \beta L_a(y_2, y_2^{'}) $ (2.3)

where $ y_1 $ and $ y_{1}^{'} $ represent the three-category labels and the output of the decoder, respectively, and $ y_2 $ and $ y_{2}^{'} $ represent the border label and the output of the auxiliary supervision module.
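Eq (2.3) can be sketched as follows; how the cross-entropy and DICE terms are combined inside $ L_a $, and the tensor shapes, are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dice_loss(logits, target, eps=1.0):
    """Soft DICE loss for the single-channel boundary prediction."""
    prob = torch.sigmoid(logits).flatten(1)
    target = target.float().flatten(1)
    inter = (prob * target).sum(1)
    return 1 - ((2 * inter + eps) / (prob.sum(1) + target.sum(1) + eps)).mean()

class JointLoss(nn.Module):
    """loss = L_m(y1, y1') + beta * L_a(y2, y2'), with L_m = cross entropy on
    the 3-class decoder output and L_a = cross entropy + DICE on the
    boundary map predicted from stage 5."""
    def __init__(self, beta=0.96):
        super().__init__()
        self.beta = beta
        self.ce = nn.CrossEntropyLoss()

    def forward(self, seg_logits, seg_labels, bnd_logits, bnd_labels):
        main = self.ce(seg_logits, seg_labels)                       # L_m
        aux = (F.binary_cross_entropy_with_logits(bnd_logits,
                                                  bnd_labels.float())
               + dice_loss(bnd_logits, bnd_labels))                  # L_a
        return main + self.beta * aux
```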

In this section, we conduct experiments on real tire X-ray image data. The main work is to compare this lightweight network with several classic networks in terms of speed and accuracy. Since the accuracy of the bead toe boundary measurement depends entirely on the localization of the toe, the semantic segmentation indices indirectly reflect the detection of toe error defects. Essentially, the output of the semantic segmentation network is a pixel-level markup of the bead ring, toe and sidewall. As shown in Figure 7, the output subgraphs belonging to the same original tire X-ray image are joined together (N denotes the number of subgraphs), from which the coordinates of the left and right toe boundaries are calculated. When a toe error defect exists, the coordinate difference changes abruptly, as shown in Figure 8: a toe error defect is considered to occur when the coordinate difference between positions A and B exceeds the set standard. Since the defect standard is expressed in millimeters, pixel coordinates are converted to millimeters according to the scale bar. From the standard, it can be further determined whether the defect level is a defective product or a scrap. Note that bead toe error defects may occur on both the left and right sides of the tire.

    Figure 7.  Method for constructing bead toe boundary coordinates.
    Figure 8.  Calculation principle of toe error defect.
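The decision rule of Figure 8 can be sketched as follows; the function name and the two threshold parameters are placeholders, since the actual limits come from the enterprise standard.

```python
import numpy as np

def judge_toe_error(boundary_px, mm_per_px, defect_limit_mm, scrap_limit_mm):
    """Grade one side of a tire from its stitched bead-toe boundary
    x-coordinates (one value per image row across all N subgraphs)."""
    # An abrupt change shows up as a large max-min width difference (A vs B).
    diff_mm = (np.max(boundary_px) - np.min(boundary_px)) * mm_per_px
    if diff_mm <= defect_limit_mm:
        return "qualified"
    return "scrap" if diff_mm > scrap_limit_mm else "defective"
```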

In this work, the hardware environment is an Intel(R) Core(TM) i7-8700K CPU @ 3.70 GHz, 16.0 GB RAM and an NVIDIA GeForce GTX 2080 GPU. Each network is trained with stochastic gradient descent (SGD) with a momentum of 0.9, a batch size of 4 and a weight decay of 1e-5; the initial learning rate is 1e-4. Since the goal of the semantic segmentation task is to label the input image into just three categories, a total of 30 training epochs is sufficient.
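This setup translates to the following sketch, where `model` and `train_loader` are assumed to exist and `JointLoss` is the loss sketch from Eq (2.3) above.

```python
import torch

# SGD with momentum 0.9, weight decay 1e-5, initial learning rate 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                            momentum=0.9, weight_decay=1e-5)
criterion = JointLoss(beta=0.96)

for epoch in range(30):   # 30 epochs suffice for the three-class task
    for images, seg_labels, bnd_labels in train_loader:   # batch size 4
        optimizer.zero_grad()
        seg_logits, bnd_logits = model(images)
        loss = criterion(seg_logits, seg_labels, bnd_logits, bnd_labels)
        loss.backward()
        optimizer.step()
```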

The data set is divided into a training set and a test set of 1218 and 854 images, respectively. In the test set, 554 images are used to evaluate the semantic segmentation metrics, and the remaining 300 images measure the accuracy of defect diagnosis; the left and right bead toes each account for half of the images. The ground truth of an input image contains three parts, with the bead ring, bead toe and sidewall represented by class labels 0, 1 and 2, respectively. Because the height of the cropped subgraph reaches 512 pixels, using the original 512 $ \times $ 512 image would make the top and bottom difficult to mark during labeling. To this end, 200 redundant pixels were padded at the top and bottom of the original image during sample preparation, as shown in Figure 9; once marking is complete, eliminating the redundant pixels is straightforward.

    Figure 9.  (a) Input image to be marked on the left. (b) Input image to be marked on the right.
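A minimal sketch of this workaround (the border mode is our assumption):

```python
import cv2

# Pad 200 redundant rows at the top and bottom before annotation.
padded = cv2.copyMakeBorder(subgraph, 200, 200, 0, 0, cv2.BORDER_REPLICATE)
# ... annotate `padded` to obtain `mask_padded`, then drop the padding:
mask = mask_padded[200:-200, :]
```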

mIoU evaluates the prediction accuracy of a semantic segmentation network over all output pixels and works well for conventional pixel-level classification tasks. However, the lightweight network proposed here is intended to locate the boundaries of the bead toe; in other words, we care most about the classification of pixels around the toe boundary. The bead ring differs markedly from the toe and sidewall and can be distinguished by gray value alone, so pixel-level segmentation of the bead ring is not challenging for most semantic segmentation networks. The gray-level difference between the sidewall and the bead toe, however, is no longer obvious, so gray-level information alone is insufficient. In this case, the segmentation accuracy at the toe-sidewall boundary is the key to evaluating a network. Our local mIoU (L-mIoU) addresses this: it only considers the local ground-truth labels within a band 60 pixels wide. The local ground-truth labels come from the two boundary regions, as does the local network prediction. Figure 10 shows the recording rules for the local prediction output.

    Figure 10.  (a) The prediction of the left bead toe by the semantic segmentation network. (b) The prediction of the right bead toe.
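A sketch of the metric under these definitions follows; how the 60-pixel bands are placed around the ground-truth boundaries is our reading of Figure 10.

```python
import numpy as np

def l_miou(pred, gt, band=60, n_classes=3):
    """mIoU restricted to bands of `band` pixels centered on the columns
    where the ground-truth class changes (the two toe boundaries)."""
    mask = np.zeros_like(gt, dtype=bool)
    rows, cols = np.nonzero(gt[:, 1:] != gt[:, :-1])   # boundary columns per row
    for r, c in zip(rows, cols):
        mask[r, max(0, c - band // 2):c + band // 2] = True
    ious = []
    for k in range(n_classes):
        p, g = (pred == k) & mask, (gt == k) & mask
        union = (p | g).sum()
        if union:
            ious.append((p & g).sum() / union)
    return float(np.mean(ious))
```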

U-Net is a high-precision semantic segmentation network, but its training parameters are numerous. STDC-Net is a network structure oriented to real-time semantic segmentation scenes; it compresses parameters at the cost of the segmentation quality of target boundaries. We compared the pixel-level classification accuracy of U-Net, STDC-Net, RefineNet and our lightweight network on the test set. Table 3 shows that the mIoU is significantly higher than the L-mIoU for every model; for example, U-Net reaches 96.8% mIoU but only 91.9% L-mIoU. This shows that, for the evaluation of class boundaries, L-mIoU is closer to the true capability of the network. As a real-time semantic segmentation network, STDC-Net has clear deficiencies in boundary segmentation. Without auxiliary boundary supervision, our model is below U-Net in both mIoU and L-mIoU. RefineNet-LW-50 is a lightweight semantic segmentation network designed on the basis of RefineNet, with a good expression of spatial information; our network design borrows its feature fusion module.

    Table 3.  Comparisons with other popular networks on the test set.
    Network mIoU (%) L-mIoU (%) Parameters (MB) Running time (s)
    U-Net[25] 96.8 91.9 65.87 1.8
    STDC-Net[32] 92.5 81.9 52.96 0.8
    RefineNet-LW-50[33] 96.9 91.8 104.18 1.2
    Our method (without supervision) 96.1 90.6 55.58 1.0


In addition, the size and operating speed of the lightweight network are evaluated. The running time is averaged over 554 X-ray images with a resolution of 8000 $ \times $ 2469. For timing, the original image is first cropped into subgraphs containing the bead toe, each subgraph is then predicted at the pixel level, and finally the boundary coordinates of the bead toe are recorded. As shown in Table 3, U-Net has excellent accuracy, but its large number of parameters leads to a longer running time. Although STDC-Net is very fast, its pixel-level prediction accuracy does not meet our requirements. RefineNet-LW-50 is inferior to the lightweight network proposed in this paper in both speed and accuracy. With 55.58 MB of training parameters, our lightweight network is 40% faster than U-Net. In summary, the lightweight semantic segmentation network trades off speed and accuracy well.

To further enhance the performance of the new method, the auxiliary boundary supervision module is added, as shown in Table 4. Adding auxiliary supervision modules to the three classic semantic segmentation networks also improves their pixel-level segmentation, but since they have no dedicated boundary extraction module, the improvement is weaker than for the network proposed in this paper. Our method (S5) places the auxiliary boundary supervision after the 1/16 down-sampling; the L-mIoU then reaches 92.4%, exceeding the pixel-level segmentation accuracy of U-Net. We also tested the auxiliary boundary supervision at different positions: adding it after any of stages 1 to 5 improves the accuracy of boundary segmentation. For the encoder, the resolution of shallow feature maps is relatively large, so their loss of spatial information is small, while the spatial information in deep feature maps is severely lost; both shallow spatial information and deep semantic information must therefore be captured. This paper accordingly designs a decoding structure with a preference for boundary information. The experimental results show that the two fused branches (shallow and deep feature maps), under deep auxiliary supervision, carry rich spatial information.

    Table 4.  Position test of auxiliary boundary supervision module. S1, S2, S3, S4 and S5 respectively indicate that the auxiliary supervision module is placed after the output of the encoder from stage 1 to stage 5. In the three classical networks, the auxiliary supervision module is added to the encoder after the feature map with a resolution of 32 × 32.
Network Input resolution mIoU (%) L-mIoU (%)
    UNet + Boundary loss 1 $ \times $ 512 $ \times $ 512 97.0 92.2
    STDC-Net + Boundary loss 1 $ \times $ 512 $ \times $ 512 92.7 82.3
    RefineNet-LW-50 + Boundary loss 1 $ \times $ 512 $ \times $ 512 97.1 92.0
    Our method + Boundary loss(S1) 1 $ \times $ 512 $ \times $ 512 96.6 91.6
    Our method + Boundary loss(S2) 1 $ \times $ 512 $ \times $ 512 96.5 91.1
    Our method + Boundary loss(S3) 1 $ \times $ 512 $ \times $ 512 96.5 91.4
    Our method + Boundary loss(S4) 1 $ \times $ 512 $ \times $ 512 96.9 91.6
    Our method + Boundary loss(S5) 1 $ \times $ 512 $ \times $ 512 97.1 92.4


Of course, a visual comparison after adding the auxiliary boundary supervision module is indispensable. As shown in Figure 11, the results show that the segmentation of all algorithms in the boundary area improves greatly after the auxiliary supervision module is added. Encouragingly, the segmentation of our method near the boundary is almost consistent with the annotated map.

    Figure 11.  Comparison of visual effects between our method and popular methods.

The judgment result only distinguishes between normal and defective samples, while the evaluation result further distinguishes defect levels on the basis of the identified defects. For a dataset of 300 tire X-ray images, normal samples account for exactly two thirds (200 images); scraps and defective products account for 20 and 80 images, respectively.

$ CR = \frac{HJ \cap AJ}{S} $ (3.1)

$ PR = \frac{HLE \cap ALE}{S} $ (3.2)

$ RMR = \frac{N}{S_d} $ (3.3)

In Eq (3.1), $ CR $ is defined as the correct rate; $ HJ $ and $ AJ $ represent the artificial judgment result and the algorithm judgment result for the input images, respectively, and $ S $ stands for the total number of input images. In Eq (3.2), $ PR $ is defined as the precision ratio; $ HLE $ represents the artificial evaluation result for the two defect levels and $ ALE $ stands for the algorithm's evaluation of defect levels. In Eq (3.3), $ RMR $ is defined as the rate of missing report; $ N $ represents the number of defect samples that were not identified and $ S_d $ stands for the number of defect samples.
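Under these definitions, the three rates can be tabulated as in the sketch below; the encoding of judgments (0 = normal, 1 = defect) and of level grades is our assumption.

```python
import numpy as np

def detection_rates(hj, aj, hle, ale, is_defect):
    """hj/aj: human vs algorithm normal(0)/defect(1) judgments per image;
    hle/ale: human vs algorithm level grades (normal/defective/scrap);
    is_defect: ground-truth defect flags."""
    hj, aj = np.asarray(hj), np.asarray(aj)
    hle, ale = np.asarray(hle), np.asarray(ale)
    d = np.asarray(is_defect, dtype=bool)
    S = len(hj)
    cr = np.sum(hj == aj) / S                  # Eq (3.1): judgments that agree
    pr = np.sum(hle == ale) / S                # Eq (3.2): level grades that agree
    rmr = np.sum(d & (aj == 0)) / np.sum(d)    # Eq (3.3): missed defect samples
    return cr, pr, rmr
```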

After the position of each region is predicted by the semantic segmentation network, the coordinates of the toe boundary can be obtained. The defect level is then judged against the defect standard using the ratio of distance on the image to actual distance. Because the error-defect distribution of the tire X-ray images is known, the advantage of the proposed algorithm in automatic detection can be verified directly. As shown in Table 5, across the three evaluation indices of bead toe error detection, the algorithm proposed in this paper is at an advanced level, scoring 91.3, 90.4 and 1.4%, respectively.

    Table 5.  Comparison of toe error detection results.
Network CR (%) PR (%) RMR (%)
    UNet + Boundary loss 89.7 88.3 1.8
    STDC-Net + Boundary loss 75.4 71.6 3.5
    RefineNet-LW-50 + Boundary loss 88.5 87.9 2.2
    Our method + Boundary loss(S5) 91.3 90.4 1.4


In this paper, we have revisited the classical semantic segmentation network U-Net and the real-time semantic segmentation network STDC-Net, and optimized the shortcomings of U-Net in speed and of STDC-Net in accuracy. A new lightweight semantic segmentation network has been proposed that approaches U-Net in precision while approaching STDC-Net in speed. Our encoder is a fine-tuning of STDC-Net; the idea of the decoder is to fuse the deep feature maps with the shallow feature maps so that the result contains multi-scale spatial information. Locating the boundary of the bead toe is the key to error defect detection, so pixel-level boundary segmentation is an inevitable requirement for the network. An auxiliary supervision method in the vertical direction has been proposed to enhance the pixel-level classification performance at the class boundary. Ultimately, the bead toe boundary coordinates are located by running our lightweight semantic segmentation network, and by comparing the difference between the edge coordinates and the standard, both the presence of a toe error defect and the defect level can be judged.

    The authors declare there is no conflict of interest.

[1] Shapiro MF, Sheldon G (1987) The Complete Blood Count and Leukocyte Differential Count: An Approach to Their Rational Application. J Emerg Med 106: 65–74.
[2] Lynch EC (1990) Peripheral blood smear. Boston: Butterworths, 90: 1373–1377.
[3] Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97–109. doi: 10.1093/biomet/57.1.97
[4] Ng HP, Ong SH, Foong KWC, et al. (2006) Medical image segmentation using k-means clustering and improved watershed algorithm. IEEE Southwest Symp Image Anal Interpret 106: 61–65.
[5] Kurita T, Otsu N, Abdelmalek N (1992) Maximum likelihood thresholding based on population mixture models. Pattern Recogn 25: 1231–1240. doi: 10.1016/0031-3203(92)90024-D
[6] Danielsson PE (1980) Euclidean distance mapping. Comput Graph Image Process 14: 227–248. doi: 10.1016/0146-664X(80)90054-4
[7] Sobel I (1990) An isotropic 3 × 3 image gradient operator. Machine Vision for Three-Dimensional Scenes, 376–379.
[8] Scotti F (2005) Automatic morphological analysis for acute leukemia identification in peripheral blood microscope images. IEEE Int Conf Comput Intell Meas Syst Appl (CIMSA) 25: 96–101.
[9] Sadeghian F, Seman Z, Ramli AR, et al. (2009) A framework for white blood cell segmentation in microscopic blood images using digital image processing. Biol Proced Online 11: 196. doi: 10.1007/s12575-009-9011-2
[10] Jesus A, Georges F (2003) Automated detection of working area of peripheral blood smears using mathematical morphology. Anal Cell Pathol 25: 37–49. doi: 10.1155/2003/642562
[11] Goswami R, Pi D, Pal J, et al. (2015) Performance evaluation of a dynamic telepathology system (Panoptiq™) in the morphologic assessment of peripheral blood film abnormalities. Int J Lab Hematol 37: 365–371. doi: 10.1111/ijlh.12294
[12] D'Ambrosio MV, Bakalar M, Bennuru S, et al. (2015) Point-of-care quantification of blood-borne filarial parasites with a mobile phone microscope. Sci Transl Med 7: 286re4.
[13] Manik S, Saini LM, Vadera N (2017) Counting and classification of white blood cells using Artificial Neural Network (ANN). IEEE Int Conf Power Electron Intell Control Energy Syst 2017: 1–5.
[14] Jia YQ, Shelhamer E, Donahue J, et al. (2014) Caffe: Convolutional architecture for fast feature embedding. Proc 22nd ACM Int Conf Multimedia, 675–678.
[15] Das DK, Maiti AK, Chakraborty C (2015) Automated system for characterization and classification of malaria-infected stages using light microscopic images of thin blood smears. J Microsc-Oxford 257: 238–252.
[16] Automatic Peripheral Blood Smear and Slide Scanner Device. Available from: http://www.mantiscope.com
[17] Devi S, Singha J, Sharma M, et al. (2016) Erythrocyte segmentation for quantification in microscopic images of thin blood smears. J Intell Fuzzy Syst 4: 2847–2856.
[18] Lee H, Chen YPP (2014) Cell morphology based classification for red cells in blood smear images. Pattern Recogn Lett 49: 155–161. doi: 10.1016/j.patrec.2014.06.010
[19] Amin MM, Kermani S, Talebi A, et al. (2015) Recognition of acute lymphoblastic leukemia cells in microscopic images using K-means clustering and support vector machine classifier. J Med Signal Sensor 5: 49.
[20] Li Y, Zhu R, Mi L, et al. (2016) Segmentation of white blood cell from acute lymphoblastic leukemia images using dual-threshold method. Comput Math Method M 2016: 9514707.
[21] Linder N, Turkki R, Walliander M, et al. (2014) A malaria diagnostic tool based on computer vision screening and visualization of Plasmodium falciparum candidate areas in digitized blood smears. PLoS One 9: e104855.
[22] Zhu C, Zheng Y, Luu K, et al. (2016) CMS-RCNN: Contextual multi-scale region-based CNN for unconstrained face detection. In: Bhanu B, Kumar A (Eds.), Deep Learning for Biometrics, Switzerland: Springer, 57–79.
[23] Beucher S, Mathmatique CDM (1991) The watershed transformation applied to image segmentation. Scanning Microsc Suppl 6: 299–314.
[24] Dollar P, Wojek C, Schiele B, et al. (2012) Pedestrian detection: An evaluation of the state of the art. IEEE T Pattern Anal 34: 743. doi: 10.1109/TPAMI.2011.155
[25] Redmon J, Divvala S, Girshick R, et al. (2016) You only look once: Unified, real-time object detection. Comput Vision Pattern Recognit 2016: 779–788.
[26] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. IEEE Conf Comput Vision Pattern Recogn, 770–778.
[27] Govind D, Lutnick B, Tomaszewski JE, et al. (2018) Automated erythrocyte detection and classification from whole slide images. J Med Imag 5: 027501.
  • © 2018 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
