Research article

Decision trees and multi-level ensemble classifiers for neurological diagnostics

  • Received: 10 December 2013 Accepted: 23 June 2014 Published: 30 June 2014
  • Cardiac autonomic neuropathy (CAN) is a well known complication of diabetes leading to impaired regulation of blood pressure and heart rate, and increases the risk of cardiac associated mortality of diabetes patients. The neurological diagnostics of CAN progression is an important problem that is being actively investigated. This paper uses data collected as part of a large and unique Diabetes Screening Complications Research Initiative (DiScRi) in Australia with data from numerous tests related to diabetes to classify CAN progression. The present paper is devoted to recent experimental investigations of the effectiveness of applications of decision trees, ensemble classifiers and multi-level ensemble classifiers for neurological diagnostics of CAN. We present the results of experiments comparing the effectiveness of ADTree, J48, NBTree, RandomTree, REPTree and SimpleCart decision tree classifiers. Our results show that SimpleCart was the most effective for the DiScRi data set in classifying CAN. We also investigated and compared the effectiveness of AdaBoost, Bagging, MultiBoost, Stacking, Decorate, Dagging, and Grading, based on Ripple Down Rules as examples of ensemble classifiers. Further, we investigated the effectiveness of these ensemble methods as a function of the base classifiers, and determined that Random Forest performed best as a base classifier, and AdaBoost, Bagging and Decorate achieved the best outcomes as meta-classifiers in this setting. Finally, we investigated the meta-classifiers that performed best in their ability to enhance the performance further within the framework of a multi-level classification paradigm. Experimental results show that the multi-level paradigm performed best when Bagging and Decorate were combined in the construction of a multi-level ensemble classifier.

    Citation: Herbert F. Jelinek, Jemal H. Abawajy, Andrei V. Kelarev, Morshed U. Chowdhury, Andrew Stranieri. Decision trees and multi-level ensemble classifiers for neurological diagnostics[J]. AIMS Medical Science, 2014, 1(1): 1-12. doi: 10.3934/medsci.2014.1.1



    The COVID-19 outbreak, which began in 2019, is a viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1,2,3,4]. Most COVID-19 patients develop pneumonia, and computed tomography (CT) scans are often used to help doctors diagnose pneumonia in the early stages of a COVID-19 outbreak [5,6,7]. Compared with CT, chest X-ray (CXR) is more widely used in clinical practice because it is easier, faster, and less expensive to perform. However, the sheer volume of CXR data and the limited number of physicians mean that readings cannot keep pace with demand [8,9,10,11,12]. Computer-aided systems can play an auxiliary role [13,14], but their efficiency and accuracy do not yet meet clinical requirements. Improving the accuracy of lesion identification in CXR images therefore remains a key problem that urgently needs to be solved.

    Traditional methods usually rely on region-based mathematical computation and hand-crafted feature extraction to recognize and classify CXR images. Jaeger et al. [15] proposed an automated method for tuberculosis detection on posterior-anterior chest radiographs; lung segmentation was modeled as an optimization problem integrating lung boundaries, regions, shapes, and other attributes, but the segmentation contours were tight and leaked in some areas. Hogeweg et al. [16] combined a texture anomaly detection system operating at the pixel level with a clavicle detection system to suppress false positives; however, the pathological structures were altered after segmentation, which was detrimental to the pathological judgment. Candemir et al. [17] proposed a robust lung segmentation method driven by nonrigid registration, using a patient-specific adaptive lung model based on image retrieval to detect lung boundaries, and achieved an average accuracy of 95.4% on the public JSRT database. However, opacity caused by fluid in the lung space prevented correct detection of lung boundaries. Although region segmentation has received attention, corresponding supervision mechanisms are lacking. To train classifiers that can be supervised effectively, Livieris et al. [18] proposed a semi-supervised learning (SSL) algorithm for tuberculosis CXR classification that combines the individual predictions of three commonly used SSL algorithms through the CST-voting ensemble principle; the reported accuracy was relatively objective, but the process was overly tedious. Given the complex mathematical principles and low model robustness of traditional methods, more cost-effective approaches are needed in the CXR recognition field.

    In recent years, deep learning models such as convolutional neural networks (CNNs) [19,20,21,22,23,24,25,26] have developed rapidly and become the preferred technique in the field of computer vision. Experts in medical imaging have also noted the rapid growth and impact of CNNs. For example, Irfan et al. [27] developed a hybrid deep neural network (HDNN) that uses computed tomography (CT) and X-ray imaging to predict risk; the classification accuracy reached a very high level when trained on a web-sourced dataset together with a regular dataset. The CoVIRNet method proposed by Almalki et al. [28] can automatically diagnose COVID-19 patient images from chest radiographs and alleviates the overfitting caused by the small size of the COVID-19 dataset. Most CXR disease recognition methods based on deep learning can be divided into two types. The first uses a CNN for image segmentation and classification. Shen et al. [26] extracted the symptomatic part of an image as blocks, fed them into several different CNNs, and spliced the resulting features into a vector as the final representation. Discriminant features were extracted from alternately stacked layers to capture the heterogeneity of pulmonary nodules; however, the location of the disease could not be directly represented. Rajpurkar et al. [29] developed a 121-layer CNN named CheXNet, which was tested on the large ChestX-ray14 dataset covering 14 diseases and achieved an accuracy of more than 0.7 in disease classification. However, the network stacking method is overly simple, and the image texture is not exploited thoroughly. The second type denoises the data to enhance the recognition performance of other algorithms. Ucar and Korkmaz [30] proposed a SqueezeNet-based architecture fine-tuned for COVID-19 diagnosis through raw data augmentation, Bayesian optimization, and validation, achieving high accuracy in categorizing COVID-19, pneumonia, and normal cases. Jiang et al. [31] proposed a residual CNN for denoising COVID-19 images; residual connections and an attention mechanism make the network attend more closely to the texture details of CXR images, and the denoised images significantly improved performance in the COVID-19 recognition task.

    Waheed et al. [32] used an adversarial network model to synthesize CXR images, allowing the model to draw on external information to improve sample quality and thus increase the number of images showing COVID-19 symptoms. However, the recognition task was limited to classification, and the locations of the symptoms were not obtained accurately. In a more specialized work, Jaiswal et al. [33] applied a target detection algorithm to the RSNA pneumonia dataset and obtained optimal accuracy by fusing the predicted bounding boxes of multiple models. Because of its large memory footprint and single-category dataset, detection results for diverse diseases could not be determined. This provides the motivation and references for our work.

    Although the accuracy of the above methods continues to improve, some obvious shortcomings remain: 1) Most are classification or segmentation methods and lack intuitive target boxes that directly indicate the location of symptoms, so evaluation efficiency needs to be improved. 2) Ideal accuracy usually requires fusing the detections of multiple models, which occupies substantial memory and is impractical in real applications. 3) The volume of COVID-19 CXR data is small and single-category, so generalizable results cannot be obtained. To solve these problems, a lesion detection method based on a scalable attention residual CNN is proposed in this paper. A variety of convolution kernel sizes are used to obtain multiple resolutions, and adaptive global attention extracts and connects the spatial features of each resolution. A feature fusion method with different tree-structure depths is designed. Finally, an attention mechanism fuses the spatial and channel information. All effects were tested in a single model on the VinDr-CXR dataset. The main contributions of this study are summarized as follows:

    ● To improve the ability of deep learning models in CXR target detection, we propose a CNN model and build a structure on it that effectively improves the accuracy of CXR location detection and increases the model's sensitivity to CXR features through attention and feature fusion.

    ● We use training from scratch to customize our CNN model for the dataset, and the training performance of this single model can outperform most classical deep learning models. The development of CXR datasets containing target location information has become a focus of the field, demonstrating the necessity and timeliness of our work and providing corresponding references for future work.

    ● The three modules designed in this study, the multi-convolution feature fusion block (MFFB), the tree-structured aggregation module (TSAM), and scalable channel and spatial attention (SCSA), can effectively improve the detection performance of deep learning models on CXR images; accuracy increases from 12.83% to 15.75% with the modules added, higher than that obtained by existing mainstream target detection models.

    We summarize our research in three parts: encoder, multiple feature blocks, and decoder, simplifying the complex model structure. The encoder in Figure 1 represents the scalable attention residual CNN (SAR-CNN), whereas the decoder is a simple convolutional layer. Four feature blocks (Features 1-4) are input into the decoder in pyramid form, and finally the detection-frame positioning and lesion-category judgment are performed. In the following sections, we introduce the proposed CXR target detection network, SAR-CNN, in detail, including the MFFB, TSAM, and SCSA modules. These modules are independent and embeddable and can be migrated to common convolutional networks. The MFFB interprets CXR image information from a multi-resolution perspective, the TSAM exploits the tree structure to perform left-right branch and multi-level feature fusion, and the SCSA integrates spatial and channel attention. Residual modules [34] are used in the rest of the network to improve its learnability. Details of SAR-CNN training are covered in the next section.

    Figure 1.  Diagram of the proposed research.

    The proposed SAR-CNN network structure, composed of the MFFB, TSAM, and SCSA, is shown in Figure 2. A plain VGG-style [35] module-stacking design was adopted to improve the embeddability of the modules. The predicted image (input) is fed to the CNN at a size of 512 $ \times $ 512. Four feature maps are output and sent for post-processing through the modules described in the following sections. The CAM graph shows that the four feature maps fit the characterization features to different degrees. The neck and head of RefineDet [36] combine first-stage and second-stage advantages, so we replace the backbone of RefineDet with SAR-CNN to obtain the final detection result (output) for the predicted image. Such a framework not only helps improve localization performance but also, by means of attention, provides a way to visually explain the model's decisions, both of which are important for the clinical deployment of deep learning models.

    Figure 2.  Architecture structure of scalable attention residual CNN (SAR-CNN).

    As shown in Figure 3, we propose an MFFB that differs from the simple connection and fusion used by existing algorithms. $ M_{1} $, $ M_{2} $ and $ M_{3} $ are obtained from $ M_{in} \in R^{\mathrm{C} \times \mathrm{H} \times \mathrm{W}} $ by 3 $ \times $ 3, 5 $ \times $ 5 and 7 $ \times $ 7 convolution kernels, where $ M_{1} \in R^{\mathrm{C} \times \mathrm{H} \times \mathrm{W}} $ is extracted using same-mode convolution. We adopt ECA-Net [37], a simple and effective attention mechanism; the difference is that we use convolutions with larger receptive fields to extract richer feature scales and apply global average pooling (GAP) along the channel dimension to $ M_{2} $ and $ M_{3} $ at their different resolutions. Next, 1D convolution is used for adaptive feature extraction. After the sigmoid, channel attentions $ S_{2} \in R^{\mathrm{1} \times \mathrm{1} \times \mathrm{C}} $ and $ S_{3} \in R^{\mathrm{1} \times \mathrm{1} \times \mathrm{C}} $ are obtained, and the fusion function F(.) defined in Formula (2.1) is applied to obtain the final output feature of the module, $ M_{out} \in R^{\mathrm{C} \times \mathrm{H} \times \mathrm{W}} $.

    $ M_{out} = F(M_{1} \otimes S_{2} \otimes S_{3}) = \sigma\left(\operatorname{BatchNorm}\left(\operatorname{Conv}^{3 \times 3}(M_{1} \otimes S_{2} \otimes S_{3})\right)\right), $
    (2.1)
    Figure 3.  Structure of the multi-convolution feature fusion block.

    where F(.) represents a combination of three layers: the same mode 3 $ \times $ 3 convolution layer, the BatchNorm layer [38], and the nonlinear activation function ReLU [39]. $ \otimes $ is the element-wise multiplication, $ \sigma $ represents the ReLU layer, and $ \operatorname{Conv}^{3 \times 3} $ is the same-mode convolution layer of 3 $ \times $ 3 size. The convolution layer is used to fuse feature graphs and channel attention of two different resolutions to avoid the feature mismatch of related images caused by simple multiplication. BatchNorm is a normalization method that solves the phenomenon of inconsistent distribution of input data, highlights the relative difference in distribution between them, and speeds up training. The ReLU layer adds a nonlinear relationship to the feature layer to avoid gradient disappearance and over-fitting, which ensures that our neural network can complete complex tasks.
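    For concreteness, the MFFB computation above can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the ECA-style 1D-convolution kernel size (3) and the channel count are our own choices, since the paper does not fix them.

```python
import torch
import torch.nn as nn

class MFFB(nn.Module):
    """Minimal sketch of the multi-convolution feature fusion block, Eq. (2.1)."""
    def __init__(self, channels: int):
        super().__init__()
        # Three same-mode convolutions with growing receptive fields
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)  # -> M1
        self.conv5 = nn.Conv2d(channels, channels, 5, padding=2)  # -> M2
        self.conv7 = nn.Conv2d(channels, channels, 7, padding=3)  # -> M3
        # ECA-style 1D convolutions over the channel dimension
        # (kernel size 3 is an assumption; the paper does not specify it)
        self.eca2 = nn.Conv1d(1, 1, kernel_size=3, padding=1)
        self.eca3 = nn.Conv1d(1, 1, kernel_size=3, padding=1)
        # F(.) of Eq. (2.1): 3x3 conv -> BatchNorm -> ReLU
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def _channel_attn(self, m, eca):
        # GAP over H, W -> (B, 1, C) -> 1D conv -> sigmoid -> (B, C, 1, 1)
        s = m.mean(dim=(2, 3)).unsqueeze(1)
        s = torch.sigmoid(eca(s)).squeeze(1)
        return s.unsqueeze(-1).unsqueeze(-1)

    def forward(self, x):
        m1 = self.conv3(x)
        s2 = self._channel_attn(self.conv5(x), self.eca2)  # S2
        s3 = self._channel_attn(self.conv7(x), self.eca3)  # S3
        return self.fuse(m1 * s2 * s3)  # element-wise products, then F(.)
```

    A forward pass preserves the input shape, so a block of this form can be dropped between convolution stages of an existing backbone.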

    We designed the TSAM by summarizing the feature fusion methods of DenseNet [40] and FPN [41] and drawing inspiration from the DLA structure [42], as shown in Figure 4.

    Figure 4.  The internal relationship between the TSAM, dense connections, and feature pyramids: (a) structure of the TSAM; (b) dense connections; (c) feature pyramids. The proposed module (a) has the advantages of both (b) and (c), but avoids the problems of (b), whose over-intensive fusion uses excessive memory, and (c), whose fusion is overly simple.

    In contrast to DLA, we adopt a fixed number of layers and use feature layers with more details to avoid the overfitting problems caused by excessively deep iterative networks. Each structural unit corresponds to a residual module. For simplicity and resource saving, this residual module contains only two 3 $ \times $ 3 convolution and BatchNorm layers. Each aggregation node adopts a 3 $ \times $ 3 convolution, BatchNorm layer, and ReLU, and the features of the left and right branches are fused to obtain the feature graph:

    $ A(x_{1}, x_{2}) = \sigma(\operatorname{BatchNorm}(W_{1} x_{1} + W_{2} x_{2} + b)) = \sigma(\operatorname{BatchNorm}(W_{1} x_{1} + W_{2}(W_{0} x_{1} + b_{0}) + b)), $
    (2.2)

    where $ x_{1} $ and $ x_{2} $ represent the left- and right-branch features of the binary tree before fusion, respectively; $ \sigma $ represents a nonlinear ReLU; $ W $ and $ b $ represent the weights and biases of the convolutions, respectively; and $ x_{2} $ is obtained from $ x_{1} $ through a structural unit. The TSAM combines layers of different depths to learn richer combinations that span more feature layers.
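    A minimal sketch of the structural unit and aggregation node described above, under our own assumptions: Eq. (2.2)'s weighted sum $ W_{1} x_{1} + W_{2} x_{2} + b $ is realized as a single 3 $ \times $ 3 convolution over the concatenated branches (an equivalent parameterization), and channel counts are illustrative.

```python
import torch
import torch.nn as nn

class ResUnit(nn.Module):
    """Structural unit: two 3x3 conv + BatchNorm layers with a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class AggNode(nn.Module):
    """Aggregation node A(x1, x2) of Eq. (2.2): 3x3 conv + BatchNorm + ReLU."""
    def __init__(self, channels: int):
        super().__init__()
        # W1*x1 + W2*x2 + b realized as one convolution over the
        # concatenated left and right branches
        self.conv = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x1, x2):
        return torch.relu(self.bn(self.conv(torch.cat([x1, x2], dim=1))))
```

    In the tree, the right branch is produced from the left by a structural unit, and each aggregation node fuses the two back to the original channel count.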

    This module applies attention in both the channel and spatial dimensions, and we use SCSA as a transitional stage between the two modules, as shown in Figure 5. $ M_{input} $ is fed into two paths to obtain $ M_{S} \in R^{\mathrm{1} \times \mathrm{H} \times \mathrm{W}} $ and $ M_{C} \in R^{\mathrm{C} \times \mathrm{1} \times \mathrm{1}} $, which are combined to obtain $ M_{SCSA} \in R^{\mathrm{C} \times \mathrm{H} \times \mathrm{W}} $. Then, residual fusion is performed between $ M_{input} \in R^{\mathrm{C} \times \mathrm{H} \times \mathrm{W}} $ and $ M_{SCSA} $:

    $ M_{output} = M_{input} + M_{input} \otimes M_{SCSA} = M_{input} + M_{input} \otimes \sigma(M_{S} + M_{C}), $
    (2.3)
    Figure 5.  The schematic diagram of SCSA.

    where $ \sigma $ is the sigmoid function and $ \otimes $ is element-wise multiplication. In the channel attention module, channel information encoded in two different ways is obtained through global average pooling (GAP) and global maximum pooling (GMP). In addition, a fully connected layer shared between the two paths (Share FC) allows their information to interact, and the resulting feature size is $ R^{C / r \times 1 \times 1} $. After the BatchNorm layer, the two are added:

    $ M_{C} = \operatorname{BN}(\operatorname{ShareFC}(\operatorname{GAP}(M_{input}))) + \operatorname{BN}(\operatorname{ShareFC}(\operatorname{GMP}(M_{input}))), $
    (2.4)

    where BN is the batch normalization layer. The spatial-dimension module compresses $ M_{input} \in R^{\mathrm{C} \times \mathrm{H} \times \mathrm{W}} $ to $ R^{C / r \times H \times W} $ through a 1 $ \times $ 1 convolution layer. A 3 $ \times $ 3 dilated convolution layer expands the receptive field to utilize more contextual information, and the spatial attention map $ M_{S} \in R^{\mathrm{1} \times \mathrm{H} \times \mathrm{W}} $ is then obtained through another 1 $ \times $ 1 convolution layer. Finally, a batch normalization layer is used to adjust the search space:

    $ M_{S} = \operatorname{BN}(\operatorname{Conv}^{1 \times 1}(\operatorname{Conv}^{3 \times 3}(\operatorname{Conv}^{1 \times 1}(M_{input})))), $
    (2.5)

    where BN is the batch normalization layer, Conv is the convolution layer, and the superscript is the size of the convolution kernel. In addition, the SCSA module is followed by a MaxPooling layer with a size of 2 $ \times $ 2 and a stride of 2 as the downsampling layer. SCSA works on a principle similar to that of BAM [43], but it extracts channel attention from the different global pooling features of GAP and GMP and thus has more types of channel feature maps. Compared with CBAM [44], the original features are added after the fusion of channel and spatial attention rather than sequentially. SCSA combines the advantages of BAM and CBAM and discards the unnecessary parts.
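    The two SCSA branches and the residual fusion of Eqs. (2.3)-(2.5) can be sketched as follows. The reduction ratio $ r $ and dilation rate are assumptions (the paper leaves them unspecified), and we map the shared FC back to $ C $ channels so that $ M_{C} $ broadcasts against $ M_{S} $, following BAM's convention.

```python
import torch
import torch.nn as nn

class SCSA(nn.Module):
    """Minimal sketch of scalable channel and spatial attention, Eqs. (2.3)-(2.5)."""
    def __init__(self, channels: int, r: int = 4, dilation: int = 2):
        super().__init__()
        mid = channels // r
        # Channel branch, Eq. (2.4): GAP and GMP share one FC ("Share FC");
        # the FC maps back to C channels so M_C can broadcast against M_S.
        self.share_fc = nn.Sequential(
            nn.Linear(channels, mid),
            nn.ReLU(inplace=True),
            nn.Linear(mid, channels),
        )
        self.bn_gap = nn.BatchNorm1d(channels)
        self.bn_gmp = nn.BatchNorm1d(channels)
        # Spatial branch, Eq. (2.5): 1x1 -> dilated 3x3 -> 1x1 -> BN
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.Conv2d(mid, 1, 1),
            nn.BatchNorm2d(1),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        m_c = self.bn_gap(self.share_fc(x.mean(dim=(2, 3)))) \
            + self.bn_gmp(self.share_fc(x.amax(dim=(2, 3))))
        m_c = m_c.view(b, c, 1, 1)           # (B, C, 1, 1)
        m_s = self.spatial(x)                # (B, 1, H, W)
        attn = torch.sigmoid(m_s + m_c)      # broadcast to (B, C, H, W)
        return x + x * attn                  # residual fusion, Eq. (2.3)
```

    Note how the residual term keeps the original features intact after the joint channel-spatial reweighting, which is the key difference from CBAM's sequential design.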

    We observed that the residual structure plays a key role in medical image detection tasks. Therefore, in addition to the three modules proposed in this study, residual modules are used in the rest of the network to sort the features fitted by the three modules and further improve the robustness of the network. The residual module is derived from ResNet, which adds skip connections to the convolutional module to solve the problems of gradient vanishing and gradient explosion during the training of deep neural networks, as shown in Figure 6. In addition, we observed that the residual module can correlate features of different scales. In special cases, disease regions in CXR images intersect or even overlap, and the residual module helps correlate the features of overlapping target regions.

    Figure 6.  Resblock structure diagram.

    Because the publicly available COVID-19 datasets are small-scale and of poor quality, even where partially curated and annotated, we selected a large public CXR dataset that meets our requirements: the VinDr-CXR dataset. VinDr-CXR is a chest radiography dataset published by the Vingroup Big Data Institute (VinBigdata) [45] and is currently the largest public CXR dataset with radiologist-generated annotations in both the training and test sets. Collected from Hospital 108 (H108) and the Hanoi Medical University Hospital (HMUH), its 18,000 CXR images were manually annotated by 17 professional radiologists. The dataset is divided into a 15,000-image training set and a 3000-image test set; each training image was independently annotated by three doctors, and each test image was jointly annotated by five doctors. Because eight of the 22 categories containing local location information have very few images, we merged these eight minor categories into the "Other lesion" category, so our task is defined as a target detection problem over 14 lesion categories. The names and quantity statistics of the different categories in the VinDr-CXR dataset are shown in Figure 7.
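    The category merge described above can be expressed as a small label-remapping step. The 14 class names listed here follow the local-finding labels published with VinDr-CXR, but the helper itself is a hypothetical illustration, not part of the dataset's tooling.

```python
# The 14 kept classes follow the local-finding labels published with
# VinDr-CXR; the remapping helper is our own illustration.
KEPT_CLASSES = [
    "Aortic enlargement", "Atelectasis", "Calcification", "Cardiomegaly",
    "Consolidation", "ILD", "Infiltration", "Lung Opacity",
    "Nodule/Mass", "Other lesion", "Pleural effusion",
    "Pleural thickening", "Pneumothorax", "Pulmonary fibrosis",
]

def remap_label(name: str) -> str:
    """Fold any annotation outside the 14 kept classes into 'Other lesion'."""
    return name if name in KEPT_CLASSES else "Other lesion"
```
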

    Figure 7.  Quantity statistics of different categories in VinDr-CXR dataset.

    The proposed method was trained and tested in the PyTorch framework [46], with network hyperparameters based on ScratchDet [47]: a learning rate of 0.05 and SGD with a weight decay of 0.0005 and a momentum of 0.9. The batch size was set to eight, and an adaptive mechanism was adopted for the number of training epochs: training was stopped when, after more than 150 epochs, the accuracy failed to exceed the best accuracy for five consecutive epochs. The learning rate was reduced by a factor of 10 at epochs 50, 100, and 150. The input image resolution was 512 $ \times $ 512, and SSD-style data augmentation was used (random expansion, cropping, flipping, random photometric distortion, etc.). All convolution layers were initialized using the Xavier uniform method. The comparison algorithms were tested using pretrained weights in the PyTorch framework, with the number of training epochs uniformly set to 300 and the backbone frozen for the first 20 epochs. In addition, each epoch was validated and the weights were saved; every 50 epochs, one set of weights was selected for the accuracy test, and the highest accuracy was used as the comparative experimental result.
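    The optimizer and learning-rate schedule described above can be sketched as follows; the placeholder model and the 200-epoch loop are illustrative stand-ins, while the hyperparameters follow the text.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# `model` is a stand-in for SAR-CNN; the hyperparameters follow the text:
# LR 0.05, SGD with momentum 0.9 and weight decay 0.0005.
model = torch.nn.Conv2d(3, 16, 3)
optimizer = SGD(model.parameters(), lr=0.05,
                momentum=0.9, weight_decay=0.0005)
# Learning rate reduced by 1/10 at epochs 50, 100 and 150
scheduler = MultiStepLR(optimizer, milestones=[50, 100, 150], gamma=0.1)

for epoch in range(200):
    # ... per-batch forward/backward passes (batch size 8) would go here ...
    optimizer.step()      # stands in for the per-batch parameter updates
    scheduler.step()
```

    After the third milestone the learning rate has been decayed three times, ending at 0.05/1000.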

    Network training uses the loss functions and optimizer settings of RefineDet. To imitate the prediction process of a two-stage target detection algorithm, the loss function $ L_{SAR} $ consists of $ L_{A} $ and $ L_{B} $, where $ L_{A} $ corresponds to the stage that regresses the position and size of targets in the image and $ L_{B} $ determines the category of each returned target. $ N_{ARM} $ and $ N_{ODM} $ are the numbers of positive anchors in the ARM and ODM, respectively. In particular, $ i $ is the index of an anchor box, the smooth L1 loss is used as $ L_{s} $, and $ s_{i} $ indicates whether the predicted category is consistent with the ground-truth label (1 for a match and 0 otherwise); the ground-truth box is denoted by $ g_{i}^{*} $.

    In Formula (3.1), $ p_{i} $ and $ x_{i} $ are the predicted objectness probability and corresponding position coordinates for ARM anchor box $ i $, respectively, and $ L_{b} $ is the cross-entropy loss over two classes, used as the binary classification loss.

    $ L_{A} = \frac{1}{N_{ARM}} \sum\limits_{i} \left( L_{b}(p_{i}, s_{i}^{*}) + s_{i} L_{s}(x_{i}, g_{i}^{*}) \right). $
    (3.1)

    In Formula (3.2), $ c_{i} $ and $ t_{i} $ are the categories of the ODM anchor boxes i and the corresponding coordinates of the bounding box, respectively, whereas $ l_{i} $ is the ground truth class label of the anchor. $ L_{m} $ uses the softmax loss over multiple class confidences as a multiclass classification loss.

    $ L_{B} = \frac{1}{N_{ODM}} \sum\limits_{i} \left( L_{m}(c_{i}, l_{i}) + s_{i} L_{s}(t_{i}, g_{i}^{*}) \right). $
    (3.2)

    The final value of the entire loss function can be obtained by adding the values of the two aforementioned loss functions.

    $ L_{SAR} = L_{A} + L_{B}. $
    (3.3)
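    A hypothetical sketch of the combined loss of Formulas (3.1)-(3.3), with illustrative tensor shapes; for simplicity, both terms here are normalized by the same positive-anchor count, whereas the paper uses $ N_{ARM} $ and $ N_{ODM} $ separately.

```python
import torch
import torch.nn.functional as F

def sar_loss(arm_scores, arm_boxes, odm_scores, odm_boxes,
             obj_labels, cls_labels, gt_boxes, pos_mask):
    """Sketch of L_SAR = L_A + L_B, Eqs. (3.1)-(3.3).

    arm_scores: (N, 2) objectness logits; odm_scores: (N, K) class logits;
    *_boxes, gt_boxes: (N, 4) offsets; pos_mask: (N,) bool, s_i = 1 anchors.
    """
    n_pos = pos_mask.sum().clamp(min=1)
    # L_A, Eq. (3.1): binary loss + smooth L1 on positive ARM anchors
    l_a = (F.cross_entropy(arm_scores, obj_labels, reduction="sum")
           + F.smooth_l1_loss(arm_boxes[pos_mask], gt_boxes[pos_mask],
                              reduction="sum")) / n_pos
    # L_B, Eq. (3.2): softmax multiclass loss + smooth L1 on ODM anchors
    l_b = (F.cross_entropy(odm_scores, cls_labels, reduction="sum")
           + F.smooth_l1_loss(odm_boxes[pos_mask], gt_boxes[pos_mask],
                              reduction="sum")) / n_pos
    return l_a + l_b  # Eq. (3.3)
```
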

    To prove the validity of the backbone network we designed, we performed several comparative experiments: 1) VGG-16 from the RefineDet source code; 2) VGG-16 with a BN layer added after each convolutional layer; and 3) the original VGG-16 replaced with other common backbone networks. The experimental results are listed in Table 1. The precision of our backbone was higher than in all other experiments. Although the number of parameters was reduced when some backbones were substituted, the corresponding precision was too low to meet the functional requirements of the task. On analysis, we believe that although VGG, ResNet, DLA, and other backbones are frequently used to improve task performance, they are essentially designed for classification, which differs from our CXR target detection task, resulting in poor performance. Our backbone increases the number of parameters within an acceptable range and thereby achieves 15.75% mAP, higher than the other versions, proving the effectiveness of our network for this task.

    Table 1.  Performance comparison of different backbone networks.
    Backbone mAP@0.5(%) Inference speed (fps) Params (M)
    VGG-16 12.83 9.96 34.27
    VGG-16+BN 13.67 8.50 34.28
    ResNet-18 8.05 4.62 22.75
    ResNet-34 8.25 3.32 32.85
    DLA-60 8.54 7.98 33.90
    DLA-102 9.00 5.86 45.30
    SAR-CNN (Ours) 15.75 2.60 56.48


    Based on the modules proposed in Section 2, we conducted combined experiments to show the contribution of each module to the whole. A tick indicates that the module is included in the network. The experimental results are listed in Table 2. Whether a single module is embedded, two modules are combined, or all three are embedded, the accuracy improves; with ResBlock added to integrate the information of the preceding modules, the accuracy of the network finally rises to 15.75%. We believe the three specially designed modules improve the robustness of the overall network for the following reasons: First, features at multiple resolutions help form relatively rich image information, especially for sites with subtle lesions. Second, a simple and effective topological structure is needed to fuse medical image features with insufficient information. Third, medical images require the network to apply attention mechanisms from different angles.

    Table 2.  Impact of the different components (mAP@0.5, %).
    All four modules (MFFB + TSAM + SCSA + ResBlock): 15.75
    MFFB + TSAM + SCSA: 14.20
    Two-module combinations: 13.18, 13.49, 13.15
    Single modules: 13.11, 13.12, 13.08
    Baseline (no modules): 12.83


    To demonstrate the superiority of our method, we used the PASCAL VOC 2010 evaluation standard [48] with IoU thresholds above 0.4 (0.5, 0.6, 0.7, and 0.8) and compared the mAP index with mainstream target detection models. The experimental results are presented in Table 3. In CenterNet, owing to its special non-maximum suppression (NMS) setting, the IoU threshold does not affect the accuracy; its value is therefore reported uniformly at IoU 0.5. It can be observed that mainstream target detection algorithms do not perform strongly on this CXR image dataset, with most attaining approximately 11% and EfficientDet less than 8%. The special design of RefineDet allows it to perform better than most models, at 12.83%. Yolov3 also shows excellent detection ability in this task, but it remains below our algorithm under every IoU standard. The accuracy of SAR-CNN is 2.92 percentage points higher than the RefineDet baseline, exceeding most mainstream algorithms, which is crucial for assisting physicians in detection and proves the effectiveness of our modules in the field of medical image target detection.

    Table 3.  Detection results of different methods on the VinDr-CXR test set.
    | Methods | Backbone | mAP@0.5 (%) | mAP@0.6 (%) | mAP@0.7 (%) | mAP@0.8 (%) |
    |---|---|---|---|---|---|
    | SSD | VGG-16 | 10.01 | 9.47 | 8.80 | 7.66 |
    | Faster RCNN | VGG-16 | 11.27 | 10.34 | 8.74 | 6.49 |
    | Faster RCNN | ResNet-50 | 10.47 | 9.58 | 8.10 | 5.93 |
    | EfficientDet | EfficientNet-b5 | 7.37 | 7.08 | 6.57 | 5.96 |
    | RetinaNet | ResNet-50 | 11.24 | 10.55 | 9.28 | 7.16 |
    | Yolov3 | DarkNet-53 | 15.21 | 14.56 | 13.17 | 11.25 |
    | CenterNet | Hourglass | 11.76 | – | – | – |
    | CenterNet | ResNet-50 | 11.38 | – | – | – |
    | RefineDet | VGG-16 | 12.83 | 12.33 | 10.66 | 7.54 |
    | Ours | SAR-CNN | 15.75 | 15.34 | 14.57 | 11.87 |

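    The IoU thresholds in Table 3 decide when a predicted box counts as a correct detection: a prediction matches a ground-truth box only if their intersection-over-union meets the threshold, so raising the threshold from 0.5 to 0.8 necessarily lowers every model's mAP. A minimal sketch of the computation (the boxes below are made-up values for illustration):

    ```python
    def iou(box_a, box_b):
        """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
        # Intersection rectangle (empty if the boxes do not overlap).
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    pred, gt = (10, 10, 50, 50), (20, 20, 60, 60)
    print(round(iou(pred, gt), 3))  # 0.391: a hit at threshold 0.3, a miss at 0.5
    ```
    
    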

    As can be observed in Table 4, SAR-CNN maintains an accuracy of more than 10% at every input size. At a resolution of 320, SAR-CNN is slightly less accurate than the benchmark algorithm, but from 448 upward its accuracy exceeds the benchmark. We believe this is because images at too low a resolution struggle to provide sufficient lesion features for the network to fit, which results in weak performance. The benefit of our approach becomes progressively more apparent as the resolution increases, peaking at 16.25% when the image resolution is 768 $ \times $ 768.

    Table 4.  Comparison of algorithm accuracy for different resolution images.
    | Resolution | Base (RefineDet) | SAR-CNN (Ours) |
    |---|---|---|
    | 320 $ \times $ 320 | 12.02% | 11.54% |
    | 448 $ \times $ 448 | 12.59% | 15.12% |
    | 512 $ \times $ 512 | 12.83% | 15.75% |
    | 640 $ \times $ 640 | 13.87% | 15.32% |
    | 768 $ \times $ 768 | 13.59% | 16.25% |


    We used the PASCAL VOC 2010 evaluation criterion, namely the highest per-category accuracy (AP value) obtained during training, set the IoU threshold to 0.4, and compared the results with the benchmark. As listed in Table 5, the benchmark performs slightly better than our algorithm in the aortic enlargement, cardiomegaly, pulmonary fibrosis and infiltration categories, but our algorithm exceeds the benchmark in all other categories, and the benchmark failed to detect the pneumothorax category at all. This illustrates the effectiveness of our targeted network design in enhancing the fitting of lung X-ray images.

    Table 5.  AP values for each category under the criteria of using PASCAL VOC 2010.
    | Category | Base | SAR-CNN |
    |---|---|---|
    | Aortic Enlargement | 56.56% | 54.46% |
    | Atelectasis | 10.20% | 16.20% |
    | Calcification | 0.35% | 9.13% |
    | Cardiomegaly | 44.18% | 42.66% |
    | Consolidation | 15.48% | 25.97% |
    | ILD | 16.50% | 24.93% |
    | Pleural Effusion | 38.72% | 39.39% |
    | Infiltration | 28.21% | 22.92% |
    | Lung Opacity | 17.36% | 22.89% |
    | Nodule/Mass | 12.18% | 13.13% |
    | Other Lesion | 0.24% | 9.63% |
    | Pleural Thickening | 16.03% | 19.44% |
    | Pneumothorax | – | 19.95% |
    | Pulmonary Fibrosis | 24.06% | 23.32% |

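    The per-category APs in Table 5 can be averaged into a single mean AP. A sketch of that averaging, under the assumption (ours, not stated in the table) that the pneumothorax category the benchmark failed to detect counts as AP = 0:

    ```python
    # Per-category APs (%) from Table 5, in the table's category order.
    sar_cnn_ap = [54.46, 16.20, 9.13, 42.66, 25.97, 24.93, 39.39,
                  22.92, 22.89, 13.13, 9.63, 19.44, 19.95, 23.32]
    base_ap = [56.56, 10.20, 0.35, 44.18, 15.48, 16.50, 38.72,
               28.21, 17.36, 12.18, 0.24, 16.03, 0.0, 24.06]  # 0.0: undetected

    mean_sar = sum(sar_cnn_ap) / len(sar_cnn_ap)
    mean_base = sum(base_ap) / len(base_ap)
    print(f"SAR-CNN mean AP: {mean_sar:.1f}%")  # ~24.6
    print(f"Base mean AP:    {mean_base:.1f}%")  # ~20.0
    ```

    Under this convention the undetected category alone costs the benchmark over 1.4 percentage points of mean AP, which is why a model that detects every category, even weakly, averages better.
    
    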

    We present the performance of the benchmark algorithm and SAR-CNN on the training set in the form of mean AP and loss. Using the PASCAL VOC 2010 evaluation criterion, the accuracy was tested every five epochs: the AP of each category was obtained and then averaged to give the mean AP. Figure 8(a) shows that the mAP of our algorithm was higher than that of the benchmark throughout training, and its fit to the features was consistently better. In Figure 8(b), the loss of the benchmark algorithm decreases slowly at the beginning of training and remains higher than that of our algorithm at later stages (e.g., for iterations in the range 6500–7000), indicating that our algorithm converges faster than the benchmark.
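    Each per-category AP behind that mean is the area under an interpolated precision-recall curve (the all-point interpolation adopted by PASCAL VOC in 2010). A sketch with made-up recall/precision points:

    ```python
    def voc_ap(recall, precision):
        """Average precision as the area under the interpolated PR curve
        (all-point interpolation, as used by PASCAL VOC since 2010)."""
        # Append sentinel values at both ends of the curve.
        mrec = [0.0] + list(recall) + [1.0]
        mpre = [0.0] + list(precision) + [0.0]
        # Interpolation: make precision monotonically non-increasing.
        for i in range(len(mpre) - 2, -1, -1):
            mpre[i] = max(mpre[i], mpre[i + 1])
        # Sum rectangle areas wherever recall changes.
        ap = 0.0
        for i in range(1, len(mrec)):
            ap += (mrec[i] - mrec[i - 1]) * mpre[i]
        return ap

    # Hypothetical PR points for one category; mAP is the mean of such APs.
    print(round(voc_ap([0.1, 0.2, 0.4, 0.8], [1.0, 1.0, 0.7, 0.4]), 3))  # 0.5
    ```
    
    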

    Figure 8.  (a) and (b) compare the trends of mean AP and loss, respectively, when SAR-CNN and the benchmark algorithm are trained on images with a resolution of 512 $ \times $ 512.

    Figure 9 shows a comparison between the detection results of the benchmark model RefineDet and those of SAR-CNN (image resolution 512 $ \times $ 512). Figure 9(a) shows the positions of the ground-truth boxes on the test-set images, Figure 9(b) the results of the benchmark model RefineDet, and Figure 9(c) the results of our proposed model. It can be observed that RefineDet fails to detect the diseased areas at the margins of the lungs in the results in columns 1, 2 and 5 from the left. RefineDet also produces false detections; for example, erroneous results for lung opacity, ILD and calcification appear in the second and fourth columns from the left. By contrast, the number, category and position of the boxes predicted by SAR-CNN are close to those of the labels, with confidence as high as 0.99. In addition, SAR-CNN maintains its detection accuracy under the complex intersection and superposition of multiple lesion areas, as in the detection results in the fourth and sixth columns from the left. At high confidence, the SAR-CNN bounding boxes are even more concise and intuitive than the ground-truth boxes, for example in the detection result in the fifth column from the left.

    Figure 9.  Comparison of detection results between RefineDet and SAR-CNN: (a) ground-truth box positions in the test set, (b) RefineDet results after training, (c) SAR-CNN results after training.
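    The concise, non-overlapping boxes in Figure 9(c) are what greedy non-maximum suppression (mentioned above in connection with CenterNet, and standard in detectors of this kind) is meant to produce: keep the most confident box and discard near-duplicate overlaps. A self-contained sketch with hypothetical boxes and scores:

    ```python
    def nms(boxes, scores, iou_thresh=0.5):
        """Greedy non-maximum suppression: repeatedly keep the highest-scoring
        box and drop remaining boxes that overlap it above iou_thresh."""
        def iou(a, b):
            x1, y1 = max(a[0], b[0]), max(a[1], b[1])
            x2, y2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            return inter / (area_a + area_b - inter) if inter > 0 else 0.0

        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
        return keep

    # Two near-duplicate boxes on one lesion plus one distinct box (made-up data).
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.99, 0.80, 0.75]
    print(nms(boxes, scores))  # [0, 2]: the near-duplicate of box 0 is suppressed
    ```
    
    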

    As shown in Figure 10, we produced comparative maps of the lesion areas for four sets of images based on the dataset labels and the recommendations of a physician. Each set contains the original image, the focus area judged by the doctor, and the algorithm's detection result. The area judged by the doctor overlaps to a high degree with the result of our algorithm, and our detection boxes do not obscure the original area, which facilitates secondary analysis and examination.

    Figure 10.  Comparison of the original image with the results of physician detection and SAR-CNN results.

    In this paper, we proposed a new algorithm, SAR-CNN, for disease localization in CXR images to improve the efficiency of physicians in diagnosing chest images. Three purpose-built modules help the CNN improve its sensitivity to CXR features through attention and feature fusion, and a train-from-scratch strategy makes the network more targeted. We evaluated the mAP under different IoU thresholds, the AP per category, and the training loss, and concluded that our algorithm is superior to mainstream object detection models. Applying object detection technology to AI medical research on CXR images can promote not only the use of deep learning and computer-aided diagnosis systems in imaging examination but also the innovative intersection of information science and biomedical research. More importantly, it can reduce the workload of doctors and help promote the implementation of a national plan for the prevention and treatment of COVID-19, which has important theoretical and practical significance.

    During our experiments, we found that lung X-ray images of COVID-19 for object detection were far less mature than those in the large CXR dataset, and the corresponding detection labels had less content: most of them only indicate that an image belongs to a certain disease classification or segmentation area. A mapping needs to be established between the large CXR dataset and the COVID-19 dataset, similar to pre-training with a specific algorithm. In this work, however, we performed a targeted design of the feature-processing network structure and combined it with our proposed method for a comprehensive evaluation.

    This work is supported by the National Key R & D Program of China (No. 2021ZD0111000), Hainan Provincial Natural Science Foundation of China (621MS019), Major Science and Technology Project of Haikou (Grant: 2020-009), Innovative Research Project of Postgraduates in Hainan Province (Qhyb2021-10), Guangdong University Student Science and Technology Innovation Cultivation Special Fund Support Project (pdjh2023 a0243) and Key R & D Project of Hainan province (Grant: ZDYF2021SHFZ243).

    The authors declare there is no conflict of interest.
