
Factors determining generalization in deep learning models for scoring COVID-CT images

  • Received: 04 August 2021 Accepted: 12 October 2021 Published: 27 October 2021
  • The COVID-19 pandemic has inspired unprecedented data collection and computer vision modelling efforts worldwide, focused on the diagnosis of COVID-19 from medical images. However, these models have found limited, if any, clinical application due in part to unproven generalization to data sets beyond their source training corpus. This study investigates the generalizability of deep learning models using publicly available COVID-19 Computed Tomography data through cross dataset validation. The predictive ability of these models for COVID-19 severity is assessed using an independent dataset that is stratified for COVID-19 lung involvement. Each inter-dataset study is performed using histogram equalization, and contrast limited adaptive histogram equalization with and without a learning Gabor filter. We show that under certain conditions, deep learning models can generalize well to an external dataset with F1 scores up to 86%. The best performing model shows predictive accuracy of between 75% and 96% for lung involvement scoring against an external expertly stratified dataset. From these results we identify key factors promoting deep learning generalization, being primarily the uniform acquisition of training images, and secondly diversity in CT slice position.

    Citation: Michael James Horry, Subrata Chakraborty, Biswajeet Pradhan, Maryam Fallahpoor, Hossein Chegeni, Manoranjan Paul. Factors determining generalization in deep learning models for scoring COVID-CT images[J]. Mathematical Biosciences and Engineering, 2021, 18(6): 9264-9293. doi: 10.3934/mbe.2021456




Craniotomy is a common treatment for neurosurgical diseases. Patients undergoing craniotomy should enter the ICU for monitoring and treatment because of surgical trauma and the risk of postoperative complications [1]. An increased length of ICU stay (LoICUS) imposes high medical costs on these patients while prolonging their physical suffering. LoICUS is a common outcome used as an indicator of quality of care and resource use [2]. Therefore, from the perspective of personnel cost and resource management, predicting the LoICUS, especially for patients with long ICU stays, is an essential task, as length of stay (LOS) outliers account for most ICU costs [3].

The vigorous development of machine learning (ML) and deep learning (DL) over the past decade has made ML- and DL-based predictive models (PMs) widely used in various health care areas, which proves the feasibility of these methods in the field [4]. However, depending on the problem definition and medical scenario, whether such methods outperform traditional models remains controversial. Predicting the onset of diseases or medical outcomes, such as mortality, from electronic medical records (EMRs) using DL or ML has become widespread in recent years [5]. ML- and DL-based methods are often applied to ICU data, given its rarity and value, to predict outcomes [6]. Many scholars have focused on LoICUS prediction, and recent studies have framed it as a classification task rather than a regression task. For example, Gentimis et al. [7] considered day 5 to have the greatest impact on LoICUS and built a neural network (NN) model classifying whether the LoICUS exceeds 5 days, reaching 80% accuracy. In 2019, Harutyunyan et al. [8] built a PM to predict whether the LoICUS exceeds 7 days based on a linear regression model and long short-term memory, and the accuracy of the classification model was 84%. They also noted that LoICUS prediction is more difficult than other outcome prediction tasks, such as in-hospital mortality prediction, and argued that predicting whether a patient will have an extended LoICUS (longer than 7 days) from only the first 24 hours of data may be more reasonable than using other time points [8]. In 2021, Khalid et al. [9] applied six ML-based PMs for binary classification around the median ICU stay of the patient population (2.64 days) and found random forest (RF) to be a good predictive model for this task, with 65% accuracy. In conclusion, based on the research above and the sample size of our study, we decided to apply a classification method (with the median LoICUS as the classification boundary) to predict the LoICUS in our research.

In recent years, the decision-making process of a PM has become a key concern in interpretable learning, especially in medical research. Miller defined interpretability as "the degree to which a human can understand the cause of a decision" [10]. In medical prediction tasks, such as medical outcome prediction, an interpretable model is more acceptable than a black-box one.

Gradient boosting-based tree models, such as the gradient boosting decision tree (GBDT), XGBoost, and LightGBM [11,12], have achieved many strong results in data science competitions such as Kaggle, especially on tabular data. With this type of algorithm, the decision-making process of the model can be conveniently obtained in the form of decision rules, also known as cross features or feature interactions. In medical research, these rules can be understood as combinations of clinical variables or biomarkers, called multi-biomarkers. Previous studies have shown higher prognostic accuracy using multi-biomarkers than an individual one [13]. Therefore, we believe that discovering key decision rules or multi-biomarkers, instead of single biomarkers, with self-interpretable methods such as RF or GBDT is helpful to medical workers.

In our study, we propose a supervised self-interpretable medical knowledge discovery framework, including a PM called Rules-TabNet (RTN), to predict the LoICUS, aiming to discover medical rules that may affect the LoICUS of patients undergoing craniotomy. RTN consists of a gradient boosting-based tree model that generates decision rules and a TabNet, the self-interpretable NN-based model proposed by Arik and Pfister [14] in 2019, that selects the important decision rules. We also validated the medical rules discovered by RTN through risk analysis.

    The rest of our paper is organized as follows. In the next section, the materials of our study and the methodology of our framework are described. In the third section, we present the statistics and experimental results of our study. The discussion and conclusion are stated in the last section.

    We aimed to discover some medical rules that may affect the LoICUS of a patient undergoing craniotomy according to our framework. We handled the prediction and discovery tasks based on the following steps, and the framework is shown in Figure 1.

Figure 1.  Study framework. The framework has three parts. The Input module covers data preprocessing and feature selection. The RTN module has four steps: GBDT construction and tree decomposing, rule embedding, TabNet construction, and global rule importance computation. The last part of the framework is rule validation.

• First, the dataset of patients undergoing craniotomy is retrieved from real-world EMR data. The data are dichotomized according to the median LoICUS, which is the outcome of our study: positive examples are patients with a longer LoICUS (above the median) after craniotomy, and negative examples are patients with a shorter LoICUS (below the median). The data enter the Input module, and the preoperative and postoperative features are obtained after data preprocessing and feature selection.

• Second, we train a supervised gradient boosting-based classification model on the dataset of Step 1, with the outcome as the label. After the hyperparameters of the model are tuned, the model yields a set of decision trees, from which coarse-grained rules are generated by the tree-decomposing process. We then apply a rule-embedding method to generate rule features, on which a TabNet model is trained. TabNet has two major modules: the Rule Transformer (RT) and the Attentive Transformer (AT). The output of TabNet enters a Softmax for outcome prediction. During the training of TabNet, the global importance of the rules is calculated to evaluate them.

• Finally, a rule validation step, which includes case–control matching and risk analysis, is applied to validate the rules and to analyze risk factors that might affect the LoICUS of patients undergoing craniotomy.

In this study, we retrospectively retrieved EMRs from the surgical ICU data platform of a hospital in China covering 2005 to 2018. The variables were extracted from the surgical ICU special disease database, and the demographic information, vital signs, laboratory diagnoses, and other variables of patients from 6 hours before to 24 hours after entering the ICU were used as candidate variables for modeling. During a patient's ICU stay, some variables, such as vital signs (blood pressure, heart rate, etc.) and laboratory results (platelets, white blood cells, etc.), change continuously over short periods. These dynamic variables were therefore represented by the first (the first value collected in the ICU), maximum, and minimum values for each patient, as shown in Table S1.
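As a minimal illustration of this representation, assuming the raw measurements sit in a long-format table with hypothetical columns patient_id, variable, charttime, and value, the first/minimum/maximum aggregation could be sketched with pandas:

```python
import pandas as pd

# Hypothetical long-format measurement table: one row per observation.
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "variable":   ["heart_rate"] * 5,
    "charttime":  pd.to_datetime(["2018-01-01 08:00", "2018-01-01 12:00",
                                  "2018-01-01 20:00", "2018-03-02 09:00",
                                  "2018-03-02 15:00"]),
    "value":      [88.0, 110.0, 95.0, 72.0, 80.0],
})

# Sort so that "first" means the earliest value recorded in the ICU window.
df = df.sort_values("charttime")

# Collapse each dynamic variable to its first, minimum, and maximum values.
agg = (df.groupby(["patient_id", "variable"])["value"]
         .agg(first="first", min="min", max="max")
         .unstack("variable"))

# Flatten the (statistic, variable) columns to names like "first_heart_rate".
agg.columns = [f"{stat}_{var}" for stat, var in agg.columns]
print(agg)
```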

Adult patients (age ≥ 18 years) undergoing craniotomy and requiring ICU treatment were recruited according to the inclusion and exclusion criteria of our study. We retained only patients with at least one in-ICU test, and for patients who entered the ICU multiple times, we collected only the data of the first ICU admission. Patients whose LoICUS was null or incorrectly recorded, too long (more than 365 days), or too short (less than 24 hours) were excluded. We also excluded patients with an in-hospital outcome of death and patients whose discharge status was automatic discharge: an automatically discharged patient might die after discharge, further treatment offers little benefit to such patients and their families, and family members are often unwilling to let the patient die in the hospital because of local culture and customs. In our previous study, we built a PM to predict the in-ICU mortality of patients undergoing craniotomy and discovered some high-risk factors (such as heart rate, temperature, etc.) closely related to this mortality [15]. Therefore, we excluded the death samples from our dataset, which means that we removed the "death situation" from the input features to reduce the impact of confounding factors. Besides, we performed an odds ratio (OR) test to analyze the relation between death situation and the LoICUS of such patients, a chi-square test to validate the OR and its 95% confidence interval (CI), and a sensitivity analysis of death situation with respect to the LoICUS. The main outcome of our study is the LoICUS. Following our investigation and other research [9], we divided the LoICUS into two categories based on the median LoICUS (15 days) to build a classification model, and we also built some regression models.

    This study was approved by the Ethics Committee of Fujian Provincial Hospital, and all procedures performed in this study involving human participants were in accordance with its ethical standards. This study obtained the informed consent of all the participants.

After data retrieval, we selected the variables with a filling rate greater than 70%. A total of 146 variables, consisting of 20 preoperative and 126 postoperative variables, were extracted from the data platform. The detailed information and statistical results are shown in Table S1.

In view of feature engineering practice in ML, selecting all the variables as the input matrix for the predictive model is not a wise option, but variables related to clinical practice and medical facts need to be taken into account. We divided the patients into two groups according to whether the LoICUS exceeded 15 days, so the endpoint of our study is the LoICUS group. We then tested whether each variable differed statistically between the two groups. The chi-square test was applied to discrete variables, such as gender, and Student's t-test was used for continuous variables that conform to a normal distribution in both groups (group 1: LoICUS ≥ 15 days; group 2: LoICUS < 15 days); otherwise, the Wilcoxon test was used. For discrete variables, we applied Yates's correction to the chi-square tests according to the number of samples and the frequency of the variables. We performed 100 permutation tests to correct for multiple testing for each continuous variable, and an adjusted P < .05 was used as the threshold for statistical significance. Feature selection was conducted according to the adjusted P-value, and variables related to clinical practice and medical facts were also considered.
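The selection logic above can be sketched with SciPy. The arrays below are synthetic stand-ins for one continuous variable in the two LoICUS groups, and the 100-iteration loop mirrors the permutation correction described in the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic values of one continuous variable in the two LoICUS groups.
group1 = rng.normal(60, 10, 319)   # group 1: LoICUS >= 15 days
group2 = rng.normal(55, 10, 312)   # group 2: LoICUS < 15 days

# Pick the test according to Shapiro-Wilk normality in both groups.
normal = (stats.shapiro(group1).pvalue > 0.05 and
          stats.shapiro(group2).pvalue > 0.05)
test = stats.ttest_ind if normal else stats.ranksums  # Wilcoxon rank-sum
p = test(group1, group2).pvalue

# Permutation correction: re-test on label-shuffled data 100 times and take
# the fraction of permuted p-values at least as extreme as the observed one.
pooled = np.concatenate([group1, group2])
perm_p = []
for _ in range(100):
    rng.shuffle(pooled)
    perm_p.append(test(pooled[:len(group1)], pooled[len(group1):]).pvalue)
adjusted_p = np.mean(np.array(perm_p) <= p)
print(f"raw p = {p:.4f}, permutation-adjusted p = {adjusted_p:.4f}")
```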

The imputation of missing values is an inevitable step for many predictive models. Missing values were imputed with the mean for continuous variables and the mode for discrete variables.
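With scikit-learn this step might look as follows (column names hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical feature table with missing entries.
X = pd.DataFrame({"age": [63.0, np.nan, 55.0], "gender": ["M", "F", np.nan]})

# Mean imputation for continuous variables, mode for discrete ones.
X["age"] = SimpleImputer(strategy="mean").fit_transform(X[["age"]]).ravel()
X["gender"] = SimpleImputer(strategy="most_frequent").fit_transform(X[["gender"]]).ravel()
print(X)
```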

The PMs were constructed using the scikit-learn (version 0.23.0) and PyTorch (version 1.3.1) packages in Python (version 3.7.3, Python Software Foundation), and statistical analysis was conducted using the SciPy (version 1.5.0) package. The statistical values and ranges of the variables differ: continuous variables conforming to a normal distribution are reported as mean and standard deviation (mean ± std), other continuous variables as median and interquartile range (IQR), and discrete variables as percentages of positive examples (%).

A GBDT algorithm was adopted to train the predictive model by generating multiple additive trees, because tree-based models offer good interpretability and a training-time advantage over NN-based black-box models.

Figure 2 illustrates the framework of the GBDT model. The first step is to initialize $f_0(X) = \arg\min_c \sum_{i=1}^{m} L(y_i, c)$. $N$ additive trees are trained iteratively; the input of the later tree ($\text{Tree}_i$) is $(X_m, \text{Residual}_{i-1})$, and the residual is approximated by the negative gradient. The terminal regions corresponding to each tree $n$ are $R_{nj}, j = 1, 2, \ldots, J$, where $J$ is the number of leaf nodes. For the $J$ leaf nodes, $c_{nj} = \arg\min_c \sum_{X_i \in R_{nj}} L(y_i, f_{n-1}(X_i) + c)$ minimizes the loss over $R_{nj}$. $I(X \in R_{nj})$ is the indicator function: $I = 1$ when $X \in R_{nj}$; otherwise, $I = 0$. The output of the model is shown in Eq (1). The final classification result is $y = 1/\{1 + \exp[-f(X)]\}$.

$f(X) = f_0(x) + \sum_{n=1}^{N} \sum_{j=1}^{J} c_{nj} I(x \in R_{nj})$ (1)
Figure 2.  Framework of the gradient boosting-based tree model. $N$ additive trees that are not independent are trained, and the later tree ($\text{Tree}_i$) learns the residual error of the previous one ($\text{Tree}_{i-1}$), meaning that $(X_m, \text{Residual}_{i-1})$ is the input of $\text{Tree}_i$. $f_0$ is the initial value. $R$ is the terminal region of each additive tree, and $c$ is the minimum square loss over $R$. $I$ denotes the indicator function. $f(x)$ is the output.
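A minimal sketch of this boosting model in scikit-learn, using the postoperative hyperparameters later listed in Table 2 and a synthetic stand-in for the feature matrix, might be:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-in for the 146-variable feature matrix and dichotomized labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(567, 146))
y = (rng.random(567) > 0.5).astype(int)   # 1: LoICUS >= 15 days

# Postoperative settings from Table 2; shallow trees (max_depth=2) keep the
# decomposed rules short enough to read as clinical conditions.
gbdt = GradientBoostingClassifier(
    learning_rate=0.01,
    n_estimators=40,
    max_depth=2,
    min_samples_split=158,
    min_samples_leaf=57,
    max_features="sqrt",
)
gbdt.fit(X, y)

# Class probability follows y = 1 / (1 + exp(-f(X))) over the additive trees.
print(gbdt.predict_proba(X[:5])[:, 1])
```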

After parameter tuning, we generated the medical rules from the model. Each additive tree in the model is a decision tree that can be represented by nodes ($N_R$ the root node, $N_I$ an internal node, and $N_L$ a leaf node) and edges ($E$): $T_i = \{N_R, N_I, N_L, E\}$. We decomposed the trees into decision rules by connecting $N_I$ or $N_L$ nodes through $E$, starting from $N_R$. Any path starting from $N_R$ can be converted into a decision rule, yielding the coarse-grained medical rule base $R_K$ consisting of $K$ rules. The strategy in [16] was adopted to limit the tree size and reduce the complexity of the model. If the maximum depth of the model is set to 2, the three types of rules in $R_K$ are as shown in Eq (2), where $N$ is the number of trees, $K$ is the number of rules derived from the model, and $t_n$ is the number of $N_L$ within $\text{Tree}_n$.

$R_K = \{N_R{\to}E{\to}N_I,\; N_R{\to}E{\to}N_I{\to}E{\to}N_L,\; N_R{\to}E{\to}N_L\}; \quad K = \sum_{n=1}^{N} 2(t_n - 1)$ (2)

We take an example to illustrate the tree-decomposing process. As shown in Figure 1, $T_1 = \{1 \times N_R, 1 \times N_I, 3 \times N_L, 4 \times E\}$. The tree has three layers, and four rules can be generated as follows:

· $N_R{\to}E{\to}N_I$: $x_0 < a_0$;

· $N_R{\to}E{\to}N_L$: $x_0 \ge a_0$;

· $N_R{\to}E{\to}N_I{\to}E{\to}N_L$: $x_0 < a_0\ \&\ x_2 = a_2$;

· $N_R{\to}E{\to}N_I{\to}E{\to}N_L$: $x_0 < a_0\ \&\ x_2 \ne a_2$.

$x_0$ denotes the split node of the first layer, and $x_2$ denotes the split node of the second layer. $a_0$ and $a_2$ represent the split ranges of $x_0$ and $x_2$, respectively.
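A sketch of this tree-decomposing step for scikit-learn's fitted boosted trees (continuing the gbdt object from the previous sketch; feature names and rule strings are illustrative) could be:

```python
def decompose_tree(tree, feature_names):
    """Enumerate every root-anchored path of a fitted sklearn tree as a rule."""
    rules, stack = [], [(0, [])]           # (node_id, conditions so far)
    while stack:
        node, conds = stack.pop()
        if conds:                          # every non-empty path from N_R is a rule
            rules.append(" & ".join(conds))
        left, right = tree.children_left[node], tree.children_right[node]
        if left != -1:                     # internal node: split on feature/threshold
            name, thr = feature_names[tree.feature[node]], tree.threshold[node]
            stack.append((left,  conds + [f"{name} <= {thr:.2f}"]))
            stack.append((right, conds + [f"{name} > {thr:.2f}"]))
    return rules

feature_names = [f"x{i}" for i in range(X.shape[1])]
rule_base = []
for est in gbdt.estimators_.ravel():       # each boosting stage is one tree
    rule_base.extend(decompose_tree(est.tree_, feature_names))
print(len(rule_base), rule_base[:4])
```

Each tree with $t_n$ leaves contributes $2(t_n - 1)$ such root-anchored paths, matching the count $K$ in Eq (2).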

Given the $K$ coarse-grained medical rules derived by GBDT, we transformed each rule $R$ in $R_K$ into an embedding vector $r^{(i)}_{1 \times n} = [r_1, r_2, \ldots, r_m, \ldots, r_n]$, where $n$ is the embedding size, equal to the number of samples in the dataset, and $r_m$ is the embedding representation of the $m$-th patient. The rule features of the data can then be denoted as $X_{n \times K}$.

$r_{km} = \prod_{j=1}^{p} \hat{I}[r_{km}(j)]$ (3)

A rule $R$ can also be described as a cross feature, meaning that $R$ contains $p$ features, such as $x_0$ and $x_2$ of $r^{(1)}$ in Figure 1. $\hat{I}(\cdot)$ is an indicator function taking the value 0 or 1, like $I(\cdot)$ in [17]. $r_{km}$, the representation of the $k$-th rule for the $m$-th patient, is given in Eq (3).
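Under Eq (3), a rule evaluates to 1 for a patient only when all of its $p$ conditions hold. A small sketch with hypothetical parsed rules (reusing the X matrix from the GBDT sketch above):

```python
import numpy as np

# Each parsed rule is a list of (feature_index, operator, threshold) conditions.
rules = [
    [(0, "<=", 1.5)],                     # e.g., x0 <= 1.5
    [(0, "<=", 1.5), (2, ">", 0.3)],      # e.g., x0 <= 1.5 & x2 > 0.3
]

def rule_embedding(X, rules):
    """Binary matrix X_{n x K}: entry (m, k) is the indicator product of Eq (3)."""
    R = np.ones((X.shape[0], len(rules)), dtype=np.float32)
    for k, conds in enumerate(rules):
        for feat, op, thr in conds:
            hit = X[:, feat] <= thr if op == "<=" else X[:, feat] > thr
            R[:, k] *= hit                # logical AND via indicator product
    return R

X_rules = rule_embedding(X, rules)
print(X_rules.shape, X_rules[:3])
```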

We applied a TabNet encoder, an additive model consisting of several steps, to predict the outcome and compute the global importance of all rules. The embedding vectors of all rules were input into the TabNet, and at every step a batch of rule embeddings (BRE), $X_{BRE} \in \mathbb{R}^{B \times K}$, was selected and fed into the rule transformer (RT), where $B$ is the batch size.

The RT module has four F-B-G layers, each composed of a fully-connected (FC) layer, batch normalization (BN), and a gated linear unit (GLU) [18]. A skip connection is applied to each F-B-G layer, and normalization with a factor of 0.5 after the last three F-B-G layers is used to stabilize the learning process of the network [19]. $(A[i], E[i])$ is the output of the RT after splitting in $\text{step}_i$; $A[i]$ is the input of the following module, and $E[i]$ is used to generate the outcome of $\text{step}_i$.

$A[i]$ is the input of the attentive transformer (AT), and the $\text{Mask}[i]$ module of $\text{step}_i$, shown in Eq (4), is employed for rule selection. $\text{Mask}[i]$ has the same dimension as $X_{BRE} \in \mathbb{R}^{B \times K}$, and its values lie in $[0, 1]$.

$\text{Mask}[i] = \text{Sparsemax}\left(\prod_{j=1}^{i-1}(\gamma - \text{Mask}[j]) \cdot h_i(A[i-1])\right), \quad \text{Mask}[i] \in [0, 1]$ (4)

Compared with Softmax, Sparsemax is a normalization method that yields sparser results [20]. $\gamma$ is the relaxation parameter: if $\gamma > 1$, a feature can be used again with higher weight in subsequent steps. $h_i(A[i-1])$ is the output of the FC and BN layers of the AT.
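For concreteness, a compact PyTorch implementation of Sparsemax (following Martins and Astudillo's projection-onto-the-simplex formulation; a sketch, not the authors' code) is shown below. Note the exact zeros in its output, which is what drives rule selection:

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Sparsemax: Euclidean projection of each row of z onto the probability
    simplex; unlike softmax it can assign exactly zero to weak entries."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    cumsum = z_sorted.cumsum(dim=-1)
    k = torch.arange(1, z.shape[-1] + 1, device=z.device, dtype=z.dtype)
    support = 1 + k * z_sorted > cumsum           # entries kept in the support
    k_z = support.sum(dim=-1, keepdim=True)       # support size k(z)
    tau = (cumsum.gather(-1, k_z - 1) - 1) / k_z  # threshold tau(z)
    return torch.clamp(z - tau, min=0.0)

logits = torch.tensor([[2.0, 1.0, -1.0]])
print(sparsemax(logits))                  # tensor([[1., 0., 0.]]) -- sparse
print(torch.softmax(logits, dim=-1))      # dense, for comparison
```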

After passing through the AT, $\text{Mask}[i]$, and the RT in $\text{step}_i$, $A[i]$ is input into the AT of $\text{step}_{i+1}$, and $E[i]$ is input into the ReLU of $\text{step}_i$ to obtain the output of the $i$-th estimator.

Finally, the output of RTN is the sum of the results of all estimators, the output of the $i$-th estimator being $\text{ReLU}(E[i])$. The final output of RTN is shown in Eq (5). Softmax is used to classify the LoICUS of patients undergoing craniotomy.

$\text{RTN}_{\text{output}} = \text{FC}\left[\sum_{i=1}^{N} \text{ReLU}(E[i])\right]$ (5)
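A skeletal PyTorch rendering of Eqs (4) and (5), with the RT and AT stacks reduced to single linear layers for brevity (the real modules are deeper, with BN and GLU) and reusing the sparsemax helper above:

```python
import torch
import torch.nn as nn

class TinyRTN(nn.Module):
    """Minimal TabNet-style encoder over K rule features (illustrative only)."""
    def __init__(self, K, n_a=8, n_e=8, n_steps=3, gamma=1.4, n_classes=2):
        super().__init__()
        self.rt = nn.ModuleList(nn.Linear(K, n_a + n_e) for _ in range(n_steps + 1))
        self.at = nn.ModuleList(nn.Linear(n_a, K) for _ in range(n_steps))
        self.fc = nn.Linear(n_e, n_classes)
        self.n_a, self.gamma, self.n_steps = n_a, gamma, n_steps

    def forward(self, x):
        prior = torch.ones_like(x)                  # running prod_j (gamma - Mask[j])
        a = self.rt[0](x)[:, :self.n_a]             # initial A[0] from the first RT
        out = 0.0
        for i in range(self.n_steps):
            mask = sparsemax(prior * self.at[i](a)) # Eq (4): sparse rule selection
            prior = prior * (self.gamma - mask)     # relaxed feature-reusage prior
            h = self.rt[i + 1](x * mask)            # RT on the masked rule features
            a, e = h[:, :self.n_a], h[:, self.n_a:] # split into (A[i], E[i])
            out = out + torch.relu(e)               # accumulate ReLU(E[i])
        return self.fc(out)                         # Eq (5); Softmax applied in loss

model = TinyRTN(K=116)
print(model(torch.rand(4, 116)).shape)              # torch.Size([4, 2])
```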

The GBDT architecture and the Mask module give RTN stronger self-interpretability than traditional NN-based black-box models. The calculation of rule importance is described below.

For a sample $b$ in $\text{step}_i$, $E_b[i]$ is one of the outputs of the RT, and the dimension of $E[i]$ is $B \times N_E$. $\text{ReLU}(E_{b,j}[i]) = 0$ when $E_{b,j}[i] \le 0$, where $j$ indexes a one-dimensional rule feature among the $N_E$ dimensions. The contribution of sample $b$ in $\text{step}_i$, $c_b[i]$, is shown in Eq (6).

$c_b[i] = \sum_{j=1}^{N_E} \text{ReLU}(E_{b,j}[i])$ (6)

The larger $c_b[i]$ is, the greater its impact on the outcome; $c_b[i]$ is also the weight of $\text{step}_i$ in RTN. The global importance of rule $j$ in sample $b$ is therefore the sum of the mask weights over all steps. The normalized rule importance for an RTN with $N$ steps is shown in Eq (7).

$I_{b,j} = \dfrac{\sum_{i=1}^{N} c_b[i] \cdot \text{Mask}_{b,j}[i]}{\sum_{j=1}^{K} \sum_{i=1}^{N} c_b[i] \cdot \text{Mask}_{b,j}[i]}$ (7)
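Given the per-step masks and the step contributions of Eq (6), Eq (7) reduces to a few tensor operations. A sketch with toy tensors; averaging the per-sample importances over the batch is one plausible way (our assumption) to obtain a single global ranking of the rules:

```python
import torch

B, K, N = 4, 116, 3                        # batch size, rules, decision steps
E = torch.randn(N, B, 8)                   # toy E[i] outputs of the RT per step
masks = torch.rand(N, B, K)                # toy Mask[i] per step

c = torch.relu(E).sum(dim=-1)              # Eq (6): c_b[i], shape (N, B)
weighted = (c.unsqueeze(-1) * masks).sum(dim=0)            # sum_i c_b[i]*Mask_bj[i]
importance = weighted / weighted.sum(dim=1, keepdim=True)  # Eq (7): rows sum to 1

global_importance = importance.mean(dim=0) # aggregate over samples to rank rules
print(global_importance.shape)             # torch.Size([116])
```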

According to the global importance of all rules computed by RTN, the rules $R_{\text{RTN}}$ with nonzero global importance are retained for subsequent validation. For each rule $r(f)$ in $R_{\text{RTN}}$, $f$ denotes the feature set contained in rule $r$. We adopted an analogous case–control strategy, similar to the case–control study commonly used in medical research, to construct the validation data for each risk factor, and performed a risk analysis to validate the correlation between each rule and the outcome.

The case–control study is widely used in risk factor detection [21], but its strict inclusion and exclusion criteria make it poorly suited to small-sample data such as ours. We therefore conducted an analogous case–control matching automatically for each rule using propensity score matching (PSM), a statistical method for handling biases and confounding factors [22]. We constructed two groups for each rule: group A (patients complying with $r(f)$) and group B (patients not complying with $r(f)$). A propensity score (PS) was calculated for each patient as the predicted probability of the logistic regression (LR) model shown in Eq (8), where $F$ denotes the full original feature set and $(F - f)$ is the original feature set minus the features contained in rule $r(f)$.

$PS = \dfrac{1}{1 + e^{-\beta(F-f)}}$ (8)

The two patients from groups A and B with the closest PS were paired and entered into a new patient cluster. Case and control groups were then generated from the new cluster according to the outcome [23]. The case group contains at least 100 patients, and the control group is at least as large as the case group.
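A condensed sketch of this matching step, pairing logistic-regression propensity scores (Eq 8) with greedy 1:1 nearest-score matching on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_rest = rng.normal(size=(300, 10))        # features F - f (rule features removed)
in_rule = rng.random(300) > 0.5            # group A: patients complying with r(f)

# Eq (8): propensity score = P(group A | F - f) from a logistic regression.
ps = LogisticRegression(max_iter=1000).fit(X_rest, in_rule).predict_proba(X_rest)[:, 1]

# Greedy 1:1 matching: pair each group-A patient with the unused group-B
# patient whose propensity score is closest.
a_idx, b_idx = np.where(in_rule)[0], np.where(~in_rule)[0]
unused = set(b_idx.tolist())
pairs = []
for i in a_idx:
    if not unused:
        break
    j = min(unused, key=lambda j: abs(ps[i] - ps[j]))
    pairs.append((i, j))
    unused.remove(j)
print(len(pairs), "matched pairs")
```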

The OR and its 95% CI were computed to analyze the risk of each rule. We illustrate the risk analysis with an example. Consider the rule $r(f) = \min \text{PCT} \le 0.16\ \&\ \text{tracheotomy}$ in $R_{\text{RTN}}$, meaning that the patient has a minimum PCT of no more than 0.16 and a tracheotomy was performed. We built a fourfold table to calculate the OR value, as shown in Table 1. The case–control pairs of the rule are divided into four parts: patients that comply with $r(f)$ with outcome LoICUS ≥ 15 (A); patients that do not comply with $r(f)$ with outcome LoICUS ≥ 15 (B); patients that comply with $r(f)$ with outcome LoICUS < 15 (C); and patients that do not comply with $r(f)$ with outcome LoICUS < 15 (D). The calculation of the OR and its 95% CI is shown in Eq (9).

$OR = \dfrac{A/B}{C/D} = 5.04; \quad 95\%\ CI = e^{\ln(OR) \pm 1.96\sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}}} = (2.87, 8.84)$ (9)
    Table 1.  Fourfold table of r(f).
    Case Control Total
LoICUS ≥ 15 A = 119 B = 73 192
    LoICUS < 15 C = 22 D = 68 90
    Total 141 141 282


We eliminated the rules with a vague OR (those whose 95% CI strides across 1). The higher the OR of a rule, the higher its risk impact on the LoICUS.
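The fourfold-table computation of Eq (9) takes only a few lines; plugging in the counts from Table 1 reproduces the reported OR of 5.04 with 95% CI (2.87, 8.84):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a fourfold table, as in Eq (9)."""
    or_ = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = tuple(math.exp(math.log(or_) + s * z * se) for s in (-1, 1))
    return or_, ci

or_, ci = odds_ratio_ci(a=119, b=73, c=22, d=68)   # counts from Table 1
print(f"OR = {or_:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# A rule is kept only when this CI does not stride across 1.
```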

We separated the experiments into two groups (a preoperative group and a postoperative group) according to the operation time of the patients. We split the dataset into training and testing sets at a ratio of 9:1 and performed 10-fold cross validation on the training set to find the optimized hyperparameters, which are shown in Table S3. After rule embedding, for DL model training, we split the rule features into training, validation, and testing sets at a ratio of 8:1:1. Classification results are measured by accuracy, precision, recall, F1 score, and the AUC of the receiver operating characteristic (ROC) curve.

RTN has two types of parameters: the parameters of the tree-based model and the hyperparameters of TabNet. Table 2 presents the parameters applied to the preoperative and postoperative models.

    Table 2.  Parameters of RTN in the experiments.
    Parameters Preoperative model Postoperative model
    Tree-based model
    learning rate 0.1 0.01
    n_estimators 20 40
    max_depth 2 2
    min_samples_split 124 158
    min_samples_leaf 9 57
    max_features 'sqrt' 'sqrt'
    TabNet model
    batch size 100 100
    epochs 71 48
    Gamma 1.8 1.4
    NA 56 40
    NE 56 40
    n_steps 9 6


Specifically, for the tree-based model, the learning rate is tuned first to increase convergence speed; n_estimators denotes the number of additive trees trained to generate the rules; max_depth limits the depth of each tree; min_samples_split is the minimum number of samples required to split an internal node ($N_I$); min_samples_leaf is the minimum number of samples required at a leaf node ($N_L$); and max_features is the number of features considered when looking for the best split.

As for the hyperparameters of TabNet, gamma is the coefficient for feature reusage in $\text{Mask}[i]$, increasing the attention paid in later steps to features the model has not yet focused on. $N_A$ and $N_E$ are the dimensions of the outputs of the RT, and $N_A = N_E$ is usually a good choice. n_steps is the number of steps in the architecture. TabNet was trained using gradient descent-based optimization with the Adaptive Moment Estimation (Adam) optimizer; the learning rates were 0.02 for the postoperative model and 0.001 for the preoperative model. All rule features were mapped to a single-dimensional trainable scalar with a learnable embedding, the classification loss was softmax cross entropy, and training continued until convergence. All hyperparameters of TabNet were optimized on the validation set.
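A sketch of this training configuration (Adam plus softmax cross entropy), reusing the TinyRTN module from the earlier sketch with the postoperative settings from Table 2 and a single toy batch:

```python
import torch

model = TinyRTN(K=116, gamma=1.4, n_steps=6)         # postoperative settings (Table 2)
opt = torch.optim.Adam(model.parameters(), lr=0.02)  # 0.001 for the preoperative model
loss_fn = torch.nn.CrossEntropyLoss()                # softmax cross entropy

x = torch.rand(100, 116)                             # one toy batch (batch size 100)
y = torch.randint(0, 2, (100,))
for epoch in range(48):                              # postoperative epochs (Table 2)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(f"final training loss: {loss.item():.3f}")
```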

We constructed several baseline models to demonstrate the effectiveness of our model. In the rule generation process, we implemented two self-interpretable models, LR and RF, which can generate rules for comparison with GBDT. LR is widely used in medical research and is considered a baseline model for classification tasks [24,25], while ensemble methods such as RF and GBDT have been a good choice for tabular data in recent years. We applied several evaluation metrics, including the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score, to evaluate the performance of the rule generation models.

RuleFit (GBDT based) was used as the baseline against RTN for fitting the rule-based predictive model. RuleFit is a composite, self-interpretable model that fits the rules generated by a tree-based model using a linear model: it combines a rule generation model (GBDT) with an LR that screens the rules by global importance. Since the structure of RuleFit, covering both rule generation and rule fitting, closely resembles our model, we chose it as our main comparison model.

We also performed supplementary regression experiments to predict the LoICUS directly. We implemented some traditional models, such as linear regression, Poisson regression, and hurdle regression, and some ML-based models, such as the random forest regressor (RFR), gradient boosting regressor (GBR), and RuleFit (GBR based), to predict the number of days. Fitting predictive models for a dichotomous endpoint is much easier than predicting time-to-event outcomes; these supplementary experiments illustrate the difficulty of regression in our task and why we defined LoICUS prediction as a classification problem.

The OR test analyzing the relation between death situation and the LoICUS (OR = 0.78 [0.46, 1.32], p = 0.421) revealed no significant association between the two. Sixty-one death samples were excluded from the original data by our criteria. We focused on these data and extracted 61 non-death samples from the collected data through down-sampling. Based on the resulting 122 samples, we trained a traditional LR model. The input features were consistent with the feature selection results of this study, the outcome was whether the LoICUS exceeds 15 days, and the evaluation metrics were the AUC, sensitivity, and specificity of the model. We evaluated only the model performance on the 61 death samples and did not split the data into training and testing sets. The experimental results are shown in Table S5. Comparing models with and without the "death situation" feature, we found that adding it did not increase the sensitivity of the model but did decrease its performance. Hence, we excluded the death samples from our study.

After data collection, 631 patients who did not die in hospital (63.87% male) were enrolled in the study. The average age of the patients was 55.94 ± 15.15 years. The statistical information of the LoICUS is shown in Table 3. The positive examples are the patients with a LoICUS of more than 15 days after craniotomy (group one: 319 patients), and the negative examples are patients with a LoICUS of less than 15 days (group two: 312 patients). A total of 146 variables were extracted from the dataset, divided into the following categories: demographic information, previous medical history, brain injury inducements, infection sources, vital signs, Glasgow Coma Scale (GCS) score, hematoma properties, laboratory tests, and therapeutic indices.

    Table 3.  Statistical information of LoICUS.
    Statistical information Days
    Mean of LoICUS 19.87
    Std of LoICUS 19.87
    Min of LoICUS 1.01
    Q1 of LoICUS 5.53
    Median of LoICUS 15.00
    Q3 of LoICUS 28.02
    Max of LoICUS 182.42
Note: Std: standard deviation; Min: minimum; Q1: lower quartile; Q3: upper quartile; Max: maximum


The feature selection procedure revealed that 91 variables are statistically significant, as shown in Table S1, where bold entries indicate variables significantly associated with the outcome. Six of them are preoperative variables (age, epilepsy, cerebral contusion and laceration, brain tumor, vascular diseases, and intracerebral hematoma), and 85 are postoperative variables. Patients in group one (58.14 ± 15.44 years) were significantly older than those in group two (53.69 ± 14.53 years). The statistical results also revealed that the LoICUS of men is longer than that of women. Besides age, the statistically significant variables are distributed among GCS scores, vital signs, and laboratory test indicators. The detailed statistical results of the dataset are shown in Table S2.

The dataset was split into training and testing sets at a ratio of 9:1 (training set: 567, testing set: 64). After hyperparameter tuning, the rules were generated based on GBDT. The results of GBDT compared with the other baseline models are shown in Table 4, and the optimized hyperparameters of the models are shown in Table S3.

    Table 4.  Evaluation metrics of models.
    PM Accuracy Precision Recall F1 AUC
    Preoperative
    LR 0.59 (0.50, 0.73) 0.66 (0.44, 0.92) 0.50 (0.16, 0.90) 0.53 (0.24, 0.75) 0.61 (0.47, 0.77)
    RF 0.58 (0.50, 0.69) 0.64 (0.43, 0.90) 0.51 (0.10, 0.88) 0.53 (0.17, 0.73) 0.60 (0.46, 0.74)
    GBDT 0.56 (0.44, 0.69) 0.68 (0.39, 0.92) 0.38 (0.04, 0.83) 0.43 (0.07, 0.70) 0.60 (0.44, 0.73)
    Postoperative
    LR 0.78 (0.67, 0.88) 0.82 (0.68, 0.93) 0.77 (0.53, 0.90) 0.79 (0.66, 0.88) 0.85 (0.72, 0.95)
    RF 0.80 (0.70, 0.89) 0.82 (0.70, 0.93) 0.79 (0.66, 0.93) 0.80 (0.70, 0.89) 0.88 (0.80, 0.95)
    GBDT 0.81 (0.70, 0.88) 0.83 (0.70, 0.93) 0.79 (0.56, 0.93) 0.81 (0.68, 0.89) 0.91 (0.84, 0.97)


For the preoperative model, the overall predictive performance is poor (the AUC of every model is below 0.70). GBDT used 20 estimators with a maximum depth of 2, generating 38 rules. For the postoperative model, all evaluation metrics of GBDT exceeded those of the other baseline models; here GBDT used 40 estimators with a maximum depth of 2, generating 116 medical rules.

After rule embedding, we split the rule features into training, validation, and testing sets at a ratio of 8:1:1 (training set: 510, validation set: 57, testing set: 64). Figure 3 depicts the training process of RTN on the constructed datasets; the horizontal axis is the number of epochs, and the vertical axis shows the AUCs of the training and validation sets.

Figure 3.  Training process of RTN. The blue curve shows the training loss of RTN; the orange and green curves show the training and validation AUCs, respectively.

We found that the preoperative model is difficult to fit, with a loss of about 0.60 after 30 epochs, whereas the postoperative model converges quickly after 60 epochs with a final loss close to zero. From the figure, the preoperative model performs poorly on the validation set. The 10 times 10-fold cross validation results of the different models are shown in Table 5, reported as the mean and 95% confidence interval of each metric. RTN performed better than RuleFit on the training set, exceeding the baseline on all evaluation metrics.

    Table 5.  Ten times 10-fold cross validation results of different models.
    PM Accuracy Precision Recall F1 AUC
    Preoperative
    RuleFit 0.57 (0.56, 0.58) 0.57 (0.56, 0.59) 0.60 (0.59, 0.62) 0.58 (0.57, 0.60) 0.59 (0.58, 0.60)
    RTN 0.65 (0.64, 0.66) 0.71 (0.70, 0.72) 0.56 (0.55, 0.58) 0.60 (0.59, 0.61) 0.71 (0.70, 0.73)
    Postoperative
    RuleFit 0.73 (0.72, 0.74) 0.73 (0.72, 0.74) 0.75 (0.74, 0.78) 0.73 (0.73, 0.74) 0.80 (0.79, 0.81)
    RTN 0.79 (0.78, 0.80) 0.79 (0.78, 0.80) 0.80 (0.79, 0.81) 0.79 (0.78, 0.80) 0.90 (0.89, 0.91)
    Note: Bold font indicates the best value for each metric.


Table 6 shows the performance of our model on the test set compared with RuleFit (GBDT based). The postoperative models clearly outperform their preoperative counterparts across all five evaluation metrics, and the postoperative RTN compares favorably with the baseline.

    Table 6.  Performance of RTN on the test set compared with the baseline model.
    PM Accuracy Precision Recall F1 AUC
Preoperative
    RuleFit 0.56 (0.47, 0.66) 0.57 (0.46, 0.71) 0.68 (0.14, 0.92) 0.60 (0.22, 0.73) 0.51 (0.38, 0.67)
    RTN 0.54 (0.41, 0.66) 0.61 (0.35, 0.88) 0.43 (0.01, 0.90) 0.46 (0.16, 0.71) 0.57 (0.40, 0.71)
Postoperative
    RuleFit 0.73 (0.63, 0.81) 0.86 (0.73, 0.96) 0.70 (0.46, 0.85) 0.76 (0.61, 0.85) 0.83 (0.72, 0.92)
    RTN 0.76 (0.64, 0.86) 0.86 (0.72, 0.95) 0.67 (0.40, 0.88) 0.74 (0.55, 0.87) 0.85 (0.75, 0.95)
    Note: Bold font indicates the best value for each metric.


For the preoperative models, the advantage of RTN over the baseline is not obvious: RTN improved the AUC (by 6%, to 0.57) and precision (by 4%, to 0.61), while RuleFit (the GBDT-based model) achieved higher accuracy (0.56), recall (0.68), and F1 score (0.60) than RTN.

The advantage of our postoperative model (RTN) over the baseline is evident, and compared with previous studies of self-interpretable models, we find remarkable performance improvements in our model.

The accuracy (0.76) and AUC (0.85) of RTN exceeded those of the baseline model, its precision (0.86) matched the baseline, and its recall and F1 score were slightly lower. Compared with RuleFit, which is also a rule-based prediction model, our model has fewer limitations, such as assumptions about the data distribution. The RTN proposed in our study therefore performed better and offers better robustness and applicability.

The results of the regression models are shown in Table S4; the evaluation metrics are explained variance score, mean squared error, root mean squared error, mean absolute error, R² score, and adjusted R² score. Overall, from the results of the supplementary experiments, we concluded that regression models are harder to fit than classification models on both the preoperative and postoperative tasks. Moreover, considering that the follow-up tasks require strong reliability of the prediction model, we believe that a dichotomized endpoint is a better study outcome than time-to-event data. We therefore chose classification instead of regression to predict the LoICUS.

Based on the strong performance of our postoperative model, we extracted and validated the medical rules generated by RTN. The 116 rules imported into the TabNet were extracted for verification. According to the global rule importance computed by RTN, we filtered out 57 rules with a global importance of zero, leaving 59 rules. We then matched the case–control pairs of each rule following the case–control matching strategy. After rules with a vague OR were eliminated, 24 rules remained, 13 of which are risk factors. Seven representative decision rules are listed in Table 7, sorted in descending order of OR.

    Table 7.  Medical rules discovered by our framework.
No. Medical decision rules OR 95% CI low 95% CI high I
1 max ALP > 160.0 & min RBC ≤ 2.52 5.12 1.59 16.42 0.0330
2 min WBC ≤ 5.33 & first Ca ≤ 2.25 4.29 2.05 8.96 0.0034
3 min PCT ≤ 0.30 & max PCT > 0.31 4.17 2.39 7.28 0.0069
4 min PT ≤ 12.65 & max PT ≤ 14.15 3.95 1.82 8.57 0.0594
5 min urine volume ≤ 32.5 & min DBIL ≤ 3.58 3.82 2.25 6.48 0.0741
6 min PCT ≤ 0.30 & max PCT > 0.44 3.54 2.12 5.91 0.0002
7 max GLO > 36.35 3.29 1.31 8.27 0.0001
Note: OR: odds ratio; 95% CI low: the lower limit of the confidence interval of the OR; 95% CI high: the upper limit of the confidence interval of the OR; I: global importance of the rule computed by RTN; ALP: alkaline phosphatase; RBC: red blood cells; WBC: white blood cells; Ca: calcium; PCT: procalcitonin; PT: prothrombin time; DBIL: direct bilirubin; GLO: globulin.


Based on the experimental results, we can conclude that in the LoICUS prediction task for patients undergoing craniotomy, postoperative variables, such as vital signs and laboratory information, may have a greater impact on the outcome than preoperative variables.

From the results of medical rule discovery and validation in Table 7, we found that the global importance of the rules is not perfectly positively correlated with the results of the risk analysis. Several interesting findings emerge from the results.

No. 1 is the rule with the highest OR (5.12) and high global importance (0.0330). It means that within 24 hours after a patient enters the ICU, more attention should be paid to patients with an ALP above 160 U/L and an RBC below 2.52 × 10¹²/L, because they might have a longer LoICUS. Research has found that gradual elevation of ALP might prolong hospitalization [26], and one study found that the postoperative length of hospitalization increases by 0.837% (95% CI, 0.249–1.425%) per RBC unit transfused [27]. Therefore, the LoICUS might increase when the RBC is low enough to require red blood cell transfusion.

No. 2 implies that when a patient's serum ionized calcium is below 2.25 mmol/L on first entering the ICU, attention should also be paid to whether the patient's WBC drops below 5.33 × 10⁹/L, which might increase the LoICUS of patients undergoing craniotomy. Satoshi et al. indicated that, for patients in the ICU after cardiopulmonary bypass, ionized calcium can make a remarkable difference in LoICUS [28]. To the best of our knowledge, no study has reported the effect of ionized calcium at first ICU entry on the LoICUS of patients undergoing craniotomy. One study found that, for patients undergoing craniotomy, infection can be predicted within 4 days from standard blood count data such as WBC [29]. However, the effect of decreased WBC within 24 hours of ICU admission on the LoICUS has not been studied before, and this rule deserves attention.

No. 3 and No. 6 focus on the minimum and maximum PCT values for patients undergoing craniotomy, meaning that a PCT that is too low or too high within 24 hours after ICU admission has a certain impact on the LoICUS. A study found that PCT is a valuable marker for patients undergoing craniotomy [30]; therefore, both high and low PCT values deserve concern for these patients, especially regarding the LoICUS.

No. 4 indicates that a short prothrombin time may affect the LoICUS after ICU admission. It has the second highest global importance, with an OR of 3.95. Traumatic injury is associated with coagulopathy [31]; thus, we consider that it might be a previously unstudied risk factor for patients undergoing craniotomy.

No. 5 has the highest global importance (0.0741) among the rules. It means that the patient's minimum urine volume is below 32.5 and the minimum direct bilirubin (DBiL) is below 3.58 μmol/L. Urine output is routinely measured in the ICU and may be associated with hospital mortality [32]. Hyperbilirubinemia is a common postoperative complication, and DBiL is related to several pathophysiologies [33]. However, research on decreased DBiL, especially in patients undergoing craniotomy, is scarce. We believe that for patients undergoing craniotomy within 24 hours of ICU admission, the joint influence of decreased DBiL and urine volume on the LoICUS deserves more attention.

No. 7 means that serum globulin (GLO) above 36.35 g/L is a risk factor for the LoICUS of patients undergoing craniotomy. The increase in serum GLO may reflect an enhanced immune response caused by infection; we found that each type of infection after craniotomy is statistically significant and might affect the LoICUS. Therefore, attention should be paid to whether patients have elevated serum GLO within 24 hours of ICU admission.

Our study still has limitations and room for improvement despite the compelling results. First, in studies of e-healthcare systems using electronic health records, privacy is an important consideration, and integrating privacy-preserving methods into the proposed framework is future work [34,35]. Second, the lack of preoperative variables in our study leads to the poor performance of our preoperative model; nevertheless, considering the allocation of medical resources, discovering preoperative rules for patients undergoing craniotomy may be especially helpful for reducing the medical burden in the real world. Third, our model achieved better performance than the baseline model and has strong interpretability, particularly of feature interactions, but its performance is slightly lower than that of pure gradient boosting methods such as GBDT. Our findings echo [36], which argued that tree-based models still outperform DL on tabular data, especially on medium-sized datasets (<10 K samples), and that NNs are not robust to uninformative features, which we could not remove because our feature selection combined statistical results with the recommendations of domain experts. In this study we used GBDT as a basic, typical gradient boosting method; in the future we may try other ML algorithms, such as faster implementations like LightGBM, in the rule generation procedure. Besides, ML algorithms other than tree-based methods (such as k-NN, SVM, etc.) are also used in medical research [37,38], and rule generation frameworks based on these methods remain to be studied. We also acknowledge that tree decomposing and rule embedding disturb the spatial continuity of the original tree structure to some extent; future work could address this by treating trees as directed acyclic graphs and representing rules as combinations of vertices and edges, as in a Bayesian network. Moreover, missing data are inevitable in real-world studies, and many of the ML packages applied in our research do not accept missing data, so missing values had to be imputed with the population mean; because mean imputation is likely to introduce bias, we may build future models with algorithms less affected by missing values. Finally, verifying a piece of medical knowledge is difficult, especially on small-sample datasets, and some possibly high-risk rules may have been filtered out because our experimental samples were too few.

Aiming to discover medical decision rules affecting the LoICUS of patients undergoing craniotomy, this paper proposes an interpretable framework comprising a PM (RTN) and a rule validation process based on real-world EMR data. The medical decision rules were generated and preliminarily screened by RTN, and the validation procedure verified and further selected the rules. The experimental results indicate that postoperative features have a greater impact on the LoICUS of patients undergoing craniotomy than preoperative features. The results also show that RTN achieves good performance and outperforms the other postoperative baseline models. The medical decision rules discovered by our framework are valuable for providing clinical decision support in the ICU to shorten the LoICUS and reduce the medical expenses of patients undergoing craniotomy.

This work was supported by the Fujian Province Intensive Medical Center Construction Project (2017-510), the National Natural Science Foundation of China (Grant No. 11832003), and the Major Special Project of "Scientific and Technological Innovation 2025" in Ningbo (2021Z021). We sincerely appreciate Yidu Cloud (Beijing) Technology Co., Ltd. for technical support in data extraction and model development.

    All authors declare no conflicts of interest in this paper.



    [46] P. Mooney, Chest X-ray images (pneumonia). Available from: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia.
    [47] G. Maguolo, L. Nanni, A critic evaluation of methods for COVID-19 automatic detection from X-ray images, Inf. Fusion, 76 (2021), 1-7. doi: 10.1016/j.inffus.2021.04.008
    [48] J. Cohen, P. Morrison, L. Dao, COVID-19 image data collection, preprint, arXiv: 2003.11597.
    [49] A. J. DeGrave, J. D. Janizek, S. I. Lee, AI for radiographic COVID-19 detection selects shortcuts over signal, Nat. Mach. Intell., 3 (2021), 610-619. doi: 10.1038/s42256-021-00338-7
    [50] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R. M. Summers, ChestX-Ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 3462-3471.
    [51] J. Saborit, J. Montell, A. Pertusa, A. Bustos, M. Cazorla, J. Galant, et al., BIMCV COVID-19+: A large annotated dataset of RX and CT images from COVID-19 patients, preprint, arXiv: 2006: 01174.
    [52] A. Bustos, A. Pertusa, J. M. Salinas, M. de la Iglesia-Vayá, Padchest: A large chest X-ray image dataset with multi-label annotated reports, Med. Imag. Anal., 66 (2020), 101797. doi: 10.1016/j.media.2020.101797
    [53] K. B. Ahmed, G. M. Goldgof, R. Paul, D. B. Goldgof, L. O. Hall, Discovery of a generalization gap of convolutional neural networks on COVID-19 X-rays classification, IEEE Access, 9 (2021), 72970-72979. doi: 10.1109/ACCESS.2021.3079716
    [54] P. R. Bassi, R. Attux, COVID-19 detection using chest X-rays: Is lung segmentation important for generalization?, preprint, arXiv: 2104.06176.
    [55] M. Elgendi, M. U. Nasir, Q. Tang, D. Smith, J.-P. Grenier, C. Batte, et al., The effectiveness of image augmentation in deep learning networks for detecting COVID-19: A geometric transformation perspective, Frontiers Med., 8 (2021).
    [56] J. Shuja, E. Alanazi, W. Alasmary, A. Alashaikh, COVID-19 open source data sets: A comprehensive survey, Appl. Intell., 51 (2020), 1296-1325.
    [57] M. Jun, G. Cheng, W. Yixin, A. Xingle, G. Jiantao, Y. Ziqi, et al., COVID-19 CT lung and infection segmentation dataset (verson 1.0), Zenodo, 2020. Available from https://doi.org/10.5281/zenodo.3757476.
    [58] F. Shan, Y. Gao, J. Wang, W. Shi, N. Shi, M. Han, et al., Lung infection quantification of COVID-19 in CT images with deep learning, preprint, arXiv: 2003.04655.
    [59] J. P. Cohen, P. Morrison, L. Dao, K. Roth, T. Q. Duong, M. Ghassemi, COVID-19 image data collection: Prospective predictions are the future, preprint, arXiv: 2006.11988.
    [60] J. Zhao, Y. Zhang, X. He, P. Xie, COVID-CT-dataset: A CT scan dataset about COVID-19, preprint, arXiv: 2003.13865.
    [61] MedRxiv, the Preprint Server for Health Sciences, Available from https://www.medrxiv.org.
    [62] BioRxiv, the Preprint Server for Biology, Available from https://www.biorxiv.org.
    [63] E. Soares, P. Angelov, S. Biaso, M. H. Froes, D. K. Abe, SARS-COV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-COV-2 identification, preprint, medRxiv: 2020.04.24.20078584.
    [64] K. Zhang, X. Liu, J. Shen, Z. Li, Y. Sang, X. Wu, et al., Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography, Cell, 181 (2020), 1423-1433. doi: 10.1016/j.cell.2020.04.045
    [65] S. A. Duzgun, G. Durhan, F. B. Demirkazik, M. G. Akpinar, O. M. Ariyurek, COVID-19 pneumonia: The great radiological mimicker, Insights Imaging, 11 (2020), 118-118. doi: 10.1186/s13244-020-00933-z
    [66] A. Krizhevsky, I. Sutskever, G. Hinton, Imagenet classification with deep convolutional neural networks, Commun. ACM, 60 (2017), 84-90. doi: 10.1145/3065386
    [67] S. K. Wajid, A. Hussain, K. Huang, W. Boulila, Lung cancer detection using local energy-based shape histogram (LESH) feature extraction and cognitive machine learning techniques, in 2016 IEEE 15th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), (2016), 359-366.
    [68] R. Sarkar, A. Hazra, K. Sadhu, P. Ghosh, A novel method for pneumonia diagnosis from chest X-ray images using deep residual learning with separable convolutional networks, in Computer Vision and Machine Intelligence in Medical Image Analysis, Springer, (2019), 1-12.
    [69] S. Marcel, Y. Rodriguez, Torchvision the machine-vision package of Torch, in Proceedings of the 18th ACM International Conference on Multimedia, (2010), 1485–1488.
    [70] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 2261-2269.
    [71] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, et al., CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison, preprint, arXiv: 1901.07031.
    [72] H. Pham, T. Le, D. Ngo, D. Tran, H. Nguyen, Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels, Neurocomputing, 437 (2021), 186-194. doi: 10.1016/j.neucom.2020.03.127
    [73] I. Allaouzi, M. Ben Ahmed, A novel approach for multi-label chest X-ray classification of common thorax diseases, IEEE Access, 7 (2019), 64279-64288. doi: 10.1109/ACCESS.2019.2916849
    [74] H. Wang, S. Wang, Z. Qin, Y. Zhang, R. Li, Y. Xia, Triple attention learning for classification of 14 thoracic diseases using chest radiography, Med. Image Anal., 67 (2021), 64279-64288.
    [75] M. A. Morid, A. Borjali, G. Del Fiol, A scoping review of transfer learning research on medical image analysis using ImageNet, Comput. Biol. Med., 128 (2021).
    [76] Pytorch.org, Transfer Learning for Computer Vision Tutorial. Available from https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html.
    [77] D. Kingma, J. Ba, ADAM: A method for stochastic optimization, preprint. arXiv: 1412.6980.
    [78] L. Prechelt, Early stopping-but when?, in Lecture Notes in Computer Science, Springer Berlin, (2012), 53-67.
    [79] M. Horry, S. Chakraborty, M. Paul, A. Ulhaq, B. Pradhan, M. Saha, et al., COVID-19 detection through transfer learning using multimodal imaging data, IEEE Access, 8 (2020), 149808-149824. doi: 10.1109/ACCESS.2020.3016780
    [80] T. C. Kwee, R. M. Kwee, Chest CT in COVID-19: What the radiologist needs to know, Radiographics, 40 (2020), 1848-1865. doi: 10.1148/rg.2020200159
    [81] J. L. Lehr, P. Capek, Histogram equalization of CT images, Radiology, 154 (1985), 163-169. doi: 10.1148/radiology.154.1.3964935
    [82] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, Springer Verlag, (2015), 234-241.