Research article Special Issues

Detection of cigarette appearance defects based on improved YOLOv4

  • Received: 03 November 2022 Revised: 10 December 2022 Accepted: 26 December 2022 Published: 10 January 2023
  • Citation: Guowu Yuan, Jiancheng Liu, Hongyu Liu, Yihai Ma, Hao Wu, Hao Zhou. Detection of cigarette appearance defects based on improved YOLOv4[J]. Electronic Research Archive, 2023, 31(3): 1344-1364. doi: 10.3934/era.2023069



    Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses cause COVID-19 and influenza, respectively, and mainly infect the upper and lower respiratory tract [1,2]. Both infections present similar primary symptoms, such as cough, fever, sore throat, runny or stuffy nose, tiredness, and body aches [3,4]. Early in infection, this similarity can lead to a clinical dilemma in diagnosis [5,6,7]. COVID-19 has caused an unrivaled global crisis through its worse overall decompensation, which stems from its intensive transmission and vascular effects [8,9,10,11]. As COVID-19 outbreaks continue and the disease moves toward endemicity, concurrent COVID-19 and influenza epidemics are impending. This motivates the current study: to design a data analysis tool that can accurately differentiate between these two infections.

    One way to rapidly classify patients with influenza or COVID-19 is through machine learning approaches. A preliminary investigation illustrated the potential of machine-learning models to accurately distinguish between these two viral infections using demographics, body mass index, and vital signs of infected patients [8]. Herein, we used a simple ML-based classification to identify patients with influenza or SARS-CoV-2 based on the main features of the within-host viral dynamics and the immune response. During the past decade, in-host mathematical modelling has become an increasingly powerful tool to study inter- and intracellular viral infection and the ensuing immune response [12]. Such mathematical models can deepen our understanding of virus spread within organs, leading to antiviral drug inventions and optimized treatment regimens. Furthermore, using a mechanistic model to generate synthetic patient data for various infections can help mitigate difficulties related to clinical data analyses, such as time-inconsistent data sets that can cause biased results [13].

    Recent studies have suggested that artificial intelligence (AI) and machine learning (ML) methods can perform as well as or even better than humans at significant healthcare tasks, such as diagnosing disease [14,15,16,17]. We apply a basic mathematical model on the cellular scale (the so-called target cell-limited model [18,19]), fit to two different sets of in vivo data for COVID-19 and influenza infections, to create virtual patient cohorts. Using our classifier, the patients are differentiated between the two infections. This is conducted over the entire infection and for early time points only. Herein, we show that, with just a few important in-host measurements, our method is able to discern which virus has infected a patient with a high degree of certainty. Such results can lead to the development of rapid diagnostic tests to aid in early patient diagnosis. They can also be used in clinical trials of new therapeutics and vaccines to determine the need for new participant enrolments, the number of measurements needed from each participant, and what would be best to measure to show whether a new vaccine or therapeutic is effective [20,21,22].

    This paper is organized as follows. In Section 2, subsection 2.1 discusses the in-host mathematical modelling of influenza and COVID-19 and the parameter estimation. In subsection 2.2, we use the mechanistic model to generate synthetic patient data. In subsection 2.3, we develop and evaluate a supervised machine learning method to discriminate between patients with different infections. The interpretability of the developed model is discussed in subsection 2.4. The prediction results are presented in Section 3, subsection 3.1. Subsection 3.2 discusses the importance of the data features and determines the dominant features. The paper concludes with a discussion in Section 4.

    We employed a target-cell limited model of viral dynamics using five differential equations that track susceptible target cells (T), infected cells in the eclipse phase (I1), productively infected cells (I2), virus (V), and interferon (F) in-host. Figure 1 presents a flow diagram of the model. The system of ordinary differential equations is as follows:

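    \begin{align}
    \frac{dT}{dt} &= -\beta T V - \phi T F, \tag{2.1a}\\
    \frac{dI_1}{dt} &= \beta T V - k I_1, \tag{2.1b}\\
    \frac{dI_2}{dt} &= k I_1 - \delta I_2, \tag{2.1c}\\
    \frac{dV}{dt} &= p I_2 - c V, \tag{2.1d}\\
    \frac{dF}{dt} &= q I_2 - d F. \tag{2.1e}
    \end{align}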
    Figure 1.  Schematic of viral infection. Each Target cell, T, is infected by a virus, V, with a constant rate β. During the eclipse period the productively infected cell, I2, is being produced by the first infected cell, I1, with a constant rate k. The Infected cell, I2, produces virus at rate p, IFNI at rate q and dies at rate δ per cell. IFNI hinders viral infection by converting target cells to a virus-resistant state with a constant rate ϕ and decays with rate d. Free virus particles that can be influenza or coronaviruses are cleared at per-capita rate c.

    Briefly, virus particles V can infect susceptible target cells T to produce infected cells. This is represented by the term βTV. Newly infected cells first enter the eclipse phase I1 and become productively infected cells I2 when within-cell processes that program the cell to make new virus particles are completed. The eclipse phase takes, on average, 1/k time units. Productively infected cells produce new virus particles with a rate of p, and the virus particles are cleared from the system with a rate of c. We assumed that productively infected target cells have a death rate δ. Susceptible target cells can be protected from infection by Type I interferon (IFNI), F. Type I interferons protect neighboring cells from infection and elicit an immune response [23,24]. They are central to combating different virus infections and are regularly measured in clinical trials or infection studies in humans and animals [25]. We assumed that interferon production is proportional to the number of productively infected cells [18,19,26,27], that interferon has a natural decay rate d, and that interferon protects susceptible cells by removing them from the susceptible target cells population, with a rate ϕF. This term was ignored in [18] for influenza infection. The model described by Eq. 2.1 was used in [18] and [19] to examine the kinetics of influenza A and SARS-CoV-2 viral dynamics, respectively. For the sake of simplicity, we have ignored a half-day lag in IFNI response that was considered in [18].
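    To make the model description concrete, the following is a minimal sketch of Eq. 2.1 integrated with a hand-written fixed-step Runge-Kutta (RK4) scheme. The integrator, the step size, the 8-day horizon, and all function names are assumptions made for illustration; the source does not state which solver was used. The parameter values are the influenza averages reported in Table 1 and the initial values in Table 2.

```python
# Sketch only: fixed-step RK4 integration of the five-equation model (Eq. 2.1).
# The solver choice, step size, and names below are illustrative assumptions.

def rhs(state, p):
    """Right-hand side of Eq. 2.1 for state = (T, I1, I2, V, F)."""
    T, I1, I2, V, F = state
    dT = -p["beta"] * T * V - p["phi"] * T * F   # infection and IFN protection
    dI1 = p["beta"] * T * V - p["k"] * I1        # eclipse-phase cells
    dI2 = p["k"] * I1 - p["delta"] * I2          # productively infected cells
    dV = p["p"] * I2 - p["c"] * V                # virus production and clearance
    dF = p["q"] * I2 - p["d"] * F                # interferon production and decay
    return (dT, dI1, dI2, dV, dF)


def rk4_step(state, p, dt):
    """Advance the state by one classical RK4 step of length dt (days)."""
    k1 = rhs(state, p)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), p)
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), p)
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)), p)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(p, state0, t_end, dt=1e-3):
    """Return the list of states at t = 0, dt, 2*dt, ..., t_end."""
    n_steps = int(round(t_end / dt))
    trajectory = [state0]
    state = state0
    for _ in range(n_steps):
        state = rk4_step(state, p, dt)
        trajectory.append(state)
    return trajectory


# Influenza averages from Table 1 (phi = 0 for influenza) and Table 2 initial values.
influenza = {"beta": 3.2e-5, "k": 4.0, "p": 0.046, "c": 5.2,
             "delta": 5.2, "q": 1.0, "d": 1.9, "phi": 0.0}
state0 = (4e8, 0.0, 0.0, 0.075, 0.0)

traj = simulate(influenza, state0, t_end=8.0)
peak_v = max(s[3] for s in traj)
print("peak viral load:", peak_v, "target cells left:", traj[-1][0])
```

    A virtual patient in this setting is simply one such trajectory computed for one sampled parameter set; swapping in the COVID-19 column of Table 1 would give the second cohort.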

    Model parameters for influenza A infection were fit to data from an experimental H1N1 influenza A/Hong Kong/123/77 infection of six patients [18], and those for SARS-CoV-2 to data from thirteen untreated patients infected with severe acute respiratory syndrome coronavirus 2 [19]. The geometric average parameter values, along with their 95% confidence intervals and units, are summarized in Table 1. We assumed that the initial number of target cells, T0, is equal to the total number of target cells in the upper respiratory tract and set T0 = 4×10^8 cells. In [19], the authors considered the target cells to be distributed in a volume of 30 mL. Assuming that 1% of these cells express angiotensin-converting enzyme 2 (ACE2) as a receptor for SARS-CoV-2, the target cell concentration, T0, was expressed as 1.33×10^5 cells/ml. The model variables and their initial values are listed in Table 2.
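    As a short worked check of the concentration quoted from [19], using only the numbers stated in the paragraph above (4×10^8 cells, 1% ACE2 expression, 30 mL volume):

```latex
T_0 = \frac{0.01 \times 4\times 10^{8}\ \text{cells}}{30\ \text{mL}}
    \approx 1.33\times 10^{5}\ \text{cells/mL}
```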

    Table 1.  Average values and 95% confidence intervals (CI) for the influenza A and SARS-CoV-2 within-host viral infection model parameters. The 95% confidence level indicates the degree of certainty around the mean of each parameter value; SD is the corresponding standard deviation.

    Influenza model parameters [18]
    Parameter   Unit                     Value [95% CI]             SD
    V0          TCID50/ml (1)            0.075 [7.6E-4, 7.5]        3.5724
    R0 (2)      -                        21.5 [10.1, 46.1]          17.15
    β           (TCID50/ml)^-1 d^-1      3.2E-5 [6E-6, 1.7E-4]      7.8124E-5
    k           d^-1                     4 [3, 5.2]                 1.0486
    p           (TCID50/ml) d^-1         0.046 [0.012, 0.17]        0.07527
    c           d^-1                     5.2 [3.1, 8.7]             2.6677
    δ           d^-1                     5.2 [3.2, 8.6]             2.5724
    q           d^-1                     1                          -
    d           d^-1                     1.9 (from [23,28,29])      -
    ϕ           d^-1 cell^-1             0                          -

    COVID-19 model parameters [19]
    Parameter   Unit                     Value [95% CI]             SD
    V0          Copies/ml                0.1                        -
    R0 (2)      -                        8.6 [1.9, 17.6]            12.9893
    β           (Copies/ml)^-1 d^-1      5.68E-9                    -
    k           d^-1                     3                          -
    p           (Copies/ml) d^-1         22.71 [0, 59.64]           49.3426
    c           d^-1                     10                         -
    δ           d^-1                     0.6 [0.22, 0.97]           0.62051
    q           d^-1                     1                          -
    d           d^-1                     0.4                        -
    ϕ           d^-1 cell^-1             1.97E-6 (from [30])        -

    (1) 1 TCID50/ml corresponds to 4000 Copies/ml [31].
    (2) R0 is the basic reproduction number.

    Table 2.  Model variables with initial values.
    Variable   Definition                       Initial value   Unit
    T          Target cell                      4E+8            Cell
    I1         Infected cell (eclipse phase)    0               Cell
    I2         Productively infected cell       0               Cell
    V          Viral load (flu)                 7.5E-2          TCID50/ml
    V          Viral load (COVID-19)            0.1             Copies/ml
    F          Type I interferon (IFNI)         0               Interferon


    To generate a cohort of virtual patients, we followed a technique similar to the one used in [24]. Each patient is distinguished by five different in-host measurements, {T, I1, I2, V, F}, which are the solutions of Eq. 2.1 for different sets of model parameters. Initial parameter sets representing individual virtual patients were drawn from normal distributions with means fixed to the corresponding parameter values in Table 1 and standard deviations derived from the confidence intervals. Standard deviations were obtained from standard errors, confidence intervals, and t statistics, which measure the size of the difference relative to the variation in the sample data. For each parameter, the standard error was obtained by dividing the width of the confidence interval by 2×t-value, and the standard deviation by multiplying the standard error by the square root of the sample size, as follows

    $$ \mathrm{SD} = \sqrt{N}\times \mathrm{SE} = \sqrt{N}\times \frac{\text{upper limit} - \text{lower limit}}{2\times t_{\text{value}}} \tag{2.2} $$

    The standard errors must be those of the means calculated within each parameter's confidence interval. The t-value for a 95% confidence interval from a sample of size N was obtained in Microsoft Excel using the TINV function, i.e., TINV(1−0.95, N−1). From [18], the sample size for the influenza cohort is 6 patients infected with H1N1 influenza A/Hong Kong/123/77. The COVID-19 cohort consisted of 13 untreated patients infected with severe acute respiratory syndrome coronavirus 2 [19]. Therefore, the t-value is 2.571 for influenza patients and about 2.179 for COVID-19 patients. From normal distributions with standard deviations σ and means μ equal to the original parameter values, we then generated values lying around each parameter value such that |(μ ± σ) − μ| < h. Herein, h is a user-defined measure of data diversity. In other words, the bigger the parameter h, the more diverse the synthetic data. Accordingly, external noise can affect the data through the parameter h. The dynamics of 100 virtual patients from each cohort are shown in Figure 2. The diversity of the patient data is mainly reflected in the various viral load levels, in agreement with prior studies showing that viral load is associated with disease severity and with factors such as the age or sex of the patient [32].
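    The calculation in Eq. 2.2 and the sampling step can be sketched as follows. The t-values are the ones quoted in the text; the function names, the rejection-sampling loop, the positivity check, and the reading of the h rule as "keep draws within h of the mean" are assumptions made for illustration, not the authors' stated implementation.

```python
import math
import random

# t-values quoted in the text for 95% CIs: N = 6 (influenza), N = 13 (COVID-19).
T_VALUE = {6: 2.571, 13: 2.179}


def sd_from_ci(lower, upper, n):
    """Eq. 2.2: SD = sqrt(N) * (upper - lower) / (2 * t-value)."""
    return math.sqrt(n) * (upper - lower) / (2.0 * T_VALUE[n])


def sample_parameter(mean, sd, h, rng):
    """Draw from N(mean, sd) by rejection.

    Assumption: a draw is kept only if it lies within h of the mean and is
    positive (rates cannot be negative); this is one reading of the h rule.
    """
    while True:
        x = rng.gauss(mean, sd)
        if abs(x - mean) < h and x > 0:
            return x


# Influenza rows of Table 1: V0 with CI [7.6E-4, 7.5], R0 with CI [10.1, 46.1].
sd_v0 = sd_from_ci(7.6e-4, 7.5, 6)
sd_r0 = sd_from_ci(10.1, 46.1, 6)
# COVID-19 row: delta with CI [0.22, 0.97].
sd_delta_cov = sd_from_ci(0.22, 0.97, 13)

# One hundred hypothetical draws of the influenza death rate delta (mean 5.2).
rng = random.Random(0)
sd_delta_flu = sd_from_ci(3.2, 8.6, 6)
draws = [sample_parameter(5.2, sd_delta_flu, 1.0, rng) for _ in range(100)]
print(round(sd_v0, 4), round(sd_r0, 2), round(sd_delta_cov, 4))
```

    The computed values can be compared against the SD column of Table 1; a larger h admits draws farther from the mean and therefore yields a more diverse cohort.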

    Figure 2.  Cohort Dynamics. One hundred virtual patients are generated with different features of Target cells, infected/productively infected cells, viral load, and the only immune factor type I interferon for Influenza (upper two rows) and COVID-19 (lower two rows). Each solid curve with a different color represents a patient. The insets are in log scale.

    Generating time-consistent data for the different infection cohorts is of great importance, since data inconsistency can lead to loss of information or biased results. Because the influenza mechanistic model predicts faster clearance of infected cells than the SARS-CoV-2 model [19], the infection periods of the influenza and COVID-19 patient dynamics are not the same (see Figure 2). We therefore defined consistency between the flu and COVID-19 cohorts as having the same number of data points over the infection time. As an example, we divided the main infection period (days [1,2,3,4,5,6] for influenza patients and days [10,11,12,13,14,15,16,17,18,19,20] for COVID-19 patients) into ten sub-intervals, with half-day time steps for influenza patients and one and a half-day time steps for COVID-19 patients (see Figure 3). Hence, despite the different infection periods and the different time-step lengths used to report each new virtual data point, the total number of data points was the same for the two cohorts.

    Figure 3.  Consistency of the number of virtual data points during the time of infection. Dashed cross blue lines show eleven-time points of an influenza or COVID-19 patient.

    In addition to the total infection period, we were also interested in studying the viral load dynamics in the early period of infection. The median incubation period is estimated to be 1.4 days for influenza A virus (0.6 days for influenza B virus) and around 5 to 6 days for SARS-CoV-2 [33]. Therefore, we assumed the time interval [0.9, 1.3] days for the influenza cohort and [5, 6.5] days for the COVID-19 cohort, corresponding to a viral load of [10^2, 10^4] Copies/ml. Dividing each interval into three sub-intervals, with time steps of one-sixth of a day for influenza and half a day for COVID-19 patients, we had four consistent data points for each patient.

    To distinguish patients who encounter COVID-19 from those who are exposed to influenza, we developed a predictive model based on a selection of biological features. Accordingly, we adopted logistic regression with ℓ1-regularization, referred to as Lasso (least absolute shrinkage and selection operator) regression, as an appropriate classification technique. Lasso regression is widely used for supervised classification problems based on the concept of probability [34]. It can reduce model complexity by removing irrelevant features of the data set. Recently, this algorithm was used by Han et al. to find additional novel immune features that accurately identified patients before the clinical diagnosis of preeclampsia [35].

    Logistic regression, a generalized linear model used for binary classification, is defined by the following sigmoid function

    $$ h(X) = \frac{1}{1 + e^{-(\beta_0 + \beta X)}} \tag{2.3} $$

    in which X is the (n×p) model feature matrix of n=100 patients and p=5 biological hallmarks. Defining the cost/objective (C) function of logistic regression in mean squared error format leads to a non-convexity that makes it difficult to optimally converge. Therefore, it is represented by the following equations

    $$ C(h(X), Y) = \begin{cases} -\log(h(X)), & \text{if } y = 1 \\ -\log(1 - h(X)), & \text{if } y = 0 \end{cases} \tag{2.4} $$

    where Y is a binary response vector of outcomes (COVID-19 vs. flu). Combining the two cases above into a single function, we have

    $$ J(X) = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log(h(x_i)) + (1 - y_i)\log(1 - h(x_i)) \right] \tag{2.5} $$

    Replacing the sigmoid function from equation (2.3) and applying a penalty term equal to the absolute value of the magnitude of coefficients, we can reach the following objective function (after doing some mathematical simplifications) [35]

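    $$ J(X) = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i(\beta_0 + \beta^{T}x_i) - \log\left(1 + e^{\beta_0 + \beta^{T}x_i}\right) \right] + \alpha\|\beta\|_1, \quad \alpha > 0 \tag{2.6} $$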

    The penalty term, called the ℓ1-regularization term, is added to prevent over-fitting the data. The model objective is to find the solution that minimizes this cost function.
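    The step from the cross-entropy form (Eq. 2.5) to the simplified penalized objective (Eq. 2.6) can be checked numerically. The sketch below implements both forms in plain Python; the toy feature rows, labels, and coefficient values are made up purely for illustration.

```python
import math


def sigmoid(z):
    """Eq. 2.3 evaluated at the linear score z = beta0 + beta^T x."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(beta0, beta, x):
    return beta0 + sum(b * xi for b, xi in zip(beta, x))


def cross_entropy(beta0, beta, X, y):
    """Eq. 2.5: mean negative log-likelihood, no penalty."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(linear_score(beta0, beta, xi))
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / len(X)


def lasso_objective(beta0, beta, X, y, alpha):
    """Eq. 2.6: simplified log-likelihood plus the l1 penalty alpha * ||beta||_1."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = linear_score(beta0, beta, xi)
        total += yi * z - math.log(1.0 + math.exp(z))
    return -total / len(X) + alpha * sum(abs(b) for b in beta)


# Illustrative toy data (not from the study): three patients, two features.
X = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
y = [1, 0, 1]
beta0, beta = 0.1, [0.4, -0.7]

unpenalized = cross_entropy(beta0, beta, X, y)
penalized = lasso_objective(beta0, beta, X, y, alpha=0.5)
print(unpenalized, penalized)
```

    With alpha = 0 the two functions agree to floating-point precision, and for alpha > 0 the objective grows by exactly alpha times the sum of absolute coefficient values, which is what drives small coefficients to zero.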

    For model training and testing, we used a K-fold cross-validation strategy, a re-sampling method for evaluating machine learning models on a limited data sample. The procedure has a single parameter, K, which gives the number of groups into which a given data sample is split. Consequently, our regression model is not tailored to a particular data set and is exposed to all available samples of a given subject in the training set. This approach implies that the training procedure was entirely blinded to the synthetic patient data sets, and ensures the presumed independence from any intra-subject correlations that is required for Lasso classification. We fixed the number of folds at K=5. After running the analysis on each fold, the predicted outcome is the one with the least estimated prediction error. The regularization parameter α is estimated by a cross-validation procedure.
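    A minimal sketch of how a K-fold split partitions the samples is given below. The generator name, the shuffling seed, and the total of 200 samples (100 virtual patients per cohort) are assumptions for illustration; the source does not describe how the folds were constructed.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds.

    Indices are shuffled once, then dealt into k folds of near-equal size;
    every sample appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Hypothetical setting: 200 virtual patients in total, K = 5 as in the text.
splits = list(k_fold_indices(200, 5))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

    In a cross-validated choice of α, the model would be fit on each training part for every candidate α and scored on the held-out part, keeping the α with the lowest average prediction error.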

    The ability of the developed model to discriminate patients with influenza from those with COVID-19 was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The ROC AUC is one of the most important evaluation metrics for visualizing the performance of classification problems. The ROC is a probability curve of sensitivity (true positive rate = TP/(TP+FN)) against 1-specificity (false positive rate = FP/(FP+TN)), and the AUC is a performance measure of discrimination. In other words, the AUC score explains how well the model can discern the different cohorts. Generally, an AUC closer to 1 indicates better overall diagnostic performance, that is, influenza is classified as influenza and COVID-19 as COVID-19.
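    The quantities named above can be sketched in a few lines. The AUC here is computed through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, ties counting one half), which equals the area under the ROC curve; the function names and the four-sample example are illustrative assumptions.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative example: label 1 = COVID-19, label 0 = influenza.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)                 # 3 of 4 pairs are ordered correctly
tpr, fpr = rates(labels, [0, 1, 1, 1])        # predictions at one fixed threshold
print(auc, tpr, fpr)
```

    Sweeping the decision threshold over all score values and plotting the resulting (false positive rate, true positive rate) pairs traces the ROC curve itself.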

    From [36,37], "Interpretability" is the degree to which a human can understand the cause of a decision and consistently predict the model's result. The higher the interpretability of a machine learning model, the better understanding of why certain predictions have been made. Interpretable machine learning models are beneficial to extract the relevant knowledge from relationships either contained in data or learned by the model [38,39].

    Here, we looked at the regularization path, a plot of all coefficient values against the values of α in the ℓ1 penalization term, to see the behavior of the Lasso regression and interpret the prediction outcomes. The main purpose of Lasso regression is to classify groups of data by providing feature coefficients that select the important features and maintain model regularization to avoid over-fitting the data. Therefore, the Lasso path can give us an idea of each feature's importance.
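    A regularization path can be sketched by refitting the ℓ1-penalized logistic regression over a grid of α values and recording the coefficients. The solver below is a plain proximal-gradient (ISTA) loop chosen for self-containment; it is an assumption, not the solver used in the study, and the eight-row, two-feature data set, the step size, and the α grid are made up for illustration.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, alpha, step=0.1, n_iter=2000):
    """Minimize the l1-penalized logistic loss by proximal gradient descent."""
    n, p = len(X), len(X[0])
    mean_y = sum(y) / n
    beta0 = math.log(mean_y / (1.0 - mean_y))   # intercept-only starting point
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [
            sigmoid(beta0 + sum(b * v for b, v in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        beta0 -= step * grad0                    # intercept is not penalized
        beta = [soft_threshold(b - step * g, step * alpha)
                for b, g in zip(beta, grad)]
    return beta0, beta


# Illustrative toy data: feature 0 tracks the label, feature 1 is weaker.
X = [[-1.2, 0.3], [-0.8, -0.5], [-0.5, 0.9], [-0.1, -1.1],
     [0.2, 0.4], [0.6, -0.7], [0.9, 1.0], [1.4, -0.2]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

alphas = [1.0, 0.2, 0.01]
path = {alpha: fit_l1_logistic(X, y, alpha)[1] for alpha in alphas}
for alpha in alphas:
    print(alpha, path[alpha])
```

    Plotting each coefficient in `path` against log(α) gives the kind of curve described above: features whose coefficients leave zero first as α decreases are the ones the model relies on most.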

    In this study, we developed a classifier in the Lasso framework to identify patients with either influenza or COVID-19, based on four major entities of viral dynamics, {T(t), I1(t), I2(t), V(t)}, and one main factor of the host immune response, type I interferon (F(t)), as the input data features. The model was trained on data from 100 virtual patient-level data sets in each infection cohort without noise, and it was externally validated on testing sets with demographic noise (reflected in diverse viral load levels). The results in Figures 4, 5 and 6 reflect the Lasso predictions using the entire infection period (see Section 2.2.1). In Figures 4 and 5, two-dimensional scatter plots are used to compare the ground truth to the regression-predicted values based on all model features. The hue spectrum from light to dark illustrates the probability of being in the influenza (blue) or COVID-19 (red) group. In other words, the darker the color, the better the prediction. When three attributes of the data are considered, the predicted outcomes improve. This is shown in the three-dimensional scatter plots of the ground truth and regression-predicted values in Figure 6. A ROC AUC of 95% indicates satisfactory performance of the model in distinguishing between COVID-19 and influenza patients. We note that our analysis was also completed on data from 1000 virtual patients, and a similar result was obtained, ROC AUC = 93%. See Figure 7 for more details.

    Figure 4.  Two-dimensional scatter plots of ground truth and regression predicted values based on model features. Classification of the data was done for: I2 versus I1 in panels (a), T vs. I1 in panels (b), V vs. I1 in panels (c), F vs. I1 in panels (d), T vs. I2 in panels (e), V versus I2 in panels (f), and F versus I2 in panel (g). Color denotes the patient probability of being in the influenza (blue color scheme) or COVID-19 (red color scheme) cohorts. Data points, corresponding to each model feature, are rescaled by dividing by their standard deviations.
    Figure 5.  Two-dimensional scatter plots of the ground truth and regression predicted values for three model features T,V,F. Classification of the data was done based on: T versus V in panels (a), T versus F in panels (b), and F versus V in panels (c). Color denotes the patient probability of being in the influenza (blue color scheme) or COVID-19 (Red color scheme) cohorts.
    Figure 6.  Three-dimensional scatter plots of the ground truth and regression predicted values based on all model features. Classification is based on I1,I2,T in panels (a), I1,I2,V in panels (b), I1,I2,F in panels (c), T,V,F in panels (d), I1,T,V in panels (e), I2,T,V in panels (f), I1,T,F in panels (g), I2,T,F in panels (h), I1,F,V in panels (i), and I2,F,V in panels (j). Shades of blue (red) indicate influenza (COVID-19) group patients. Data points are made dimensionless by dividing by the corresponding standard deviations.
    Figure 7.  Receiver Operating Characteristic curve (ROC) of influenza vs COVID-19 patients. The area under the ROC curve indicates the predictive performance of the model between COVID-19 and influenza encounters on the external validation test for 100 patients from each cohort during the main (blue curve)/ early days (orange curve) of infection period, and for 1000 patients during the main infection (purple curve). The black dashed line in the diagonal has a ROC AUC of 0.5.

    We examined the model prediction for the data generated in the early days of infection after the incubation period. The results are shown in Figure 8 based on the model features. From the figure, we can see that there are some mispredictions for small values of I1(t), I2(t), V(t), and F(t), especially when I2(t) is plotted as a function of I1(t) or V(t) is plotted in terms of I2(t). In other words, for this range of values, the influenza patients were misdiagnosed with COVID-19. In an attempt to find the reason, we compared correlations between the different variables in our model (see Figure 9). Here, we see small regions of overlap between the influenza and COVID-19 models. Accordingly, the similarity of the results for the two infections may lead to some overlaps in the model predictions. However, the ability of the model to predict the infection when patients were monitored by V(t)/F(t) as a function of I1(t), panels (b) and (c), or by F(t) in terms of I2(t)/V(t), panels (e) and (f), can be satisfactory, and these pairs can thus serve as benchmarks for clinical diagnosis. The model had a ROC AUC of 91% on the external validation data set for early infection (see Figure 7).

    Figure 8.  Early days of infection. Two-dimensional scatter plots of the ground truth and regression predicted values based on model features are shown. Classification is based on I1,I2 in panels (a), I1,V in panels (b), I1,F in panels (c), I2,V in panels (d), I2,F in panels (e), and V,F in panels (f). Shades of blue (red) indicate influenza (COVID-19) group patients. Data points are made dimensionless by dividing by the corresponding standard deviations.
    Figure 9.  Comparison of in-host measurements, {T,I1,I2,V,F}, between influenza and COVID-19 virtual patients when plotted as a function of each other. Blue (red) solid lines represent the ratio of the features for one hundred influenza (COVID-19) patients. Data points are divided by the corresponding standard deviations for each feature.

    To investigate the importance of the various data features, we created the ℓ1-regularization path, which is a direct way to see the behavior of the Lasso regression. The regularization path is a plot of all coefficient values in terms of the regularization parameter. Figure 10 illustrates the selection path of each feature with its corresponding coefficient in terms of the logarithm of the regularization parameter α. For each value of α, the path method on the Lasso object returns the coefficients that solve the logistic regression problem with that parameter value. The optimal value of log(α) was estimated at around 3.25 for the test set distributed over the entire infection course, and 3.04 when the early days of infection were studied. The results suggested a higher coefficient value for the viral load V(t) and the productively infected cells I2(t) compared to the other features. In the same analysis on 1000 virtual patient data sets, the viral load had the predominant identifying role.

    Figure 10.  Lasso coefficients of the five sample features, {T,I1,I2,V,F}, as a function of the logarithm of the regularization parameter, log(α). Each colored line represents the value taken by a different coefficient in the optimization objective for Lasso. The black dashed line indicates the selected regularization parameter, with log(α) ≈ 3.25. This number was 3.04 with the same Lasso paths when the early days of the infection period were considered.

    This study presents a machine learning model to effectively classify influenza and COVID-19 virtual patients using in-host patient data. Our model employed a Lasso regression classifier trained to distinguish between two hundred patients, and achieved a ROC AUC of 95%. Using within-host model structures from the literature, we generated synthetic data with five in-host measurements: target cells, eclipse-phase infected cells, productively infected cells, viral load, and type I IFN. Analyzing the feature importance revealed that the viral load and the productively infected cells are the most important components for determining whether a patient is infected by influenza or SARS-CoV-2.

    While our machine learning model was built on synthetic data distributed over the main infection period, it achieved good performance (ROC AUC = 91%) even for the early days of infection after the incubation period. However, in early infection, there were some exceptions for small values of the in-host features, where influenza patients were misdiagnosed as COVID-19. The reason is that, during the early days of infection, influenza and COVID-19 patients have comparable in-host measurements, which leads to some errors in discriminating the patients. We regard this as a limitation of our model, even though the ROC AUC was still very high. A future extension of this work will be to develop dynamic models that take more immune entities into account and yield a better classifier.

    Our model was trained and successfully evaluated on synthetic data. The model, however, could be applied to animal or human clinical data. This could be useful, for example, if a clinical trial is complicated by the existence of an infectious disease with similar infection characteristics. The model could be applied as a low-cost classification system that would not require expensive virus typing procedures and could rely solely on viral load and interferon measurements. Additionally, the AI/ML method could be applied to determine when a clinical trial using a continuous enrollment design has accumulated sufficient data to determine whether a new pharmaceutical is effective. We note that studies like [8] that focus analysis on demographic and observational data can be cheaper to conduct than a study requiring viral load or immune system measurements, but these data can also be subject to inconsistencies and bias, affecting classification outcomes. In a future study, we will expand our analysis to a model of in-host measurements and observational data to determine if specific combinations of in-host and observational data that best classify influenza and COVID-19 infections differ.

    Fourteen different AI/ML techniques for disease prediction were reviewed in [40]. Quiroz-Juárez et al. developed an effective machine-learning algorithm for the identification of high-risk COVID-19 patients [41]. Some AI approaches that have made significant contributions to health care were presented in [42], and their applications in confronting COVID-19, such as diagnosis and drug development, were studied. Salehi et al. studied the performance of machine and deep learning-based architectures for the classification of coronavirus images such as X-ray and computed tomography [20]. Our machine learning model was developed in the Lasso framework. Ridge regression or partial least squares discriminant analysis (PLS-DA) can also be employed, and including them requires only small changes to our method. The model demonstrated satisfactory performance using either Ridge or PLS regression: ROC AUC = 95% for the main infection period and ROC AUC = 89% for the early days of infection.

    We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and the NRC Pandemic Response Challenge Program Grant No. PR016-1.

    The authors declare there is no conflict of interest.



    [1] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, Commun. ACM, 60 (2017), 84–90. https://doi.org/10.1145/3065386
    [2] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, preprint, arXiv: 1409.1556.
    [3] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 770–778. https://doi.org/10.1109/CVPR.2016.90
    [4] Q. Q. Gao, B. C. Huang, W. Z. Liu, T. Tong, Detection method of bamboo strip surface defects based on improved CenterNet, J. Comput. Appl., 31 (2020), 1–8. https://doi.org/10.11772/j.issn.1001-9081.2020081167
    [5] Y. Y. Liu, Research on cloth defect detection method based on deep learning, Master thesis, Harbin Institute of Technology in Harbin, 2020.
    [6] G. X. Ding, H. Huang, Y. Ma, Automatic detection of cloth defects based on laws texture filtering, in Proceedings of 2019 2nd International Conference on Intelligent Systems Research and Mechatronics Engineering (ISRME 2019), (2019), 148–152.
    [7] X. P. Kou, S. J. Liu, Z. R. Ma, Steel strip defect detection method based on Faster-RCNN, China Metall., 31 (2021), 77–83. https://doi.org/10.13228/j.boyuan.issn1006-9356.20200506
    [8] Q. Xu, H. J. Zhu, H. H. Fan, H. Y. Zhou, G. H. Yu, Study on detection of steel plate surface defects by improved YOLOv3 network, Comput. Eng. Appl., 56 (2020), 265–272. https://doi.org/10.3778/j.issn.1002-8331.2003-0232
    [9] M. O. Lawal, Tomato detection based on modified YOLOv3 framework, Sci. Rep., 11 (2021), 1447. https://doi.org/10.1038/s41598-021-81216-5
    [10] A. M. Roy, R. Bose, J. Bhaduri, A fast accurate fine-grain object detection model based on YOLOv4 deep neural network, Neural Comput. Appl., 34 (2022), 3895–3921. https://doi.org/10.1007/s00521-021-06651-x
    [11] A. M. Roy, J. Bhaduri, Real-time growth stage detection model for high degree of occultation using DenseNet-fused YOLOv4, Comput. Electron. Agr., 193 (2022), 106694. https://doi.org/10.1016/j.compag.2022.106694
    [12] A. M. Roy, J. Bhaduri, A deep learning enabled multi-class plant disease detection model based on computer vision, AI, 2 (2021), 413–428. https://doi.org/10.3390/ai2030026
    [13] Z. Y. Xiao, Research and implementation of cigarette defect detection algorithm, Master thesis, Yunnan University in Kunming, 2018.
    [14] J. Li, H. H. Lu, X. Wang, J. H. Hong, S. Wang, L. X. Shen, et al., Online inspection system for cigarette tipping quality based on machine vision, Tob. Sci. Technol., 52 (2019), 109–114. https://doi.org/10.16135/j.issn1002-0861.2018.0562
    [15] G. W. Yuan, J. C. Liu, H. Y. Liu, R. Qu, H. Zhou, Classification of cigarette appearance defects based on ResNeSt, J. Yunnan Univ.: Nat. Sci. Ed., 44 (2022), 464–470. https://doi.org/10.7540/j.ynu.20210257
    [16] H. Y. Liu, G. W. Yuan, Cigarette appearance defect detection method based on improved YOLOv5s, Comput. Technol. Dev., 32 (2022), 161–167. https://doi.org/10.3969/j.issn.1673-629X.2022.08.026
    [17] H. Y. Liu, G. W. Yuan, L. Yang, K. X. Liu, H. Zhou, An appearance defect detection method for cigarettes based on C-CenterNet, Electronics, 11 (2022), 2182. https://doi.org/10.3390/electronics11142182
    [18] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 779–788. https://doi.org/10.1109/CVPR.2016.91
    [19] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015), 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824
    [20] S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 8759–8768. https://doi.org/10.1109/CVPR.2018.00913
    [21] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 7132–7141. https://doi.org/10.1109/CVPR.2018.00745
    [22] M. Yang, K. Yu, C. Zhang, Z. Li, K. Yang, DenseASPP for semantic segmentation in street scenes, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 3684–3692. https://doi.org/10.1109/CVPR.2018.00388
    [23] F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, preprint, arXiv: 1511.07122.
    [24] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren, Distance-IoU loss: Faster and better learning for bounding box regression, in Proceedings of the AAAI Conference on Artificial Intelligence, (2020), 12993–13000. https://doi.org/10.1609/aaai.v34i07.6999
    [25] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, S. Savarese, Generalized intersection over union: A metric and a loss for bounding box regression, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 658–666. https://doi.org/10.1109/CVPR.2019.00075
    [26] J. He, S. Erfani, X. Ma, J. Bailey, Y. Chi, X. Hua, α-IoU: A family of power intersection over union losses for bounding box regression, preprint, arXiv: 2110.13675.
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)