Research article

A discrete stochastic model of the COVID-19 outbreak: Forecast and control

  • Received: 22 February 2020 Accepted: 12 March 2020 Published: 16 March 2020
  • The novel Coronavirus (COVID-19) is spreading and has caused a large-scale infection in China since December 2019. This has led to a significant impact on the lives and economy in China and other countries. Here we develop a discrete-time stochastic epidemic model with binomial distributions to study the transmission of the disease. Model parameters are estimated on the basis of fitting to newly reported data from January 11 to February 13, 2020 in China. The estimates of the contact rate and the effective reproductive number support the efficiency of the control measures that have been implemented so far. Simulations show the newly confirmed cases will continue to decline and the total confirmed cases will reach the peak around the end of February of 2020 under the current control measures. The impact of the timing of returning to work is also evaluated on the disease transmission given different strength of protection and control measures.

    Citation: Sha He, Sanyi Tang, Libin Rong. A discrete stochastic model of the COVID-19 outbreak: Forecast and control[J]. Mathematical Biosciences and Engineering, 2020, 17(4): 2792-2804. doi: 10.3934/mbe.2020153




    Smart homes aim to provide a comfortable, convenient, and efficient living environment and to alleviate the impact of functional decline [1,2]. Smart homes are also designed to improve energy management [3,4,5]. Achieving these goals depends on accurately recognizing the daily activities that take place in the home. Several approaches have been proposed to improve activity recognition performance, each focusing on a different stage of the recognition process [6]: segmenting sensor event streams [7,8,9,10], extracting and selecting daily activity features [11,12,13,14,15], or developing recognition models [16,17,18,19]. The approach proposed in this paper focuses on extracting and selecting daily activity features.

    The primary task of extracting and selecting daily activity features is to establish a feature space and generate a sample space. Daily activity features are divided into temporal and sensor features. Temporal features include the start time, end time, and duration of the daily activity. For the sensor features, some approaches take all smart home sensors as the feature space, while others take sets or sequences of frequently activated sensors. For a given sensor feature and daily activity, most approaches take the number of times the sensor is activated during the activity as the feature value. This common practice discretizes the sensor event stream, which discards its time-series characteristics and limits further improvement of activity recognition performance.

    To exploit the time-series characteristics of sensor event streams, this paper proposes a novel approach for extracting daily activity features. In the experiments reported below, the proposed approach achieves better activity recognition performance than existing approaches. The main contributions of this paper are as follows.

    (1) An algorithm that serves to extract time series data from sensor event streams is proposed.

    (2) Several common statistical formulas are used to establish an initial feature space.

    (3) A feature selection algorithm is employed to generate final daily activity features.

    (4) The proposed approach is evaluated on two common datasets. The experimental results show that the proposed approach achieves better performance than previous approaches for extracting daily activity features.

    The rest of this paper is arranged as follows: First, related work is introduced; the proposed approach is then introduced; the proposed approach is validated and results discussed. Finally, we summarize our findings.

    Approaches for activity recognition in smart homes can be divided into knowledge-driven and data-driven approaches. For knowledge-driven approaches, an activity model is developed as a reusable context model, which associates objects, space, and time with activities. The knowledge driven model is semantically clear and follows an agreed indication. Logic language and ontology are the two most common models representing domain knowledge [20,21,22,23,24,25,26]. After the knowledge model is established, logical reasoning is employed to perform activity recognition. Knowledge-driven approaches are robust but face limitations in the case of uncertain data.

    Data-driven approaches adopt data mining and machine learning techniques to develop the activity recognition model. Conventional classification algorithms, e.g., Naive Bayesian (NB) [27,28,29,30], Hidden Markov Model (HMM) [31,32], Dynamic Bayesian Network (DBN) [33], Support Vector Machine (SVM) [34], Conditional Random Field (CRF) [35], and Recursive Neural Network (RNN) [36], have been widely used in activity recognition tasks. Besides conventional classification algorithms, some specialized algorithms have been proposed. Wan et al. proposed a novel activity recognition model called COBRA. COBRA mixed the combined sliding window with a logistic regression model for near-real-time activity recognition [37]. To deal with the problem of class imbalance and improve the performance of the model, Medina-Quero et al. developed an integrated classifier based on long short-term memory (LSTM) to recognize daily activities [38].

    Besides classification algorithms, the fine features of daily activities are equally vital to activity recognition performance. Daily activity features can be divided into temporal and sensor features. The temporal features space usually includes the time when a daily activity starts, its duration, and when it ends. The sensor features space is generated directly or indirectly from the set of initial sensors. Liu et al. take the set of initial sensors as the sensor features space and a sensor as a daily activity feature [39]. Because the relationship between sensors is lost when a sensor is taken as a daily activity feature, daily activities which activate similar sensors are hard to differentiate. To improve activity recognition performance, the frequent items mining method [40], frequent periodic pattern mining method [41], and activity modelling based on a low-dimensional feature space [42] were proposed. In addition, Wen et al. and Nasreen et al. used the association rules mining method to mine frequent sensor combinations, to conduct activity modelling for the low-dimensional feature space [43,44]. Twomey proposed an unsupervised method to learn the topology structure of sensors in a smart home and mined effective combinations of sensor events as daily behaviour characteristics, according to the topology structure [45]. Yatbaz et al. [46] used a Scanpath Trend Analysis (STA) method to set a priority for sensors to obtain sensor combinations that represent daily activity features, to improve the evaluation standard of the model. Compared to the approach where a sensor is taken as a daily activity feature, these approaches consume more computing resources, even if activity recognition performance is slightly improved.

    For sensor features, the truth value, frequency, and density of the activated sensors are the most common eigenvalues [47]. In addition, the term frequency-inverse document frequency (TF-IDF) formula [8], the mutual information formula [48], deep learning technology [49], and a differential representation between different activities [50] have been employed to compute these eigenvalues. However, these strategies generate only shallow features, which are far from adequate for describing the time-series nature of sensor event streams. This paper performs in-depth feature mining on the time series data, which preserves the essential temporal information. Consequently, the proposed approach can be used to improve activity recognition models.

    In a smart home, different types of non-invasive sensors, e.g., infrared motion sensors and temperature sensors, are deployed in different parts of the house. When residents carry out daily activities, e.g., sleeping and bathing, the corresponding sensor readings are generated. Figure 1 shows a sequence of sensors triggered by cooking breakfast as a daily activity. Each line denotes a sensor event. Each sensor activation is recorded as a sensor event se, denoted as a four-tuple se = (D, T, I, R). D and T are the date and the time at which se is generated, respectively; I is the identification of the activated sensor; and R is the sensor reading. For example, the sensor event shown in line 1 is generated at 07:58:39.655022 on 2011-06-15. The activated sensor is M007, with a reading of ON.
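    The four-tuple se = (D, T, I, R) can be illustrated with a minimal sketch. It assumes that each sensor event is stored as one whitespace-separated text line in the order date, time, sensor identification, reading; that layout, the `SensorEvent` class, and the `parse_sensor_event` helper are illustrative assumptions, not part of the original work.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SensorEvent:
    """One sensor event se = (D, T, I, R)."""
    date: str       # D: date on which the event was generated
    time: str       # T: time at which the event was generated
    sensor_id: str  # I: identification of the activated sensor
    reading: str    # R: sensor reading

    @property
    def timestamp(self) -> datetime:
        # Combine D and T into one datetime (microsecond precision).
        return datetime.strptime(f"{self.date} {self.time}", "%Y-%m-%d %H:%M:%S.%f")


def parse_sensor_event(line: str) -> SensorEvent:
    # Assumed layout: "<date> <time> <sensor id> <reading>".
    date, time, sensor_id, reading = line.split()[:4]
    return SensorEvent(date, time, sensor_id, reading)


event = parse_sensor_event("2011-06-15 07:58:39.655022 M007 ON")
print(event.sensor_id, event.reading, event.timestamp.isoformat())
```

    Keeping D and T as separate fields mirrors the tuple definition, while the derived `timestamp` is convenient when activation times are later turned into a numeric time series.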

    Figure 1.  Sequence of activated sensors by daily activity, cooking breakfast.

    Based on the time-series nature of sensor events, six categories of common daily activity features are proposed in the form of statistical formulas. Each formula is applied to a given time series. Throughout this section, T = <t1, t2, …, tn> denotes a time series, where ti is the ith time value of the series.

    (1) Mean: µ(T) returns the mean of T. µ(T) is defined in formula (1).

    $$\mu(T) = \frac{1}{n}\sum_{i=1}^{n} t_i \tag{1}$$

    (2) Standard Deviation: σ(T) returns the standard deviation of T. σ(T) is defined in formula (2).

    $$\sigma(T) = \sqrt{\frac{\sum_{i=1}^{n}\left(t_i - \mu\right)^2}{n-1}} \tag{2}$$

    (3) Skewness: Skew(T) returns the skewness of T. Skew(T) is defined in formula (3).

    $$\mathrm{Skew}(T) = \frac{\mu_3}{\sigma^3} \tag{3}$$

    (4) Slope: Slope(T) returns the slope of the linear least-squares regression for the values of T. Slope(T) is defined in formula (4).

    Slope(T)=slope(llsr(T)) (4)

    Where llsr returns the linear least-squares regression for the values of T.

    (5) Wave: Wave(T) returns the total number of peaks and troughs of T. Wave(T) is defined in formula (5).

    Wave(T)=peaks(T)+troughs(T) (5)

    Where peaks returns the number of peaks of T, and troughs returns the number of troughs of T.

    (6) Wavelet Transform Coefficients: For a specified width parameter w and a time value t ∈ T, CWTC(t, w) returns the continuous wavelet transform coefficients. CWTC(t, w) is defined in formula (6). cwt(t, w) returns the Ricker wavelet, which is used as the wavelet function of the continuous wavelet transform.

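    $$\mathrm{CWTC}(t,w) = \mathrm{wavelet\ transform\ coefficients}\left(t, w, \mathrm{cwt}(t,w)\right) \tag{6.1}$$
    $$\mathrm{cwt}(t,w) = \frac{2}{\sqrt{3w}\,\pi^{1/4}}\left(1-\frac{t^2}{w^2}\right)\exp\left(-\frac{t^2}{2w^2}\right) \tag{6.2}$$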

    Where w is the width parameter in the wavelet transform function, which is 2 in the experiment.
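    The six feature categories can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: μ3 in formula (3) is taken to be the third central moment, the slope is fitted against the index of each value, a peak (trough) is an interior value strictly greater (smaller) than both neighbours, and the wavelet coefficients are obtained by correlating the series with a Ricker wavelet sampled at integer offsets. All function names are hypothetical.

```python
import math


def mean(ts):
    """Formula (1): arithmetic mean of the time values."""
    return sum(ts) / len(ts)


def std(ts):
    """Formula (2): standard deviation with n - 1 in the denominator."""
    if len(ts) < 2:
        return 0.0  # assumption: a single value has no spread
    mu = mean(ts)
    return math.sqrt(sum((t - mu) ** 2 for t in ts) / (len(ts) - 1))


def skew(ts):
    """Formula (3): third central moment divided by sigma cubed."""
    sigma = std(ts)
    if sigma == 0.0:
        return 0.0  # assumption: constant series are treated as symmetric
    mu = mean(ts)
    mu3 = sum((t - mu) ** 3 for t in ts) / len(ts)
    return mu3 / sigma ** 3


def slope(ts):
    """Formula (4): least-squares slope of the values against their index."""
    n = len(ts)
    if n < 2:
        return 0.0
    x_bar, t_bar = (n - 1) / 2, mean(ts)
    num = sum((i - x_bar) * (t - t_bar) for i, t in enumerate(ts))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def wave(ts):
    """Formula (5): number of peaks plus number of troughs."""
    triples = list(zip(ts, ts[1:], ts[2:]))
    peaks = sum(1 for a, b, c in triples if b > a and b > c)
    troughs = sum(1 for a, b, c in triples if b < a and b < c)
    return peaks + troughs


def ricker(t, w):
    """Formula (6.2): Ricker wavelet with width parameter w."""
    amplitude = 2.0 / (math.sqrt(3.0 * w) * math.pi ** 0.25)
    return amplitude * (1.0 - (t / w) ** 2) * math.exp(-(t ** 2) / (2.0 * w ** 2))


def cwt_coefficients(ts, w=2):
    """Formula (6.1), sketched as a same-length correlation with the wavelet."""
    half = 5 * w  # assumption: truncate the wavelet at +/- 5 widths
    offsets = range(-half, half + 1)
    kernel = [ricker(j, w) for j in offsets]
    n = len(ts)
    return [
        sum(ts[i + j] * k for j, k in zip(offsets, kernel) if 0 <= i + j < n)
        for i in range(n)
    ]


series = [1.0, 3.0, 2.0, 5.0, 4.0]
print(mean(series), std(series), skew(series), slope(series), wave(series))
print(cwt_coefficients(series, w=2))
```

    In practice the values of T would be activation times converted to numbers, for example seconds since midnight; that conversion is also an assumption of this sketch.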

    For a sequence of sensor events activated by a daily activity, time series data are extracted and input to each feature category. The feature space is generated by Algorithm 1, which can be divided into two stages. In the first stage (lines 4-12), the identification and activation time of each activated sensor are extracted; the activation times form the time series data. In the second stage (lines 13-22), features are computed by applying the feature formulas to the extracted time series data.

    One drawback of these daily activity features is the high dimension of the feature set. To retain features with strong discriminative ability and eliminate weak ones, a feature selection technique is needed to optimize the feature subset. In this paper, the SDSFS algorithm [51] is used to evaluate the activity recognition capability of these features.

    In the initial phase, each agent is assigned a feature subset drawn from its search space (all possible combinations of the features). Each agent uses an independent random split to divide the dataset into training and testing subsets at a ratio of 4:1. The hypothesis is a binary string that represents a feature subset within the allowed subset size: a bit of 1 means that the corresponding feature is included, and a bit of 0 means that it is not.

    In the test phase, the activity of each agent is determined by its fitness function, which is the average F-score of multiple classifiers. The agent selects another agent at random and the two are compared. If the F-score of the agent is higher than that of the randomly selected agent, the agent is set to active; otherwise it is set to inactive. All agents repeat this process to determine their respective states, after which the diffusion phase begins.

    In the diffusion phase, both inactive and active agents select other agents at random. If the agent selected by an inactive agent is active, its hypothesis (the feature subset) is offset and shared with the inactive agent. Otherwise, the inactive agent chooses a new random hypothesis (feature subset) from its search space (all feature combinations within the subset size). To offset a hypothesis, one feature is randomly removed (by changing 1 to 0) and another is randomly added (by changing 0 to 1), which keeps the size of the subset unchanged. In addition, when an active agent picks another active agent that maintains a similar hypothesis, the selected agent is set to inactive and assigned a random hypothesis. This frees up agents and increases diversity. Algorithm 2 is executed repeatedly until the maximum number of iterations (numIterations) is reached.

    Algorithm 1. featureExtraction
    Input: S, deployed sensor identifications in smart house
           Φ, set of the proposed feature categories
           E, a sequence of sensor events activated by a daily activity a
    Output: F
    1. F←Ø;
    2. TS←Ø;
    3. IS←Ø; // set of sensor identifications activated by a
    4. while(true)
    5.      e←getNextSensorEvent(E); // Get next sensor event e in E.
    6.      (T, I)←extractTimeAndSensor(e); // Extract T and I of e.
    7.      TS←TS∪{(T, I)};
    8.      IS←IS∪{I};
    9.      if(e is last traversed sensor event in E) then
    10.          break;
    11.      end if
    12. end while
    13. for each I in S
    14.      for each φ in Φ
    15.          if I∈IS then
    16.              TI←{T|(T, I)∈TS};
    17.              F←F∪{(I, φ(TI))};
    18.          else
    19.              F←F∪{(I, 0)};
    20.          end if
    21.      end for
    22. end for
    23. return F
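    The two stages of Algorithm 1 can be sketched in Python as below. The sketch is illustrative only: events are assumed to be already reduced to (T, I) pairs with T expressed in seconds since midnight, the result is keyed by (sensor, feature name) rather than stored as a set of pairs, and the two stand-in feature functions (`mean` and `span`) are hypothetical placeholders for the six feature categories.

```python
from collections import defaultdict


def feature_extraction(sensors, feature_funcs, events):
    """Sketch of Algorithm 1.

    sensors:       deployed sensor identifications (S)
    feature_funcs: mapping from feature name to function (the set of categories)
    events:        iterable of (T, I) pairs activated by one daily activity (E)
    """
    # Stage 1 (lines 4-12): collect the activation times of each sensor.
    # The dictionary keys play the role of IS, the values the role of TS.
    times_by_sensor = defaultdict(list)
    for t, sensor_id in events:
        times_by_sensor[sensor_id].append(t)

    # Stage 2 (lines 13-22): apply every feature formula to every sensor.
    features = {}
    for sensor_id in sensors:
        for name, func in feature_funcs.items():
            if sensor_id in times_by_sensor:
                features[(sensor_id, name)] = func(times_by_sensor[sensor_id])
            else:
                features[(sensor_id, name)] = 0  # sensor not activated
    return features


# Hypothetical stand-ins for the feature categories.
feature_funcs = {
    "mean": lambda ts: sum(ts) / len(ts),
    "span": lambda ts: max(ts) - min(ts),
}
sensors = ["M007", "M012", "M019"]
events = [(28719.6, "M007"), (28725.1, "M007"), (28730.0, "M012")]

features = feature_extraction(sensors, feature_funcs, events)
print(features)
```

    Every deployed sensor contributes one value per feature category, so the feature vector has a fixed length of |S| × |Φ| regardless of which sensors an activity actually triggers.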

    Algorithm 2. Description of SDSFS algorithm
    Input: numIterations, the number of iterations
           numAgents, the number of agents
    Output: Optimal feature subset
    // Initialisation phase
    Assign numAgents agents to random hypotheses with inactive states; each agent represents a set of features.
    while the number of completed iterations is less than numIterations do
        // Evaluation phase
        for each agent in agents
            Evaluate the fitness value;
            Find the maximum fitness value;
        end for
        // Test phase
        for each agent in agents
            if agent's fitness > random agent's fitness then
                Set agent as active;
            else
                Set agent as inactive;
            end if
        end for
        // Diffusion phase
        for each agent in agents
            if agent is inactive then
                Select a random agent;
                if selected agent is active then
                    Copy its hypothesis & offset it;
                    Evaluate the fitness value;
                else
                    Pick a random hypothesis;
                    Evaluate the fitness value;
                end if
            end if
        end for
    end while
    return Optimal feature subset
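    The loop of Algorithm 2 can be sketched as follows. This is a simplified illustration, not the SDSFS implementation of [51]: the fitness function is passed in by the caller (in the paper it is the average F-score of several classifiers on a random 4:1 split, whereas a toy function is used here), hypotheses are stored as sets of feature indices instead of binary strings, and the rule that deactivates an active agent holding a similar hypothesis is omitted. All names are hypothetical.

```python
import random


def random_hypothesis(num_features, lower, upper, rng):
    """Draw a random feature subset whose size lies within [lower, upper]."""
    size = rng.randint(lower, min(upper, num_features))
    return frozenset(rng.sample(range(num_features), size))


def offset(hypothesis, num_features, rng):
    """Swap one selected feature for one unselected feature (size is kept)."""
    outside = [f for f in range(num_features) if f not in hypothesis]
    if not outside:
        return hypothesis
    removed = rng.choice(sorted(hypothesis))
    added = rng.choice(outside)
    return (hypothesis - {removed}) | {added}


def sdsfs(fitness, num_features, num_iterations=150, num_agents=30,
          lower=5, upper=30, seed=0):
    rng = random.Random(seed)
    # Initialisation phase: random hypotheses, all agents inactive.
    hyps = [random_hypothesis(num_features, lower, upper, rng)
            for _ in range(num_agents)]
    active = [False] * num_agents
    best, best_fit = None, float("-inf")

    for _ in range(num_iterations):
        # Evaluation phase: score every agent and track the best subset.
        fits = [fitness(h) for h in hyps]
        for h, f in zip(hyps, fits):
            if f > best_fit:
                best, best_fit = h, f
        # Test phase: compare each agent with a randomly chosen agent.
        for i in range(num_agents):
            active[i] = fits[i] > fits[rng.randrange(num_agents)]
        # Diffusion phase: inactive agents copy-and-offset or restart.
        for i in range(num_agents):
            if not active[i]:
                j = rng.randrange(num_agents)
                if active[j]:
                    hyps[i] = offset(hyps[j], num_features, rng)
                else:
                    hyps[i] = random_hypothesis(num_features, lower, upper, rng)

    # Score the hypotheses produced by the last diffusion phase as well.
    for h in hyps:
        f = fitness(h)
        if f > best_fit:
            best, best_fit = h, f
    return best, best_fit


# Toy fitness (assumption): fraction of selected features that are "useful".
useful = set(range(8))
def toy_fitness(hypothesis):
    return len(hypothesis & useful) / len(hypothesis)

best_subset, best_score = sdsfs(toy_fitness, num_features=40, num_iterations=50)
print(sorted(best_subset), best_score)
```

    The default arguments follow the settings reported for numIterations, numAgents, lowerLim and upperLim; the seed is added only to make the sketch reproducible.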

    Activity recognition performance depends on the daily activity features. We use two common datasets, “Cairo” and “Tulum2009”, to evaluate the activity recognition performance of the approaches for extracting daily activity features. Both datasets are provided by Washington State University [52]. The sensors and daily activities involved are listed in Table 1.

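    Table 1.  Involved sensors and daily activities.

    | Dataset | Residents and pets | Sensor categories (number of sensors) | Measurement time |
    |---|---|---|---|
    | “Cairo” | 2 residents and 1 pet | Motion sensors M001-M027 (27); temperature sensors T001-T005 (5) | 57 days |
    | “Tulum2009” | 2 residents | Motion sensors M001-M018 (18); temperature sensors T001-T002 (2) | 84 days |

    | Dataset | Activity category | Number of activity instances |
    |---|---|---|
    | “Cairo” | “Night_wandering” | 67 |
    | “Cairo” | “Bed_to_toilet” | 30 |
    | “Cairo” | “R1_wake” | 53 |
    | “Cairo” | “R2_wake” | 52 |
    | “Cairo” | “R2_take_medicine” | 44 |
    | “Cairo” | “Breakfast” | 48 |
    | “Cairo” | “Leave_home” | 69 |
    | “Cairo” | “Lunch” | 37 |
    | “Cairo” | “Dinner” | 42 |
    | “Cairo” | “R2_sleep” | 52 |
    | “Cairo” | “R1_sleep” | 50 |
    | “Cairo” | “R1_work_in_office” | 46 |
    | “Cairo” | “Laundry” | 10 |
    | “Tulum2009” | “Cook_Breakfast” | 80 |
    | “Tulum2009” | “R1_Eat_Breakfast” | 66 |
    | “Tulum2009” | “Cook_Lunch” | 71 |
    | “Tulum2009” | “Leave_Home” | 75 |
    | “Tulum2009” | “Watch_TV” | 528 |
    | “Tulum2009” | “R1_Snack” | 491 |
    | “Tulum2009” | “Enter_Home” | 73 |
    | “Tulum2009” | “Group_Meeting” | 11 |
    | “Tulum2009” | “R2_Eat_Breakfast” | 47 |
    | “Tulum2009” | “Wash_Dishes” | 71 |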

    To compare approaches for extracting daily activity features, we carried out experiments with four methods in Jupyter Notebook. The first, the proposed method, is called “SR”. The second, “FR”, is a baseline extraction approach that takes the frequency of each activated sensor as a daily activity feature. FR and its variants have been the most widely used daily activity features; their feature spaces are composed of “st”, “et”, “du” and sensor features [53]. The other two methods combine SR and FR with the SDSFS algorithm and are called “SR+FS” and “FR+FS”, respectively: SDSFS is employed to select daily activity features after they have been extracted by SR or FR.

    For a given daily activity, the values of st, et, and du are the start time, end time, and duration of the daily activity, respectively. Each sensor feature corresponds to one of the sensors deployed in the smart home. For FR, the value of a sensor feature is the number of times the corresponding sensor is activated during the given daily activity. For the SDSFS algorithm, the parameters involved are listed in Table 2.
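    As an illustration of the FR baseline described above, the following minimal sketch builds the st, et, du and per-sensor frequency features for one activity instance. Encoding st and et as seconds since midnight and du in seconds is an assumption made for this example, and `fr_features` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime


def seconds_since_midnight(ts):
    return ts.hour * 3600 + ts.minute * 60 + ts.second


def fr_features(sensors, events):
    """events: time-ordered list of (timestamp, sensor id) for one activity."""
    start, end = events[0][0], events[-1][0]
    counts = Counter(sensor_id for _, sensor_id in events)
    features = {
        "st": seconds_since_midnight(start),   # start time
        "et": seconds_since_midnight(end),     # end time
        "du": (end - start).total_seconds(),   # duration
    }
    # One frequency feature per deployed sensor; 0 if it was never activated.
    for sensor_id in sensors:
        features[sensor_id] = counts.get(sensor_id, 0)
    return features


events = [
    (datetime(2011, 6, 15, 7, 58, 39), "M007"),
    (datetime(2011, 6, 15, 7, 58, 45), "M007"),
    (datetime(2011, 6, 15, 8, 1, 0), "M012"),
]
fr = fr_features(["M007", "M012", "M019"], events)
print(fr)
```

    The sketch makes the limitation discussed earlier visible: the two activations of M007 collapse into the single count 2, and their individual trigger times are no longer available to the classifier.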

    Table 2.  Parameter Interpretation of SDSFS algorithm.

    | Configuration Name | Parameter Interpretation |
    |---|---|
    | the number of iterations | numIterations←150 |
    | the number of agents | numAgents←30 |
    | the minimum number of features included in an agent | lowerLim←5 |
    | the maximum number of features included in an agent | upperLim←30 |

    For data-driven approaches, activity recognition is usually treated as a classification problem. Without loss of generality, Logistic Regression (LR), Naive Bayesian (NB), Decision Tree (DT) and LSTM are used to evaluate the proposed approach. The parameters involved are listed in Table 3; all other parameters keep their default values. Leave-one-day-out cross validation is used to evaluate the proposed approach. The performance indicators are the Recall, Precision, and F-score, defined in formulas (7), (8) and (9), respectively, where Q is the number of activity labels and, for the ith activity label, TPi is the number of true positives, FPi is the number of false positives, FNi is the number of false negatives, and TNi is the number of true negatives.

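    $$\mathrm{Recall} = \frac{1}{Q}\sum_{i=1}^{Q}\frac{TP_i}{TP_i + FN_i} \tag{7}$$
    $$\mathrm{Precision} = \frac{1}{Q}\sum_{i=1}^{Q}\frac{TP_i}{TP_i + FP_i} \tag{8}$$
    $$\mathrm{F\text{-}score} = \frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$
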
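    Formulas (7) to (9) can be computed directly from the true and predicted activity labels, as in the sketch below. It assumes that the Q activity labels are those occurring in the ground truth and that a label with a zero denominator contributes 0; the function name `macro_scores` is illustrative.

```python
def macro_scores(y_true, y_pred):
    """Return (Precision, Recall, F-score) as defined in formulas (7)-(9)."""
    labels = sorted(set(y_true))  # assumption: Q labels come from the ground truth
    precisions, recalls = [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    q = len(labels)
    precision = sum(precisions) / q
    recall = sum(recalls) / q
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


y_true = ["Lunch", "Lunch", "Dinner", "Dinner"]
y_pred = ["Lunch", "Dinner", "Dinner", "Dinner"]
precision, recall, f_score = macro_scores(y_true, y_pred)
print(precision, recall, f_score)
```

    Note that, following formula (9), the F-score is computed from the averaged Precision and Recall rather than by averaging per-label F-scores.
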
    Table 3.  Parameter Settings of Classifiers.

    | Classifier Name | Parameter Name | Parameter Settings |
    |---|---|---|
    | LR | regularization intensity, random number seed | C←1.0, random_state←2018 |
    | DT | random number seed | random_state←2018 |
    | NB | / | / |
    | LSTM | The number of units | 16 |
    | LSTM | Gradient descent algorithm | AdamOptimizer |
    | LSTM | Learning rate | 1e-3 |
    | LSTM | Batch size | 100 |
    | LSTM | Epoch number | 100 |

    Recall, Precision, and F-score are listed in Tables 4-9 and Figures 3 and 5.

    Table 4.  Precision of activity recognition in different classifiers of “Cairo”. The best-performing tests by the metric are shown in underline.

    | Approaches | LR | NB | DT | LSTM |
    |---|---|---|---|---|
    | FR | 79.263% | 72.207% | 84.346% | 81.021% |
    | SR | 70.900% | 70.775% | 79.358% | 88.152% |
    | FR + FS | 83.112% | 81.334% | 87.379% | / |
    | SR + FS | 87.006% | 83.958% | 89.511% | / |

    Table 5.  Recall of activity recognition in different classifiers of “Cairo”. The best-performing tests by the metric are shown in underline.

    | Approaches | LR | NB | DT | LSTM |
    |---|---|---|---|---|
    | FR | 74.776% | 70.225% | 85.018% | 75.2693% |
    | SR | 66.043% | 68.411% | 78.236% | 88.268% |
    | FR + FS | 81.310% | 79.023% | 86.295% | / |
    | SR + FS | 82.918% | 80.634% | 87.114% | / |

    Table 6.  F-score of activity recognition in different classifiers of “Cairo”. The best-performing tests by the metric are shown in underline.

    | Approaches | LR | NB | DT | LSTM |
    |---|---|---|---|---|
    | FR | 75.603% | 69.265% | 84.280% | 76.321% |
    | SR | 66.254% | 67.167% | 78.074% | 87.748% |
    | FR + FS | 81.021% | 78.589% | 86.189% | / |
    | SR + FS | 84.133% | 80.214% | 87.529% | / |

    Table 7.  Precision of activity recognition in different classifiers of “Tulum2009”. The best-performing tests by the metric are shown in underline.

    | Approaches | LR | NB | DT | LSTM |
    |---|---|---|---|---|
    | FR | 72.627% | 58.688% | 84.993% | 85.907% |
    | SR | 77.075% | 73.139% | 80.689% | 88.8% |
    | FR + FS | 81.864% | 65.650% | 86.122% | / |
    | SR + FS | 90.844% | 81.221% | 85.329% | / |

    Table 8.  Recall of activity recognition in different classifiers of “Tulum2009”. The best-performing tests by the metric are shown in underline.

    | Approaches | LR | NB | DT | LSTM |
    |---|---|---|---|---|
    | FR | 64.540% | 72.008% | 79.591% | 76.256% |
    | SR | 74.478% | 75.513% | 78.609% | 86.716% |
    | FR + FS | 65.519% | 75.564% | 81.985% | / |
    | SR + FS | 86.101% | 87.574% | 84.771% | / |

    Table 9.  F-score of activity recognition in different classifiers of “Tulum2009”. The best-performing tests by the metric are shown in underline.

    | Approaches | LR | NB | DT | LSTM |
    |---|---|---|---|---|
    | FR | 65.912% | 58.368% | 81.628% | 80% |
    | SR | 75.543% | 71.112% | 79.265% | 86.909% |
    | FR + FS | 68.535% | 66.448% | 83.669% | / |
    | SR + FS | 88.205% | 83.562% | 84.727% | / |

    Figure 3.  Performance comparison of the different daily activity feature solving approaches on three classifiers.
    Figure 5.  Performance comparison of the different daily activity feature solving approaches on three classifiers.

    (1) Results on Dataset “Cairo”

    After feature selection, the scores of each agent are shown in Figure 2. FR+FS and SR+FS have the highest average F-score in the 14th and 3rd agents respectively. Based on the above results, we conduct the following experiments using the features in the best agents respectively.

    Figure 2.  The average F-score of the extracted features in each agent.

    For LR, NB and DT, the Precision obtained using SR+FS is higher than that obtained using the other three methods. The Recall and F-score obtained using SR+FS are likewise higher than those obtained using FR, SR and FR+FS for LR and NB. For DT, SR+FS obtains the highest Precision (89.511%), Recall (87.114%) and F-score (87.529%) among these three classifiers. Averaged over the first three classifiers, the Precision obtained using SR+FS is at least 2.883 percentage points higher than that of the other methods. The average Recall of SR+FS is 83.555%, an improvement of 1.346 percentage points over the best of the other three methods. Similarly, the average F-score of SR+FS is 7.574, 13.459 and 2.024 percentage points higher than that of FR, SR and FR+FS, respectively. Finally, with the LSTM, SR also beats FR in every metric.
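    The averaged figures quoted for “Cairo” can be checked against the values in Tables 4 and 5 with a few lines of Python. The sketch assumes that each improvement is the difference, in percentage points, between the mean of SR+FS over LR, NB and DT and the best mean among the other three methods; under that reading it reproduces the reported 2.883 and 1.346.

```python
# Precision and Recall on "Cairo" for LR, NB and DT (values from Tables 4 and 5).
precision = {
    "FR": [79.263, 72.207, 84.346],
    "SR": [70.900, 70.775, 79.358],
    "FR+FS": [83.112, 81.334, 87.379],
    "SR+FS": [87.006, 83.958, 89.511],
}
recall = {
    "FR": [74.776, 70.225, 85.018],
    "SR": [66.043, 68.411, 78.236],
    "FR+FS": [81.310, 79.023, 86.295],
    "SR+FS": [82.918, 80.634, 87.114],
}


def average(values):
    return sum(values) / len(values)


def gain_over_best_other(table, method="SR+FS"):
    """Mean of `method` minus the best mean among the remaining methods."""
    best_other = max(average(v) for name, v in table.items() if name != method)
    return average(table[method]) - best_other


precision_gain = gain_over_best_other(precision)
recall_gain = gain_over_best_other(recall)
print(round(average(recall["SR+FS"]), 3), round(precision_gain, 3), round(recall_gain, 3))
```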

    (2) Results on Dataset “Tulum2009”

    After feature selection, the scores of each agent are shown in Figure 4. FR+FS and SR+FS have the highest average F-score in the 5th and 22nd agents respectively. Based on the above results, we conduct the following experiments using the features in the best agents respectively.

    Figure 4.  The average F-score of the extracted features in each agent.

    The same pattern holds on this dataset. The Precision obtained using SR+FS is higher than that obtained using the other three methods for LR and NB, although SR+FS lags behind FR+FS in the Precision of DT. The Recall and F-score obtained using SR+FS are higher than those obtained using FR, SR and FR+FS for LR and NB. Averaged over the first three classifiers, the Precision obtained using SR+FS is at least 7.919 percentage points higher than that of the other methods. The average Recall of SR+FS is 86.148%, an improvement of 9.948 percentage points over the best of the other three methods. Similarly, the average F-score of SR+FS is 16.862, 10.192 and 12.614 percentage points higher than that of FR, SR and FR+FS, respectively. Finally, with the LSTM, SR also beats FR in every metric.

    We discuss a few crucial observations from our experiments. As shown in Figures 3 and 5, SR+FS performs better than the other three methods. First, this gain may be due to feature selection. The original data inevitably contain redundant features, which are not sensitive to the classification label but can disturb the classifier's judgment of a sample. This may explain why the performance of SR without feature selection is low for some classifiers.

    Second, the traditional method only counts the activation frequency of each sensor and so loses the sensor's time information. In contrast, our method extracts the trigger times of each sensor during the activity in order and then computes features from them. Applying several different feature formulas increases the diversity of features. In this way, both the frequency information and the time information of the time series data are retained.

    We note that the activity categories are imbalanced, but this paper does not apply any corresponding processing when modeling. Large differences in the number of instances between categories may bias the model towards the majority categories, which may affect the performance evaluation to some extent. Consequently, further studies may be worthwhile to determine how such imbalance affects performance, especially in smart home environments.

    Daily activity features have a significant influence on activity recognition performance. To improve activity recognition performance, we proposed a statistical representation of daily activity features based on the time-series nature of sensor event streams. We used four classifiers to compare the proposed approach with approaches based on the frequency and truth value of sensor events on two common datasets. The results showed that the proposed approach can significantly improve activity recognition performance.

    This work was supported by the National Natural Science Foundation of China (No. 61976124); the Open Project Program of Artificial Intelligence Key Laboratory of Sichuan Province (Nos. 2018RYJ09, 2019RZJ01); the Opening Project of Key Laboratory of Higher Education of Sichuan Province for Enterprise Informationalization and Internet of Things(No. 2019WZY03); the Major Frontier Project of Science and Technology Plan of Sichuan Province (No. 2018JY0512).

    The authors declare no conflicts of interest.



    [1] World Health Organization (WHO). Coronavirus. Available from: https://www.who.int/health-topics/coronavirus (accessed on January 23, 2020).
    [2] Wuhan Municipal Health Commission. Available from: http://wjw.wuhan.gov.cn/front/web/showDetail/2019123108989 (accessed on December 31, 2019).
    [3] World Health Organization (WHO). Disease Outbreak News. Available from: https://www.who.int/csr/don/archive/disease/novelcoronavirus/en/ (accessed on January 14, 2020).
    [4] World Health Organization (WHO). Situation reports. Available from: http://who.maps.arcgis.com/apps/opsdashboard/index.html#/c88e37cfc43b4ed3baf977d77e4a0667 (accessed on January 23, 2020).
    [5] National Health Commission of the People's Republic of China. Available from: http://www.nhc.gov.cn/xcs/xxgzbd/gzbdindex.shtml (accessed on February 14, 2020).
    [6] Y. Zhou, Z. Ma, F. Brauer, A Discrete Epidemic Model for SARS Transmission and Control in China, Math. Comput. Model., 40 (2004), 1491-1506. doi: 10.1016/j.mcm.2005.01.007
    [7] G. Chowell, C. Castillo-Chavez, P. Fenimore, M. Christopher, C. Kribs-Zaleta, L. Arriola, et al., Model Parameters and Outbreak Control for SARS, Emerg. Infect. Dis., 10 (2004), 1258-1263. doi: 10.3201/eid1007.030647
    [8] P. Lekone, B. Finkenstädt, Statistical Inference in a Stochastic Epidemic SEIR Model with Control Intervention: Ebola as a Case Study, Biometrics, 62 (2006), 1170-1177. doi: 10.1111/j.1541-0420.2006.00609.x
    [9] J. Wu, K. Leung, G. Leung, Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study, Lancet (2020).
    [10] S. Zhao, S. Musa, Q. Lin, J. Ran, G. Yang, W. Wang, et al., Estimating the unreported number of novel coronavirus (2019-nCoV) vases in China in the first half of January 2020: a data-driven modelling analysis of the early outbreak, J. Clin. Med., 9 (2020), 388. doi: 10.3390/jcm9020388
    [11] B. Prasse, M. Achterberg, L. Ma, P. Mieghem, Network-Based Prediction of the 2019-nCoV Epidemic Outbreak in the Chinese Province Hubei, arXiv preprint arXiv (2002), 2002.04482.
    [12] C. Anastassopoulou, L. Russo, A. Tsakris, C. Siettos, Data-Based Analysis, Modelling and Forecasting of the novel Coronavirus (2019-nCoV) outbreak, medRxiv (2020).
    [13] Y. Yang, Q. Lu, M. Liu, Y. Wang, A. Zhang, N. Jalali, et al., Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China, medRxiv (2020).
    [14] C. You, Y. Deng, W. Hu, J. Sun, Q. Lin, F. Zhou, et al., Estimation of the Time-Varying Reproduction Number of COVID-19 Outbreak in China, medRxiv (2020).
    [15] S. Hermanowicz, Forecasting the Wuhan coronavirus (2019-nCoV) epidemics using a simple (simplistic) model, medRxiv (2020).
    [16] K. Mizumoto, K. Kagaga, G. Chowell, Early epidemiological assessment of the transmission potential and virulence of 2019 Novel Coronavirus in Wuhan City: China, 20192020. medRxiv (2020).
    [17] B. Tang, X. Wang, Q. Li, N. L. Bragazzi, S. Tang, Y. Xiao, et al., Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions, J. Clin. Med., 9 (2020), 462. doi: 10.3390/jcm9020462
    [18] B. Tang, N. Bragazzi, Q. Li, S. Tang, Y. Xiao, J. Wu, An updated estimation of the risk of transmission of the novel coronavirus (2019-nCov), Infect. Disease Model., 5 (2020), 248-255. doi: 10.1016/j.idm.2020.02.001
    [19] Health Commission of Hubei Province. Available from: http://wjw.hubei.gov.cn/bmdt/ztzl/fkxxgzbdgr f yyq/ (accessed on February 15, 2020).
    [20] L. Bettencourt, R. Ribeiro, Real Time Bayesian Estimation of the Epidemic Potential of Emerging Infectious Diseases, PLoS One, 3 (2008), e2185. doi: 10.1371/journal.pone.0002185
    [21] A. Morton, B. Finkenstädt, Discrete time modelling of disease incidence time series by using Markov chain Monte Carlo methods, J. R. Stat. Soc., 54 (2005), 575-594. doi: 10.1111/j.1467-9876.2005.05366.x
    [22] S. Tang, Y. Xiao, Y. Yang, Y. zhou, J. Wu, Z. Ma, Community-based measures for mitigating the 2009 H1N1 pandemic in China, PLoS One, 5 (2010), e10911. doi: 10.1371/journal.pone.0010911
    [23] World Health Organization (WHO). Available from: https://www.who.int/news-room/detail/23-01-2020-statement-on-the-meeting-of-the-international-health-regulations-(2005)-emergency-committee-regarding-the-outbreak-of-novel-coronavirus-(2019-ncov) (accessed on January 23, 2020).
    [24] G. Chowell, N. Hengartner, C. Castillo-Chavez, P. Fenimore, J. Hyman, The basic reproductive number of Ebola and the effects of public health measures: the cases of Congo and Uganda, J. Theor. Biol., 229 (2004), 119-126. doi: 10.1016/j.jtbi.2004.03.006
    [25] C. Favier, N. Degallier, M. Rosa-Freitas, J. Boulanger, J. R. Costa Lima, J. Luitgards-Moura, et al., Early determination of the reproductive number for vector-borne diseases: the case of dengue in Brazil, Trop. Med. Int. Health., 11 (2006), 332-340. doi: 10.1111/j.1365-3156.2006.01560.x
    [26] C. Althaus, Estimating the Reproduction Number of Ebola Virus (EBOV) During the 2014 Outbreak in West Africa, PLoS Curr., 6 (2014).
    [27] G. Chowell, H. Nishiura, L. Bettencourt, Comparative estimation of the reproduction number for pandemic influenza from daily case notification data, J. R. Soc. Interface., 4 (2007), 155-166. doi: 10.1098/rsif.2006.0161
    [28] S. Paine, G. Mercer, P. Kelly, D. Bandaranayake, M. Baker, W. Huang, et al., Transmissibility of 2009 pandemic influenza A(H1N1) in New Zealand: effective reproduction number and influence of age, ethnicity and importations, Euro. Surveill., 15 (2010), pii = 19591.
    [29] S. Park, B. Bolker, D. Champredon, D. Earn, M. Li, J. Weitz, et al., Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty: framework and applications to the novel coronavirus (2019-nCoV) outbreak, medRxiv (2020).
    [30] A. Kucharski, T. Russell, C. Diamond, Y. Liu, J. Edmunds, S. Funk, et al., Early dynamics of transmission and control of COVID-19: a mathematical modelling study, medRxiv (2020).
    [31] Y. Liu, A. Gayle, A. Wilder-Smith, J. Rocklöv, The reproductive number of COVID-19 is higher compared to SARS coronavirus, J. Travel. Med., (2020), 1-4.
  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
