1.
Introduction
Since the first cases occurred in early December 2019, the COVID-19 crisis has been accompanied by unprecedented data release. The first cluster of cases was reported on December 31, 2019, by WHO (World Health Organization) [1]. Chinese authorities confirmed on January 7 that this cluster was caused by a novel coronavirus [2]. The disease then rapidly spread throughout the world; a case was identified in the U.S.A. as early as January 19, 2020, for instance [3]. According to the WHO database [4], the first cases in Japan date back to January 14, in Italy to January 29 (even though the cluster of cases was announced on January 21, 2020 [5]), in France to January 24, 2020, etc. The spread of the epidemic across countries was monitored, and the data was made publicly available at the international level by recognized scientific institutions such as WHO [4] and the Johns Hopkins University [6], who collected data provided by national health agencies. To the best of our knowledge, this is the first time in history that such detailed epidemiological data have been made publicly available on a global scale; this opens up new questions and new challenges for the scientific community.
Modeling efforts in order to analyze and predict the dynamics of the epidemics were initiated from the start [7,8,9]. Forecasting the propagation of the epidemic is, in particular, a key challenge in infectious disease epidemiology. It has quickly become clear to medical doctors and epidemiologists that covert cases (asymptomatic or unreported infectious cases) play an important role in the spread of the COVID-19. An early description of an asymptomatic transmission in Germany was reported by Rothe et al. [10]. It was also observed on the Diamond Princess cruise ship in Yokohama in Japan [11] that many of the passengers were tested positive for the virus but never presented any symptoms. On the French aircraft carrier Charles de Gaulle, clinical and biological data for all 1739 crew members were collected on arrival at the Toulon harbor and during quarantine: 1121 crew members (64%) were tested positive for COVID-19 using RT-PCR, and among these, 24% were asymptomatic [12]. The importance of covert cases in the silent propagation of the epidemic was highlighted by Qiu [13]. Models accounting for asymptomatic transmission, which agree with reported cases data, have been used from the start of the epidemic [7,14,15,16]. The implementation of such models depends, however, on the a priori knowledge of some characteristic parameters of the host-pathogen interaction, among which is the ascertainment rate. Nishiura and collaborators [17] estimated this ascertainment rate as 9.2% on a 7.5-days detection window, based on testing data of repatriated Japanese nationals from Wuhan. This was corrected later to 44% for non-severe cases [18]. An early review of SARS-CoV-2 facts can be found in the work of Bar-On et al. [19].
To describe the spread of COVID-19 mathematically, Liu et al. [7] first took into account the infection of susceptible individuals by contacts with unreported infectious individuals. A new method using the number of reported cases in SIR models was also proposed in the same work. This method and the model were extended in several directions by the same group [14,15,16,20] to include non-constant transmission rates and a period of exposure. More recently, the method was extended and successfully applied to a Japanese age-structured dataset by Griette, Magal, and Seydi [21]. The method was also extended to investigate the predictability of the outbreak in several countries, including China, South Korea, Italy, France, Germany, and the United Kingdom by Liu, Magal, and Webb [20].
Phenomenological models were extensively used in the literature even before the SARS-CoV-2 pandemic to describe reported cases data, see e.g., [22,23] for the 2003 SARS outbreak, and also [24,25], to cite a few. In the case of the SARS-CoV-2 epidemic, articles related to phenomenological models are particularly numerous, see e.g., [26,27,28,29,30,31,32,33,34,35]. More precisely, Castro et al. [27] investigate the possibility of predicting the turning point of an epidemic wave. Many studies use phenomenological models to issue short-term predictions on the epidemic [30,32,33,34]. But these models can also be used to reconstruct the evolution of the epidemic a posteriori [26,28,35].
In previous works [7,14,15,16], we have replaced the data with the phenomenological model, and we use this continuous description as the output of the epidemic model. This allows us to understand how to express part of the initial distribution and some parameters (e.g., the transmission rate) from the data and the given parameters of the model. By using this approach, we obtain an explicit formula for the time-dependent transmission rate expressed by using some given parameters of the model and some parameters of the phenomenological model. In [36], we used a Bernoulli-Verhulst phenomenological model to describe a single epidemic wave and compute a time-dependent transmission rate.
There are many potential phenomenological models to represent a single epidemic wave [24,26,28]. However, except in the case of the logistic equation, there is usually no explicit formula for the solution. The explicit formula in the case considered here permits to develop a comprehensive statistical analysis. Phenomenological models also serve to regularize the data, which is a complex question. Indeed the idea is to get rid of the stochastic oscillations (due for example to the way the data are collected, or the stochastic nature of the contact between individuals). Some phenomenological models also redistribute the reported cases to dampen the fluctuations in the data. Let us stress here the fact that some oscillations in cases data may not be random and might correspond to complex transmission dynamics (delayed infection, peak in contact numbers during the day, etc.) This highlights one of the drawbacks of phenomenological models: while they allow a precise description of epidemic waves, they might also hide some valuable information on how the disease is transmitted in the population.
A key parameter in understanding the dynamics of the COVID-19 epidemic is the transmission rate, defined as the fraction of all possible contacts between susceptible and infected individuals that effectively result in a new infection per unit of time. Estimating the average transmission rate is one of the most crucial challenges in the epidemiology of communicable diseases. In practice, many factors can influence the actual transmission rate, (ⅰ) the coefficient of susceptibility; (ⅱ) the coefficient of virulence; (ⅲ) the number of contacts per unit of time [37]; (ⅳ) the environmental conductivity [38]. Let us remark that the rate of contacts per units of time can also be investigated by agent models [39].
Epidemic models with time-dependent transmission rates have been considered in several articles in the literature. The classical approach is to fix a function of time that depends on some parameters and to fix these parameters by using the best fit to the data. In Chowell et al. [40] a specific form was chosen for the rate of transmission and applied to the Ebola outbreak in Congo. Huo et al [41] used a predefined transmission rate which is a Legendre polynomial depending on a tunable number of parameters. Let us also mention that kinetic model idea has been used to understand this problem in the paper by Dimarco et al. [42]. Here we are going the other way around. We reconstruct the transmission rate from the data by using the model without choosing a predefined function for the transmission rate. Such an approach was used in the early 70s by London and Yorke [43,44] who used a discrete-time model and discussed the time-dependent rate of transmission in the context of measles, chickenpox, and mumps. More recently, several authors [36,45,46] used both an explicit formula and algorithms to reconstruct the transmission rate. These studies allow us to understand that the regularization of the data is a complex problem and is crucial in order to rebuild a meaningful time-dependent transmission rate.
In the present paper, we apply a new method to compute the transmission rate from cumulative reported cases data. While the use of a predefined transmission rate τ(t) as a function of time can lead to very good fits of the data, here we are looking for a more intrinsic relationship between the data and the transmission rate. Therefore we propose a different approach and use a two-step procedure. Firstly, we use a phenomenological model to describe the data and extract the general trend of the epidemiological dynamics while removing the insignificant noise. Secondly, we derive an explicit relationship between the phenomenological model and the transmission rate. In other words, we compute the transmission rate directly from the data. As a result, we can reconstruct an estimation of the state of the population at each time, including covert cases. Our method also provides new indicators for the epidemiological dynamics that are related to the reproductive number.
2.
Methods
2.1. COVID-19 data and phenomenological model
We regularized the time series of cumulative reported cases by fitting standard curves to the data to reconstruct the time-dependent transmission rate. We first identified the epidemic waves for each of the eight geographic areas. A Bernoulli–Verhulst curve was then fitted to each epidemic wave using the Levenberg–Marquardt algorithm [47]. We reported the detailed output of the algorithm in the supplementary material, including confidence bounds on the parameters. The model was completed by filling the time windows between two waves with straight lines. Finally, we applied a Gaussian filter with a standard deviation of 7 days to the curve to obtain a smooth model.
2.1.1. Data sources
We used reported cases data for 8 different geographic areas, namely California, France, India, Israel, Japan, Peru, Spain, and the UK. Apart from California State, for which we used data from the COVID tracking project [48], the reported cases data were taken from the WHO database [4].
2.1.2. Phenomenological model used for multiple epidemic waves
To represent the data, we used a phenomenological model to fit the curve of cumulative rate cases. Such an idea is not new since it was already proposed by Bernoulli [49] in 1760 in the context of the smallpox epidemic. Here we used the so-called Bernoulli–Verhulst [50] model to describe the epidemic phase. Bernoulli [49] investigated an epidemic phase followed by an endemic phase. This appears clearly in Figures 9 and 10 of the paper by Dietz, and Heesterbeek [51] who revisited the original article of Bernoulli. We also refer to Blower [52] for another article revisiting the original work of Bernoulli. Several works comparing cumulative reported cases data and the Bernoulli–Verhulst model appear in the literature (see [22,24,25]). The Bernoulli–Verhulst model is sometimes called Richard's model, although Richard's work came much later in 1959.
The phenomenological model deals with data series of new infectious cases decomposed into two successive phases: 1) endemic phases followed by 2) epidemic phases.
Endemic phase: During the endemic phase, the dynamics of new cases appears to fluctuate around an average value independently of the number of cases. Therefore the average cumulative number of cases is given by
where t0 denotes the beginning of the endemic phase, and a is the average value of the daily number of new cases.
We assume that the average daily number of new cases is constant. Therefore the daily number of new cases is given by
Epidemic phase: In the epidemic phase, the new cases are contributing to produce secondary cases. Therefore the daily number of new cases is no longer constant, but varies with time as follows
In other words, the daily number of new cases follows the Bernoulli–Verhulst [49,50] equation. Namely, by setting
we obtain
completed with the initial value
In the model, Nbase+N0 corresponds to the value CR(t0) of the cumulative number of cases at time t=t0. The parameter N∞+Nbase is the maximal value of the cumulative reported cases after the time t=t0. χ>0 is a Malthusian growth parameter, and θ regulates the speed at which CR(t) increases to N∞+Nbase.
Regularize the junction between the epidemic phases: Because the formula for τ(t) involves derivatives of the phenomenological model regularizing CR(t) (see Eqs (2.12)–(2.15)), we need to connect the phenomenological models of the different phases as smoothly as possible. Let t0,…,tn denote the n+1 breaking points of the model, that is, the times at which there is a transition between one phase and the next one. We let ~CR(t) be the global model obtained by placing the phenomenological models of the different phases side by side.
More precisely, ~CR(t) is defined by Eq (2.3) during an epidemic phase [ti,ti+1], or during the initial phase (−∞,t0] or the last phase [tn,+∞). During an endemic phase, ~CR(t) is defined by Eq (2.1). The parameters are chosen so that the resulting global model ~CR is continuous. We define the regularized model by using the convolution formula:
where
is the Gaussian function with mean 0 and variance σ2. The parameter σ controls the trade-off between smoothness and precision: increasing σ reduces the variations in CR(t) and reducing σ reduces the distance between CR(t) and ~CR(t). In any case the resulting function CR(t) is very smooth (as well as its derivatives) and close to the original model ~CR(t) when σ is not too large. In the results section (Section 3), we fix σ=7 days.
Numerically, we will need to compute some t→CR(t) derivatives. Therefore it is convenient to take advantage of the convolution Eq (2.6) and deduce that
for n=1,2,3.
2.2. Epidemic model
To reconstruct the transmission rate, we used the underlying mathematical model described by the flowchart presented in Figure 1.
The model itself includes five parameters whose values were taken from the literature: the average length of the noninfectious incubation period (1 day, (E)xposed); the average length of the infectious incubation period (3 days, (I)nfectious); the average length of the symptomatic period (7 days, (R)eported or (U)nreported); the ascertainment rate (0.8). Additional parameters appear in the initial condition and could not be computed from the initial number of unreported individuals. The transmission rate was computed from the regularized data and the assumed parameters according to a methodology adapted from Demongeot et al. [36].
Many epidemiological models are based on the SIR or SEIR model, which is classical in epidemic modelling. We refer to [53,54] for the earliest articles devoted to such a question and to [55,56,58,59,60,61,62,63,64,65] for more models. In this chapter, we will compare the following SEIUR model to the cumulative reported cases data
where at time t, S(t) is the number of susceptible, E(t) the number of exposed (not yet capable of transmitting the pathogen), I(t) the number of asymptomatic infectious, R(t) the number of reported symptomatic infectious and U(t) the number of unreported symptomatic infectious. This system is supplemented by initial data
In this model, τ(t) is the rate of transmission, 1/α is the average duration of the exposure period, 1/ν is the average duration of the asymptomatic infectious period, and for simplicity, we subdivide the class of symptomatic patients into the fraction 0≤f≤1 of patients showing some severe symptoms, and the fraction 1−f of patients showing some mild symptoms assumed to be not detected. The quantity 1/η is the average duration of the symptomatic infectious period. In the model, we assume that the average time of infection is the same for Reported and Unreported infectious individuals. We refer to [66,67] for more information about this topic. Finally, we assume that reported symptomatic individuals do not contribute significantly to the transmission of the virus.
The cumulative number of reported cases CR(t) is connected to the epidemic model by the following relationship
where
2.2.1. Instantaneous reproduction number computed for COVID-19 data
We have only a single epidemic phase in the standard SI epidemic model because the epidemic exhausts the susceptible population. Here, the changes of regimes (epidemic phase versus endemic phase) are partly due to the decay in the number of susceptible. But these changes are also influenced by the changes in the transmission rate. These changes in the transmission rate are due to the limitation of contacts between individuals or to changes in climate (in summer) or other factors influencing transmissions.
In this section, we will observe that the main factors for the changes in the epidemic regimes are the changes in the transmission rate. To investigate this for the COVID-19 data, we use our method to compute the transmission rate, and we consider the instantaneous reproduction number
and the quasi-instantaneous reproduction number
in which the transmission varies, but the size of the susceptible population remains constant equal to S0. We refer to subsection 83141592631415927 for detailed computations to obtain the Eq (2.16).
The comparison between Re(t) and R0e(t) permits us to understand the contribution of the decay of the susceptible population in the variations of Re(t). Another interesting aspect is that R0e(t) is proportional to the transmission rate τ(t). Therefore plotting R0e(t) permits us to visualize the variation of t→τ(t) only.
2.2.2. Computation of the initial value of the epidemic model
Based on Eq (2.4), we can recover the initial number of asymptomatic infectious I0=I(t0) and the initial number of exposed E0=E(t0) for an epidemic phase starting at time t0. Indeed by definition, we have CR′(t)=νfI(t) and therefore
By differentiating Eq (2.5) we deduce that
therefore
By using the third equation in Eq (2.8) we obtain
2.2.3. Theoretical formula for τ(t)
We first remark that the S-equation of model (2.8) can be written as
therefore by integrating between t0 and t we get
Next we plug the above formula for S(t) into the E-equation of model (2.8) and obtain
and by integrating this equation between t0 and t we obtain
Define the cumulative numbers of exposed, infectious and unreported individuals by
and note that CE′(t)=E(t). We can rewrite the Eq (2.22) as
By taking the logarithm of both sides we obtain
and by differentiating with respect to t:
Therefore we have an explicit formula giving τ(t) as a function of I(t), U(t) and CE(t) and its derivatives. Next we explain how to identify those three remaining unknowns as a function of CR(t) and its derivatives. We first recall that, from Eq (2.10), we have
The I-equation of model (2.8) can be rewritten as
and by integrating this equation between t0 and t we obtain
Finally, by applying the variation of constants formula to the U-equation of system (2.8) we obtain
From these computations we deduce that τ(t) can be computed thanks to Eq (2.23) from CR(t), α, ν, η, κ, f and U0. The following theorem is a precise statement of this result.
Theorem 2.1. Let S0>0, E0>0, I0>0, U0>0, CR0≥0, α, ν, η and f>0 be given. Let t↦τ(t)≥0 be a given continuous function and t→I(t) be the second component of system (2.8). Let ^CR:[t0,∞)→R be a twice continuously differentiable function. Then
if and only if ^CR satisfies
and τ(t) is given by
where
Proof. Assume first that ^CR(t) satisfies Eq (2.26). Then by using the first equation of system (2.8) we deduce that
Therefore
and by taking the derivative of both sides we obtain
which is equivalent to
By using the fact that E(t)=CE′(t) and I=CR′(t)/(νf), we deduce Eq (2.32). By differentiating Eq (2.26), we get Eqs (2.28) and (2.30). Equation (2.29) is a consequence of the E-component of Eq (2.8). We get Eq (2.31) by combining Eqs (2.37) and (2.35) (since ^CE(t)=CE(t)).
Conversely, assume that τ(t) is given by Eq (2.31) and all the Eqs (2.27)–(2.36) hold. We define ˆI(t)=^CR′(t)/νf and ^CI(t)=(^CR(t)−CR0)/νf. Then, by using Eq (2.27), we deduce that
and by using Eq (2.28), we deduce
Moreover, from Eq (2.31) and ˆI(t)=^CR′(t)/νf we deduce that
Multiplying Eq (2.40) by ˆI(t)+κˆU(t) and integrating, we obtain
where the right-hand side is well defined thanks to Eq (2.31). By combining Eqs (2.27), (2.28) and (2.35) we obtain
and by taking the derivative in Eq (2.35) we obtain
therefore by using Eq (2.29) we deduce that
In particular, E0+S0−^CE′(t0)−α^CE(t0)=S0 and, by taking the exponential of Eq (2.41), we obtain
which, differentiating both sides, yields
and therefore
where ˆE(t):=^CE′(t) and ˆS(t):=S0e−∫tt0τ(σ)[ˆI(σ)+κˆU(σ)]dσ. Differentiating the definition of ˆS(t), we get
Next the derivative of Eq (2.35) can be rewritten as
Finally, differentiating Eq (2.36) yields
By combining Eqs (2.44)–(2.47) we see that (ˆS(t),ˆE(t),ˆI(t),ˆU(t)) satisfies Eq (2.8) with the initial condition (ˆS(t0),ˆE(t0),ˆI(t0),ˆU(t0))=(S0,E0,I0,U0). By the uniqueness of the solutions of Eq (2.8) for a given initial condition, we conclude that (ˆS(t),ˆE(t),ˆI(t),ˆU(t))=(S(t),E(t),I(t),U(t)). In particular, CR(t) satisfies Eq (2.26). The proof is completed.
Remark 2.2. The condition Eq (2.31) is equivalent to
Remark 2.3. The present computations have been previously done, in a different context, by Hadeler [68].
2.3. Computing the explicit formula for τ(t) during an epidemic phase
In this section, we assume that the curve of cumulative reported cases is given by the Bernoulli–Verhulst formula
and we recall that
Then we can compute an explicit formula for the components of the system (2.8). By definition we have
which gives
so that by using the I-component in the system (2.8) we get
By integration, we get
and since
we obtain
Note also that we have explicit formulas for E(t)=CE′(t) and E′(t)=CE″(t),
and
Next, recall the U-equation of Eq (2.8), that is,
therefore by the variation of constant formula we have
2.3.1. Compatibility conditions for the positivity of the transmission rate
Recall from Eq (2.51):
Here we require that the numerator and the denominator of the last fraction stay positive for all times.
Positivity of the numerator: The model is compatible with the data if the transmission rate τ(t) stays positive for all times t∈R. The numerator
is a second-order polynomial with N∈(0,1). Let Δ:=B2−4AC be the discriminant of p(N). Since p′(0)=−B<0 and
we have two cases: 1) B2A≥1; or 2) 0<B2A<1.
Case 1: If B2A≥1, p(N) is non-negative for all N∈[0,1] if and only if
Substituting A, B, C by their expression, we get
Case 2: If B2A<1, p(N) is non-negative for all N∈[0,1] if and only if
Lemma 2.4. Δ<0⇒A+C−B>0.
Proof. We have
and after simplifying the result follows.
Positivity of the denominator: Next we turn to the denominator in the expression of τ, i.e., we want to ensure
We let Y:=N(t)N∞ and observe that E(t)+αCE(t) can be written as
since we know that A>0. Therefore Eq (2.58) becomes
We let
and notice that
is exactly p(N):=AN2−BN+C.
Therefore, assuming that A+C−B>0, the derivative g′(Y) is positive and g is strictly increasing. So we only have to check the final value g(1). We get
2.3.2. Computing the explicit formula for τ(t) during an endemic phase
Recall that during an endemic phase, the cumulative number of cases is assumed to be a line. Therefore,
and
Therefore
and
Hence
Moreover
and we obtain
By combining Eqs (2.12) and (2.61)–(2.64) we obtain the following explicit formula.
Remark 2.5. The above transmission rate corresponds to a constant number of daily infected A. Therefore it is impossible to maintain such a constant flux of new infected whenever the number of susceptible individuals is finite. The time t=fS0A+t0 corresponds to the maximal time starting from t0 during which we can maintain such a regime.
3.
Results
3.1. Phenomenological model applied to COVID-19 data
Our method to regularize the data was applied to the eight geographic areas. The resulting curves are presented in Figure 2. The blue background color regions correspond to epidemic phases, and the yellow background color regions to endemic phases. We added a plot of the daily number of cases (black dots) and the derivative of the regularized model for comparison, even though the daily number of cases is not used in the fitting procedure. The figures show in general, an extremely good agreement between the time series of reported cases (top row, black dots) and the regularized model (top row, blue curve). The match between the daily number of cases (bottom row, black dots) and the derivative of the regularized model (bottom row, blue curve) is also excellent, even though it is not a part of the optimization process. Of course, we lose some of the information like the extremal values ("peaks'') of the daily number of cases. This is because we focus on an averaged value of the number of cases. More information could be retrieved by studying statistically the variation around the phenomenological model. However, we leave such a study for future work. The relative error between the regularized curve and the data may be relatively high at the beginning of the epidemic because of the stochastic nature of the infection process and the small number of infected individuals but quickly drops below 1% (see the supplementary material for more details).
3.2. Bounds for the value of non-identifiable parameters
Even if some parameters of the mathematical model are not identifiable, we were able to gain some information on possible values for those parameters. Indeed, a mathematical model with a negative transmission rate τ(t) cannot be consistent with the real phenomenon. Therefore, parameter values which produce such negative transmission rates cannot be compatible with the data. Using this argument, we found that the average incubation period cannot exceed eight days. The actual value of the upper bound is highly variable across countries and epidemic waves. We report the values of the upper bound in Section 11 of the supplementary material.
3.3. Instantaneous reproduction number computed for COVID-19 data
Our analysis allows us to compute the instantaneous transmission rate τ(t). We use this transmission rate to compute two different indicators of the epidemiological dynamics for each geographic area, the instantaneous reproduction number and the quasi-instantaneous reproduction number. Both coincide with the basic reproduction number R0 on the first day of the epidemic. The instantaneous reproduction number at time t, Re(t), is the basic reproduction number corresponding to an epidemic starting at time t with a constant transmission rate equal to τ(t) and with an initial population of susceptibles composed of S(t) individuals (the number of susceptible individuals remaining in the population). The quasi-instantaneous reproduction number at time t, R0e(t), is the basic reproduction number corresponding to an epidemic starting at time t with a constant transmission rate equal to τ(t) and with an initial population of susceptibles composed of S0 individuals (the number of susceptible individuals at the start of the epidemic). The two indicators are represented for each geographic area in the top row of Figure 3 (black curve: instantaneous reproduction number; green curve: quasi-instantaneous reproduction number).
There is one interpretation for Re(t) and another for R0e(t). The instantaneous reproduction number indicates if, given the current state of the population, the epidemic tends to persist or die out in the long term (note that our model assumes that recovered individuals are perfectly immunized). The quasi-instantaneous reproduction number indicates if the epidemic tends to persist or die out in the long term, provided the number of susceptible is the total population. In other words, we forget about the immunity already obtained by recovered individuals. Also, it is directly proportional to the transmission rate and therefore allows monitoring of its changes. Note that the value of R0e(t) changed drastically between epidemic phases, revealing that τ(t) is far from constant. In any case, the difference between the two values starts to be visible in the figures one year after the start of the epidemic.
We also computed the reproduction number by using the method described in Cori et al. [69], which we denote Rce(t). The precise implementation is described in the supplementary material. It is plotted in the bottom row of Figure 3 (green curve), along with the instantaneous reproduction number Re(t) (green curve).
Remark 3.1. In the bottom of Figure 3, we compare the instantaneous reproduction numbers obtained by our method in black and the classical method of Cori et al. [69] in green. We observe that the two approaches are not the same at the beginning. This is because the method of Cori et al. [69] does not take into account the initial values I0 and E0 while we do. Indeed the method of Cori et al. [69] assumes that I0 and E0 are close to 0 at the beginning when it is viewed as a Volterra equation reformulation of the Bernoulli–Kermack–McKendrick model with the age of infection. Our method, on the other hand, does not require such an assumption since it provides a way to compute the initial states I0 and E0.
Remark 3.2. It is essential to "regularize" the data to obtain a comprehensive outcome from SIR epidemic models. In general, the rate of transmission in the SIR model (applying identification methods) is not very noisy and meaningless. For example, at the beginning of the first epidemic wave, the transmission rate should be decreasing since peoples tend to have less and less contact while to epidemic growth. The standard regularization methods (like, for example, the rolling weekly average method) have been tested for COVID-19 data in Demongeot, Griette, and Magal [36]. The outcome in terms of transmission rate is very noisy and even negative transmission (which is impossible). Regularizing the data is not an easy task, and the method used is very important in order to obtain a meaningful outcome for the models. Here, we tried several approaches to link an epidemic phase to the next endemic phase. So far, this regularization procedure is the best one.
3.4. Consequences for vaccination
It is essential to "regularize" the data to obtain a comprehensive outcome from SIR epidemic models. In general, the rate of transmission in the SIR model (applying identification methods) is not very noisy and meaningless. For example, at the beginning of the first epidemic wave, the transmission rate should be decreasing since peoples tend to have less and less contact while to epidemic growth. The standard regularization methods (like, for example, the rolling weekly average method) have been tested for COVID-19 data in Demongeot, Griette, and Magal [36]. The outcome in terms of transmission rate is very noisy and even negative transmission (which is impossible). Regularizing the data is not an easy task, and the method used is critical in order to obtain a meaningful outcome for the models. Here, we tried several approaches to link an epidemic phase to the next endemic phase. So far, this regularization procedure is the best one we tested.
4.
Discussion
In this article, we presented a new phenomenological model to describe cumulative reported cases data. This model allows us to handle multiple epidemic waves and fits the data for the eight geographic areas considered very well. The use of Bernoulli-Verhulst curves to fit an epidemic wave is not necessary. We expect that a number of different phenomenological models could be employed for the same purpose; however, our method has the advantage of involving a limited number of parameters. Moreover, the Bernoulli-Verhulst model leads to an explicit algebraic formula for the compatibility conditions of non-identifiable parameters. It is far from obvious that the same computations can be carried out with other models. Our method also provides a very smooth curve with controlled upper bound for the first (four) derivatives, and we use the regularity obtained to compute the transmission rate. We refer to Demongeot, Griette, and Magal [36] for several examples of problems that may occur when using other methods to regularize the data (rolling weekly average, etc.).
The first goal of the article was to understand how to connect successive epidemic waves. As far as we know, this is new compared to the existing literature. A succession of epidemic waves separated by a short period of time with random transmissions is regularly observed in the COVID-19 epidemic data. But several consecutive epidemic phases may happen without endemic transition. An illustration of this situation is provided by the case of Japan, where the parameters of the Bernoulli-Verhulst model changed three times during the last epidemic phases (without endemic interruption). Therefore we subdivide this last epidemic wave into three epidemic phases.
Another advantage of our method is the connection with an epidemiological model. Our study provides a way to explain the data by using a single epidemic model with a time-dependent transmission rate. More precisely, we find that there exists precisely one model that matches the best fit to the data. The fact that the transmission rate corresponding to the data is not constant is, therefore, meaningful. This means that the depletion of susceptible hosts due to natural epidemiological dynamics is not sufficient to explain the reduction in the epidemic spread. Indeed, due to the social changes involving the distancing between individuals, the transmission rate should vary to take into account the changes in the number of contacts per unit of time. The variations in the observed dynamics of the number of cases mainly result from the modification of people's behavior. In other words, the social changes in the population have a stronger impact on the propagation of the disease than the pure epidemiological dynamics. By computing the transmission rate and the associated reproduction numbers, we propose a new method to quantify those social changes. Other factors may also influence the dynamics of the COVID-19 outbreak (temperature, humidity, etc.) and should be taken into account. However, the correlation between the dates of the waves and the mitigation measures imposed by local governments suggests that the former phenomenon takes a more significant role in the epidemiological dynamics.
Precisely because it involves an epidemiological model, our method provides an alternative, robust way to compute indicators for the future behavior of the epidemic: the instantaneous and quasi-instantaneous reproductive numbers Re(t) and R0e(t). It is natural to compare them to an alternative in the literature, sometimes called "effective reproductive number''. The method of Cori et al. [69] is a popular framework to estimate its value. Compared with this standard method, our indicators perform better near the beginning of the epidemic and close to the last data point and are less variable in time. That we require an a priori definition of epidemic waves can be considered as an advantage and a drawback. It is a drawback because the computed value of the indicator may slightly depend on the choice of the dates of the epidemic waves. On the other hand, this flexibility also allows testing different scenarios for the future evolution of the epidemic. Thanks to the explicit formula for Re(t) expressed in function of the parameters, we can also explore the dependency to the parameters (see supplementary material Section S6).
It appears from our results that the instantaneous reproduction number in almost every geographic area considered is less than 3.5. Therefore, an efficient policy to eliminate the COVID-19 would be to vaccinate a fraction of 75−80% of the population. Once this threshold is reached, the situation should go back to normal in all the geographic areas considered in this study. This proportion can even be reduced at the expense of partially maintaining the social distancing and the other anti-COVID measures for a sufficiently long period of time.
With a few modifications, our method could also include several other features. It is likely, for instance, that the vaccination of a large part of the population has an impact on the epidemiological dynamics, and this impact is not taken into account for the time being. Different distributions of serial intervals could be taken into account by replacing the mathematical model of ordinary differential equations with integral equations. What we have shown is that the coupling of a phenomenological model to describe the data, with an epidemiological model to take into account the nature of the underlying phenomenon, should provide us with a new, untapped source of information on the epidemic.
Conflict of interest
The authors declare there is no conflict of interest.
Supplementary Material
S1. Table of estimated parameters for the phenomenological model
S.1.1. California
S.1.2. France
S.1.3. India
S.1.4. Israel
S.1.5. Japan
S.1.6. Peru
S.1.7. Spain
S.1.8. United Kingdom
S2. Plot of the multiple Bernoulli–Verhulst models fitted to each epidemic phase
In Figure S1, we present the details of the fit of the Bernoulli–Verhulst models to the successive epidemic waves in the 8 geographic areas considered. Each epidemic wave is associated with a different color.
S3. Relative error of the fitted curve compared to the data in each geographic area
S3.1. California
S3.2. France
S3.3. India
S3.4. Israel
S3.5. Japan
S3.6. Peru
S3.7. Spain
S3.8. United Kingdom
S4. Table of estimated parameters for the phenomenological model
S4.1. California
S4.2. France
S4.3. India
S4.4. Israel
S4.5. Japan
S4.6. Peru
S4.7. Spain
S4.8. United Kingdom
S5. Additional information for the results section
S5.1. California
S5.2. France
S5.3. India
Figure 4 of the main text is devoted to the reproduction number of the model. The instantaneous reproduction number t→Re(t) is decreasing from February 01, 2020 until February 25, 2021.
S5.4. Israel
S5.5. Japan
S5.6. Peru
S5.7. Spain
S5.8. United Kingdom
S6. Dependency with respect to the parameters for the French data
Influence of f on basic reproduction number:
Influence of κ on basic reproduction number:
Influence of ν on basic reproduction number:
Influence of η on basic reproduction number:
Influence of α on basic reproduction number:
S7. Upper bound of the duration for the exposed period and the asymptomatic infectious period
Let us finally mention that for each country and each epidemic wave we evaluated the parameter 1/(χθ). In Figure S20 we plot the histogram of its estimated value and obtain a median value be 15.61 days.
Therefore the length of exposure and the length asymptomatic infectious period should smaller than 15.61 days. In this section, we plot the estimated values of the parameter 1/(χθ) for each epidemic period and each country consider in this study. The parameter corresponds to the upper bound of the length of the exposed period and asymptomatic infectious period. Indeed from the section devoted to the compatibility condition we know that the average duration of the exposed period should satisfy
and the average duration of the asymptomatic infectious period should should satisfy
S8. Computing R0
The basic reproduction number R0 can be computed for the SEIUR model by the formula (see [71,72])
where F is the matrix containing new infections and V contains the rates of transfer between classes:
see [71] and [72] for details. Therefore
It follows that
S9. Stochastic approach to effective reproductive ratio
In numerical applications, we also present the results obtained by applying the method described in the paper of Cori et al. [69]. Let us summarize the principle of the method. We consider that the incidence data (i.e., the daily number of new reported cases) correspond to infection events that have occurred in the past. For each new reported case, we reconstruct the time the infectious period started by sampling a Gamma distribution (i.e. the time from the infection to the moment at which the individual is reported follows a Gamma distribution). The parameters of this Gamma distribution are computed to match the differential equation framework. In numerical application, 1/μ=10 days, we took the average for the average of the Gamma distribution as well as its standard deviation. We denote It the resulting number of individuals that begin their infectious period on the day t. As described in [69], we use a smoothing window of τ days (τ=14 days in numerical applications). The resulting effective reproductive ratio Rt is then computed as
where a and b are a prior distribution on Rt (we took a=1 and b=5, as in [69]) and Λs is computed by the formula
where ws is the average infectiousness profile after time s. In numerical applications, and following [69,Web Appendix 11], we used the following formula for ws
where FΓ,α,β(s) is the cumulative density of a Gamma distribution of parameters (α,β):
The parameters α and β are computed to match the Gamma distribution of the serial intervals which, in our case, have mean value 1/μ=10 days and standard deviation as well of 1/μ, so that α=1/μ and β=1/μ.
Because of the sampling of random numbers involved in the computation of Rt, the procedure described above was repeated 100 times (each time drawing a new sequence of Is from the daily number of new cases) and the final value of Rt presented in Figure 4 of the main text (green curves) is the average of the values obtained during these 100 simulations.