
Citation: Jing Liu, Wancheng Qiu, Xiaoying Shen, Guangchun Sun. Bioinformatics analysis revealed hub genes and pathways involved in sorafenib resistance in hepatocellular carcinoma[J]. Mathematical Biosciences and Engineering, 2019, 16(6): 6319-6334. doi: 10.3934/mbe.2019315
[1] | Martha Tampaki, Georgia Koutouzidou, Katerina Melfou, Athanasios Ragkos, Ioannis A. Giantsis . The contrasting mosaic of consumers' knowledge on local plant genetic resources sustainability vis a vis the unawareness for indigenous farm animal breeds. AIMS Agriculture and Food, 2024, 9(2): 645-665. doi: 10.3934/agrfood.2024035 |
[2] | Giuseppe Timpanaro, Paolo Guarnaccia, Silvia Zingale, Vera Teresa Foti, Alessandro Scuderi . The sustainability role in the purchasing choice of agri-food products in the United Arab Emirates and Italy. AIMS Agriculture and Food, 2022, 7(2): 212-240. doi: 10.3934/agrfood.2022014 |
[3] | Zarbà Carla, La Via Giovanni, Pappalardo Gioacchino, Hamam Manal Samir Mustafa . The sustainability of Novel foods in the transition phase to the circular economy; the trade “Algae fit for human consumption” in European Union. AIMS Agriculture and Food, 2020, 5(1): 54-75. doi: 10.3934/agrfood.2020.1.54 |
[4] | Iyabo Olunike Omomowo, Oluwaseun Emmanuel Shittu, Olawale Israel Omomowo, Olusola Nathaniel Majolagbe . Influence of phosphate solubilizing non-toxigenic Aspergillus flavus strains on maize (Zea mays L.) growth parameters and mineral nutrients content. AIMS Agriculture and Food, 2020, 5(3): 408-421. doi: 10.3934/agrfood.2020.3.408 |
[5] | Emanuele Dolfi, Margherita Masi, Ernesto Simone Marrocco, Gizem Yeter, Martina Magnani, Yari Vecchio, Alessio Bonaldo, Felice Adinolfi . Indirect entomophagy: Consumer willingness to pay toward fish fed with insect-based feed. AIMS Agriculture and Food, 2025, 10(2): 266-292. doi: 10.3934/agrfood.2025014 |
[6] | Wei Yang, Bryan Anh, Phuc Le . Do consumers care about environmentally sustainable attributes along the food supply chain? —A systematic literature review. AIMS Agriculture and Food, 2023, 8(2): 513-533. doi: 10.3934/agrfood.2023027 |
[7] | Fuad Jafarli, Maurizio Canavari . Place branding in rural areas: A literature review. AIMS Agriculture and Food, 2025, 10(1): 129-152. doi: 10.3934/agrfood.2025007 |
[8] | Francesco Sottile, Stefano Massaglia, Valentina Maria Merlino, Cristiana Peano, Giulia Mastromonaco, Ferdinando Fornara, Danielle Borra, Oriana Mosca . Consumption vs. non-consumption of plant-based beverages: A case study on factors influencing consumers' choices. AIMS Agriculture and Food, 2023, 8(3): 889-913. doi: 10.3934/agrfood.2023047 |
[9] | Alexandros Tsoupras, Eirini Panagopoulou, George Z. Kyzas . Olive pomace bioactives for functional foods and cosmetics. AIMS Agriculture and Food, 2024, 9(3): 743-766. doi: 10.3934/agrfood.2024040 |
[10] | Saowanee Wijitkosum, Preamsuda Jiwnok . Effect of biochar on Chinese kale and carbon storage in an agricultural area on a high rise building. AIMS Agriculture and Food, 2019, 4(1): 177-193. doi: 10.3934/agrfood.2019.1.177 |
Stable estimation of system parameters for infectious disease outbreaks is of paramount importance to the design of adequate forecasting algorithms [1,2,3]. Oftentimes parameter estimation procedures are cast as ODE-constrained nonlinear least squares problems, where infinite dimensional time dependent disease parameters need to be recovered from finite dimensional data sets. As the result, the Jacobian of the corresponding parameter-to-data operator is generally ill-conditioned and may be numerically singular. When such an operator is fitted to noise-contaminated epidemiological data, the estimated parameters tend to be entirely unreliable due to severe error propagation into the approximate solution. The sources of noise in the reported incidence data vary for different types of diseases and can be attributed to possible under or over reporting owing to, for instance, a large proportion of asymptomatic cases or false diagnostics.
Noisy data coupled with modeling, discretization, and computational errors necessitate the use of special mathematical tools known as regularization [4,5]. It amounts to solving some "nearby" auxiliary problem in place of the initial one. The auxiliary problem has to be formulated in such a way that its solution is less sensitive to noise propagation as opposed to the solution of the original problem.
A time dependent transmission rate of an infectious disease is an important parameter, which can be defined as the effective contact rate, that is, the probability of infection given contact between an infectious and susceptible individual multiplied by the average rate of contacts between these groups. Generally the transmission rate cannot be pre-estimated since it depends on multiple environmental, genetic, social, and other factors. Hence one has to recover cause form effect using epidemiological data for an emerging outbreak together with a suitable compartmental model governing the disease. Once recovered and extrapolated, the transmission rate can be used to project future incidence cases. That, in turn, may be helpful in the design of effective control measures and optimal resource allocation.
In what follows, we use Matlab built-in implementation of the Levenberg-Marquardt algorithm [6,7] to reconstruct a variable transmission rate. The regularization provided by this optimization scheme, which is a penalized version of the Gauss-Newton procedure [8], is enforced by the appropriate problem-oriented discretization tools. Specifically, we compare what we call parametric and non-parametric discretization routines. By parametric discretization we mean that the transmission rate is modeled by a pre-defined function that involves only a few parameters. The rationale behind this approach is simple: if one is given some a priori information about the outbreak, one can reasonably choose an appropriate expression to describe changes in the transmission coefficient. For example, in case of a single cycle outbreak with the incorporation of control measures at the beginning stages, it is reasonable to assume a declining transmission rate defined, say, by a hyperbolic, harmonic, or exponential function. While parametric discretization may not capture all aspects of the actual transmission rate, it may capture enough crucial information to provide a useful forecasting tool. Our main expectation is that recovering fewer parameters helps mitigate instability caused by noise and the lack of data without, we hope, a significant loss in accuracy.
At the same time, even for a single-cycle outbreak, the transmission rate of an infectious disease may vary depending on the type of a disease, population group, characteristics of a region, and the efficiency of control measures. So, realistically we cannot expect transmission rates to always exhibit a simple decline pattern and therefore parametric discretization inevitably leads to a loss of information. In order to better capture the shape of the time dependent transmission rate, one has to use non-parametric discretization schemes. In such schemes, the transmission rate is projected onto a subspace spanned by a finite set of orthogonal polynomials or spline functions. Again, depending on the nature of the transmission, one may use Legendre or Chebyshev polynomials, B-splines, wavelets, or other base functions.
The main goal of our numerical study is to see how parametric and non-parametric discretization schemes compare in terms of accuracy of parameter estimation and in terms of their ability to provide a reliable forecasting tool. The paper is organized as follows. In Section 2, the governing SEIR model and the regularized inversion procedure are outlined, followed by the discussion of parametric and non-parametric discretization algorithms. In Section 3, numerical experiments with synthetic data are presented. Simulation results with real data for the 1918 influenza outbreak in San Francisco are given in Section 4. Future plans are summarized in Section 5.
Consider a well-mixed population of size N, where individuals have the same probability of being in contact with each other. The population is sorted into four classes: susceptible (S), exposed (E), infectious (I) and removed (R) [9] as shown in Figure 2.
It is assumed that susceptible humans (category S) infected with a virus enter the latent period (category E) at the rate β(t)I(t)/N, where β(t) is the mean transmission rate per day (week). Latent humans progress to the infectious class (category I) at the rate κ (1/κ is the mean latent period). Infected individuals are assumed to recover and acquire protective immunity for the duration of the entire epidemic period at rate γ, where 1/γ is the average time from symptoms onset to recovery. Recovered humans move back to the susceptible class at the rate σ (1/σ is the duration of immunity). For simplicity, we let the host birth and death rates have the same value, which means that the total population size remains constant for the duration of the outbreak. The overall transmission dynamics can be mathematically described by the following set of nonlinear differential equations for t∈[a,b]
dSdt=−β(t)S(t)I(t)N | (2.1) |
dEdt=β(t)S(t)I(t)N−κE(t) | (2.2) |
dIdt=κE(t)−γI(t) | (2.3) |
dRdt=γI(t) | (2.4) |
with initial conditions
S(a)=N−C1−C1/κ,E(a)=C1/κ,I(a)=C1,R(a)=0. | (2.5) |
and the system parameters listed in Table 1 below.
Variable | Parameter |
N | Total effective population size |
β(t) | Transmission rate |
1/κ | Average incubation period |
1/γ | Average time from the onset of symptoms to recovery |
In our numerical experiments all parameters of the model, except for the transmission rate β(t), are pre-estimated for each particular disease, while β(t) is fitted to the incidence data given by the following expression:
dCdt=κE(t), | (2.6) |
where C(t) and dCdt are the cumulative and incidence data, respectively. A primary limitation of (2.1)-(2.5) is that we assume a completely susceptible population at the beginning of the epidemic, and let the transmission rate capture the baseline susceptibilty of the population. A more detailed model could also account for age-specific transmission rates because for diseases like measles, for example, there are significant differences in transmission rates between children and adults. Our goal is to recover β(t) by solving a nonlinear ODE-constrained least-squares minimization problem with limited data for an emerging outbreak and then to forecast future disease incidence cases. Given finite incident data at each point in time, D=[D1,D2,...,Dm], the reconstruction of β(t) can be formulated as follows:
minp‖D−κE‖2withF(p,u)=0. | (2.7) |
Here u stands for [S,E,I,R], p denotes the unknown parameter vector, and the operator equation F(p,u)=0 is given by (2.1)-(2.5).
Let u=u(p) be a (numerical) solution to the SEIR system (2.1)-(2.5). Introduce a parameter-to-observation map
ψi(p):=κEi[p,u(p)],i=1,2,...,m. | (2.8) |
One then obtains the unconstrained least squares problem:
minp‖D−ψ(p)‖2=minpm∑i=1(Di−ψi(p))2,ψ:Rn→Rm, | (2.9) |
where m is the number of data points and n the number of unknowns for each particular discretization algorithm. The least squares problem is solved with the Levenberg-Marquardt numerical optimization procedure (Matlab's built in implementation), where ψ′(pk) is the Frećhet derivative of the nonlinear operator ψ evaluated at the point pk, ψ′∗(pk) is the adjoint of ψ′(pk), and I is the identity operator,
pk+1=pk−[ψ′∗(pk)ψ′(pk)+τkI]−1ψ′∗(pk)ψ(pk). | (2.10) |
At each step of the iterative process, the ODE system is solved with Matlab's ode23s stiff solver.
Let [a,b] and (b,b+c] be the regions where the incident cases are given and where they are forecasted, respectively. In case of a nonparametric discretization, β(t) is projected onto the finite dimensional space spanned by a set of base functions
Bj(t)=B(t−tjh). |
Here B(t) is the cubic B-spline [10,8], h=tj−tj−1, j=0,1,...,l+1, and the vertices of the splines are located at the grid points
t−1<t0=a<t1<⋯<tl−1<tl=T<tl+1, |
for some T>a such that T+3h≥b+c. Thus the base functions, Bj(t), are defined on the overlapping subintervals, whose union covers the entire region, [a,b+c], as illustrated in Figures 1, 3, and 4. The domains for B-splines are selected in such a way that they all intersect with the interval [a,b], and at least some of the domains also intersect with (b,b+c]. Once the least squares problem is solved on [a,b] for the set of unknown expansion parameters, Aj, j=1,...,n, n=l+3, one approximates the transmission rate, β(t), as a linear combination,
β(t)=n∑j=1AjBj(t) |
and extrapolates it to the interval [a,b+c] in order to solve the forward problem and to obtain the forecasting incidence values.
For a parametric discretization approach, the transmission rate is modeled as a predefined function with few parameters to enforce stability. We choose a four parametric hyperbolic decline with initial transmission rate β0, decline rate q, curvature degree ν, and a chosen asymptotic value of the transmission rate ϕ:
β(t)=β0((1−ϕ)(1/(1+qνt)1ν)+ϕ), | (2.11) |
where β0>0, 0≤q≤1, 0≤ν≤1, and ϕ≥0. This monotonically decreasing transmission rate is assumed based on a priori information that appropriate intervention and control measures have been introduced at the early stages of the outbreak, and they continue to remain effective for the entire duration of the epidemic.
To compare the two discretization approaches, which also serve as an important regularization tool complementing the regularization introduced by the damping parameter in the Levenberg-Marquardt procedure, we start by considering numerical examples with synthetic data.
For our first experiment, we choose a model transmission rate in such a way that it could be approximated by a four parametric hyperbolic decline function (2.11) with a reasonable accuracy. To that end, we introduce the following piece-wise defined function
βm(t)={1.46,t≤21.77,0.65+0.81exp(−0.11t+2.3947),t>21.77, | (3.1) |
shown in the left picture of Figure 5. By solving the corresponding forward problem, i.e., the SEIR system with given β(t) and given values of κ and γ presented in Table 2, we generate synthetic incidence data (the right picture in Figure 5).
Variable | Experiment 1 | Experiment 2 |
N | 6,000,000 | 55,000 |
1/κ | 8/7 weeks | 2 days |
1/γ | 6/7 weeks | 3 days |
To quantify uncertainty in the recovered transmission rate, β(t), and in our estimates of future incidence cases, we use the bootstrapping strategy proposed in [11,12]. For the selected model function, βm(t), the corresponding incidence data is perturbed 10 times by adding a simulated error structure with Poisson mean for the number of new case notifications between week j−1 and week j being equal to the "true" incidence, Dj, j=2,...,m. As the result, a different noisy incidence curve is generated for each parameter estimation step, which enables us to use the mean value forecasts as our best estimates for the short-term projections of future incidence cases and the mean value of β(t) as the most accurate approximation of the disease transmission rate.
To start the iterative process (2.10), we take initial guesses of β(t) to be constants that are uniformly distributed in the interval [0.5,2], though in case of parametric discretization for some data sets this interval has to be slightly reduced. Since there would be no prior knowledge of the transmission rate in case of real epidemiological data, we randomly select 10 initial values of β0 for every partial noise contaminated incidence data set to avoid any bias towards any particular starting point and to confirm that there is a broad range of initial guesses for which convergence can be expected.
In case of non-parametric discretization the two regularization parameters, the initial damping factor, τ0, in (2.10) and the step size, h, for the base spline functions, are selected to provide the best possible fit for the "given" partial data set (close but without over-fitting) and to ensure the most agammaessive convergence rate for the Levenberg-Marquardt iterative scheme, which is terminated at p-tolerance and ψ-tolerance equal to 10−5 or when the maximum number of function evaluations is exhausted. In the course of our numerical simulations, we have tried multiple values of h for B-spline discretization in order to analyze how it affects parameter estimation and forecasting. The values h=4, h=12, and h=20 have emerged as near optimal for 10, 30, and 50 data points, respectively. These choices of h give the same size of the solution space for the above three data sets, see Figures 1, 3, and 4.
For parametric discretization, the size of the solution space is fixed, and one has no control as to how much regularization it provides. Thus the initial damping coefficient, τ0, in (2.10) is the only regularization parameter one can vary to balance accuracy and stability. For parametric and non-parametric discretization routines, we used τ0 between 105 and 1013. The over-damping at the start of the iterative process promotes stability. As iterations progress, {τk} goes down, which guarantees convergence of {pk} as long as the stopping rule does not result in over-fitting.
As one can see in Figures 6 and 7, for the highly unstable scenario when only 10 weeks of data are available, parametric discretization provides the amount of regularization that is "just right". Despite of a drastic reduction in the size of the solution space, the reconstruction process does not seem to be over-regularized. It generates the forecasting bundle with the mean value that slightly under-estimates the actual incidence curve. All forecasts are close together, which points to the stability of the inversion algorithm. On the other hand, the non-parametric discretization, while much harder to implement due to the need to adjust two regularization parameters τ and h, generates less stable (and less accurate) results. Its mean value shows almost twice the actual number of cases at the end of week 16. Individual forecasting curves for B-spline discretization deviate from the model data by quite a lot, some under-estimating and some considerably over-estimating the actual case count. This indicates that the reconstruction with B-spline functions is less stable as compared to parametric discretization for early stages of an emerging outbreak, when the transmission rate is constant and, therefore, close in its structure to a four-parametric hyperbolic function (2.11) in the interval [a,b+c].
For 30 weeks of data (half-way through the outbreak, Figures 8 and 9) forecasting with parametric discretization method is much less accurate. It over-estimates future incidence cases showing more than double the actual number at the end of week 36. It cuts through the "real" incidence curve between weeks 19 and 28 showing the wrong inflection point around week 33 for the cumulative data as illustrated in Figure 9. That happens because even though the reconstructed hyperbolic transmission rate (see Figure 8) is close to the model β(t), it does not capture the steep exponential decline that begins at week 22. The recovered β(t) shows a much less agammaessive decent hence resulting in an over-estimate of future incidence cases. None of that happens when the transmission rate is approximated with B-spline functions. For a non-parametric discretization scheme, β(t) is no longer tied to any particular shape and, though erratically, is capable of mimicking the actual model β(t). As the result, the mean forecasting curve follows the model incidence data remarkably well, with only two (out of 100) individual forecasting curves being slightly "off" and the rest of the curves being close together in the forecasting bundle. Thus, even though the B-spline discretization is less stable, for 30 weeks of data it does a much better job forecasting future incidence cases and recovering the model transmission rate as compared to parametric discretization with hyperbolic function.
Finally, for 50 weeks of data what we see in Figures 10 and 11 is essentially the reconstruction from full data set, since the outbreak practically comes to an end between weeks 50 and 60. While parametric method correctly shows the conclusion of the epidemic at this stage, the B-spline approximation erroneously hints at the beginning of a new cycle with the recovered β(t) unexpectedly exhibiting an uphill behavior after 40 weeks of the outbreak. Thus, due to instability and the lack of data between weeks 40 and 50, the B-spline discretization does not succeed in making accurate future projections. On the other hand, the parametric discretization, where hyperbolic decline is built-in, gives rise to more accurate forecasting curves.
There is also a major difference in how the true incidence data is followed for the first 50 weeks by the incidence curves recovered with two competing discretization algorithms. The incidence curve recovered with B-splines is very accurate and follows the "real" data in a very stable manner. At the same time, the incidence curve recovered with hyperbolic parametrization once again cuts through the "real" data over-estimating the number of cases for the first 20 weeks and from week 33 on, while under-estimating between weeks 20 and 33 and shifting the cumulative turning point to the left. This is the exact consequence of the transmission rate being over-estimated in the beginning of the outbreak as well as in its second phase and under-estimated in the middle due to the special nature of the discretization method. For spline base functions the recovered transmission rate is accurate though, again, unstable. So, in fitting data that is available, the non-parametric discretization is the winner.
We now turn our attention to the second experiment with synthetic data. This time the model transmission rate takes the form that is entirely different from a hyperbolic decline as seen in Figure 12 and defined by equation (3.2). We set βm(t) to be exponentially increasing for the first 20 weeks before it stabilizes at a constant level and then goes down with some minor oscillations after week 40:
βm(t)={0.1+1.4exp(0.5(t−20)),t≤20,1.5,20<t<40,0.65+0.85exp(0.1(40−t))+0.05sin(1.2(t−40)),t≥40. | (3.2) |
It is worth mentioning that the corresponding incidence data has very similar shape as compared to the first example, but it picks at almost 6,000 as opposed to 49 in case of the previous model. Thus, the shape of the incidence curve is not always indicative of the behavior of the disease transmission rate, and any a priori assumption regarding the nature of β(t) based on the shape of the incidence curve may easily turn out to be wrong. We also assume that the disease is different in case of the second experiment, which is reflected in the new values of the parameters κ and γ as shown in Table 2.
As one can see from the incidence curve, the effect of the increasing transmission rate at the early stage of the outbreak is delayed and in the first 18 days the number of new cases is very low, until the incidence curve goes up almost vertically and reaches its pick in the next 10 days. The hyperbolic model is not designed to capture this kind of increase in β(t), and even the B-spline discretization does not succeed in recovering the right shape of the transmission rate from the limited data set (see Figures 13 and 14). From 10 days of data, parametric discretization projects steady near-horizontal pattern for days 11 through 16. The B-spline discretization over-estimates the true β(t) and, as the results, forecasts the future uphill behavior too early. Thus, even though parametric discretization recovers a less accurate β(t), it gives a better short-term projection. The B-spline discretization succeeds in predicting a steep rise in new incidence cases, but it shows it sooner than it actually happens. To summarize, in case of the second experiment, from the first 10 days of data one simply does not have enough information to ascertain the true structure of β(t) that would enable us to generate an accurate forecasting curve.
For 30 and 50 days of data (see Figures 15–16 and 17–18, respectively), the B-spline discretization method provides very stable and accurate forecasting information with little to no deviations from the model incidence curve. It captures the right shape of β(t) in the middle of the outbreak and shows the right pick for the incidence curve with the right inflection point for the cumulative number of cases. The B-spline discretization fails, however, to recover the right shape of the transmission rate at the early and late stages of the epidemic. Interestingly, that does not makes the recovered incidence curves any less accurate. The parametric discretization does the best it possibly can: approximates true β(t) by a constant. This kind of approximation suggests that the outbreak picks with 4,000 incidence cases rather than 6,000. It also results in slight over-estimate of the actual number of cases between days 10 and 22 as well as days 33 and 45.
10 Data points | 30 Data points | 50 Data points | |
Experiment 1 | τ0=1010 | τ0=108 | τ0=108 |
h=4 | h=12 | h=20 | |
Experiment 2 | τ0=1012 | τ0=1012 | τ0=1012 |
h=4 | h=12 | h=20 | |
Experiment 3 | τ0=1011 | τ0=1011 | τ0=1013 |
h=4 | h=12 | h=20 |
Overall, the two discretization methods both have their pros and cons. The comparison highlights the importance of using a priori information about the structure of the solution in the regularization algorithm. When this information is relevant, it helps to reinforce stability without considerable loss in accuracy. At the bottom of ill-posedness is always the lack of information. Thus, when the structure of the true solution is incorporated in the numerical algorithm, the algorithm becomes much more efficient. At the same time, the similarity of the incidence curves in the two experiments shows that it is hard to draw a reliable conclusion about the structure of the solution from the shape of the incidence data. And when the wrong structure is incorporated, the forecasting curve based on that structure may turn out to be misleading.
We now move to recovering the disease transmission rate from real data. The real data set illustrates daily incidence cases of influenza in San Francisco during the 1918 "Spanish Flu" pandemic. The influenza pandemic of 1918-19 was a major public health challenge. The virus was extremely contagious and virulent. According to CDC, it killed an estimated 20 to 50 million people worldwide. The data we consider includes 63 days of this epidemic in San Francisco, one of the most affected cities in the Unacted States. We assume a population of 550,000 individuals with infectious rate κ=1/2 (days−1) and recovery rate γ=1/3 (days−1).
For the case of limited data with just 10 points available (see Figures 19 and 20), the parametric discretization generates a perfect forecasting bundle, and its mean value passes through all real data points. For the B-spline discretization, all transmission rates are stuck at their initial values. The corresponding forecasting curves (for the most part) grossly over-estimate the actual number of future incidence cases.
Half way through the outbreak, for 30 data points, the situation is very different as illustrated in Figures 21 and 22. The B-spline discretization does an excellent job by showing the right turning point for the cumulative number of cases and the future downhill behavior of the incidence curve. It fails, however, in predicting the right number of new cases between weeks 30 and 36. The method seems to recover the right shape of β(t), which reflects the dynamics of the outbreak without being over-regularized. At the same time, the epidemic curve obtained with parametric discretization algorithm mistakenly shows an exponential increase in new incidence cases, suggesting that the number of cases will continue to grow until the city runs out of susceptible population. Nevertheless, the parametric discretization covers the real data between weeks 30 and 36 much better as compared to the non-parametric one.
With 50 data points (see Figures 23 and 24), the parametric incidence curve cuts through the real data right in the middle showing less than half the actual number of cases at the pick of the epidemic and over-estimating the actual number of cases for the first 27 days and from day 39 on. It does, however, correctly predict that the outbreak will come to an end not long after day 56 (the last day of forecasting). Contrary to that, the non-parametric incidence curve follows the real data very closely for the entire 50 day time period, leaving out only 3 data points at the top of the epidemic. The non-parametric incidence curve hints at the beginning of the new cycle after day 50 considerably over-estimating the actual number of cases reported between day 52 and 56. It is important to mention that, unlike the case of the first experiment, where the same kind of erroneous forecasting can only be attributed to noise and instability, in case of real data this false alarm is completely justified by the uphill behavior of the actual data between days 48 and 51. The data for this time period does suggest that a new cycle is about to begin, but the data afterwards shows that this is a "false positive". The reconstruction of β(t) for the non-parametric case looks very stable and very reasonable.
Overall, we tend to declare the B-spline discretization a winner in case of the real influenza outbreak, since it correctly predicts the turning point of the epidemic based on 30 days of data. This kind of reliable estimate is crucial for optimal resource allocation and for an adequate design of control measures in the affected area.
Forecasting the trajectory of naturally occurring processes involving social dynamics in real time requires a sensible combination of mathematical and statistical methods together with reliable data sets at different spatial and temporal scales. Their importance can be investigated in different contexts using numerical simulation studies such as the type that we carried out in this paper. More specifically, we have explored the role of parametric and non-parametric discretization methods for both calibration and forecasting of epidemics that are shaped by time-dependent variation in the transmission rate using a simple SEIR compartmental model. Our findings highlight the limitations imposed by insufficient amount of information in time series data about the spread of infectious diseases. However, such limitations can often be handled or remedied through an appropriate balance of model complexity and state-of-the-art numerical methods, particularly regularization methods [5,4,13].
In our simulation study of parametric and nonparametric discretization algorithms using synthetic and real data we have observed that at the early stage (the first 10 weeks/days of the outbreak), parametric discretization consistently provides more accurate (and stable) forecasting results as compared to B-spline approximation. At the same time, half way through the outbreak the nonparametric discretization is superior, while parametric discretization happens to be less reliable and often misses the turning point. This can be explained by the fact that even if the transmission rate is on the rise at the early stage, its impact on the dynamics of the incidence data is delayed. Therefore, the restrictions on the shape of β(t), incorporated in the parametric discretization scheme, are not hurting the forecasting results, which greatly benefit from the stability imposed by a priori information enforced through the parametric approach. With more data, however, the restrictions on the shape of β(t) become a liability keeping the algorithm from recovering a more accurate transmission rate and using it as a more reliable forecasting tool.
To summarize, one of the most challenging aspects in the applications of inverse problem is the need to develop techniques that handle parameter identifiability issues, which often arise owing to over-parameterized models or lack of information in available data. Fortunately, regularization techniques are one way in which scientists can still draw useful conclusions about model parameters and in turn generate potentially helpful forecast that policy makers could use to guide investments in particular control strategies [14]. In our study we found that lack of information in limited time series data is often the main challenge in generating useful parameter estimates and forecasts. This suggests that the incorporation of additional data about the epidemic dynamics in the regularization algorithm could prove useful for better constraining parameters and generating more accurate short-term forecasts of epidemic outbreaks. One such additional data could include social media streams which are being generated by Google Search Trends, Twitter, and Facebook. The hope is that capturing the right signals from these data sources could rapidly inform how the disease process is changing in real time. This is potentially a fruitful area for future research [15,16,17,18,19].
This work is supported by NSF Grant 1818886. DMS Computational Mathematics.
All authors declare no conflict of interest in this paper.
[1] | L. A. Torre, F. Bray, R. L. Siegel, et al., Global cancer statistics, 2012, CA. Cancer J. Clin., 65(2015), 87–108. |
[2] | R. Romagnoli, V. Mazzaferro and J. Bruix, Surgical resection for hepatocellular carcinoma: Moving from what can be done to what is worth doing, Hepatology, 62 (2015), 340–342. |
[3] | N. F. Esnaola, G. Y. Lauwers, N. Q. Mirza, et al., Predictors of microvascular invasion in patients with hepatocellular carcinoma who are candidates for orthotopic liver transplantation, J. Gastrointest. Surg., 6 (2002), 224–232; discussion 232. |
[4] | S. D'Angelo, M. Secondulfo, R. De Cristofano, et al., Selection and management of hepatocellular carcinoma patients with sorafenib: recommendations and opinions from an Italian liver unit, Future Oncol., 9 (2013), 485–491. |
[5] | L. Adnane, P. A. Trail, I. Taylor, et al., Sorafenib (BAY 43-9006, Nexavar), a dual-action inhibitor that targets RAF/MEK/ERK pathway in tumor cells and tyrosine kinases VEGFR/PDGFR in tumor vasculature, Meth. Enzymol., 407 (2006), 597–612. |
[6] | S. Wilhelm, C. Carter, M. Lynch, et al., Discovery and development of sorafenib: a multikinase inhibitor for treating cancer, Nat. Rev. Drug. Discov., 5 (2006), 835–844. |
[7] | S. M. Wilhelm, C. Carter, L. Tang, et al., BAY 43-9006 exhibits broad spectrum oral antitumor activity and targets the RAF/MEK/ERK pathway and receptor tyrosine kinases involved in tumor progression and angiogenesis, Cancer Res., 64 (2004), 7099–7109. |
[8] | O. Waidmann and J. Trojan, Novel drugs in clinical development for hepatocellular carcinoma, Expert Opin. Investig. Drug., 24 (2015), 1075–1082. |
[9] | M. J. Blivet-Van Eggelpoel, H. Chettouh, L. Fartoux, et al., Epidermal growth factor receptor and HER-3 restrict cell response to sorafenib in hepatocellular carcinoma cells, J. Hepatol., 57 (2012), 108–115. |
[10] | K. F. Chen, H. L. Chen, W. T. Tai, et al., Activation of phosphatidylinositol 3-kinase/Akt signaling pathway mediates acquired resistance to sorafenib in hepatocellular carcinoma cells, J. Pharmacol. Exp. Ther., 337 (2011), 155–161. |
[11] | B. Zhai, F. Hu, X. Jiang, et al., Inhibition of Akt reverses the acquired resistance to sorafenib by switching protective autophagy to autophagic cell death in hepatocellular carcinoma, Mol. Cancer Ther., 13 (2014), 1589–1598. |
[12] | A. K. Chow, L. Ng, C. S. Lam, et al., The Enhanced metastatic potential of hepatocellular carcinoma (HCC) cells with sorafenib resistance, PLoS One, 8 (2013), e78675. |
[13] | D. Morgenszternand H. L. McLeod, PI3K/Akt/mTOR pathway as a target for cancer therapy, Anticancer Drugs, 16 (2005), 797–803. |
[14] | A. Parveen, M. S. Akash, K. Rehman, et al., Dual Role of p21 in the Progression of Cancer and Its Treatment, Crit. Rev. Eukaryot. Gene Expr., 26 (2016), 49–62. |
[15] | J. C. Su, P. H. Tseng, S. H. Wu, et al., SC-2001 overcomes STAT3-mediated sorafenib resistance through RFX-1/SHP-1 activation in hepatocellular carcinoma, Neoplasia, 16 (2014), 595–605. |
[16] | Z. Xu, Y. Zhou, Y. Cao, et al., Identification of candidate biomarkers and analysis of prognostic values in ovarian cancer by integrated bioinformatics analysis, Med. Oncol., 33 (2016), 130. |
[17] | Y. Guo, Y. Bao, M. Ma, et al., Identification of key candidate genes and pathways in colorectal cancer by integrated bioinformatical analysis, Int. J. Mol. Sci., 18 (2017). |
[18] | B. Győrffy, P. Surowiak, J. Budczies, et al., Online survival analysis software to assess the prognostic value of biomarkers using transcriptomic data in non-small-cell lung cancer, PloS One., 8 (2013), e82241. |
[19] | C. Stottrup, T. Tsang and Y. R. Chin, Upregulation of AKT3 confers resistance to the AKT inhibitor MK2206 in breast cancer, Mol. Cancer Ther., 15 (2016), 1964–1974. |
[20] | A. Forner, J. M. Llovet and J. Bruix, Hepatocellular carcinoma, Lancet, 379 (2012), 1245–1255. |
[21] | D. Huang, W. Yuan, H. Li, et al., Identification of key pathways and biomarkers in sorafenib-resistant hepatocellular carcinoma using bioinformatics analysis, Exp. Ther. Med., 16 (2018), 1850–1858. |
[22] | X. Wang, W. M. Ghareeb, X. Lu, et al., Coexpression network analysis linked H2AFJ to chemoradiation resistance in colorectal cancer, J. Cell. Biochem., (2018). |
[23] | K. Kohno, M. Chiba, S. Murata, et al., Identification of natural antisense transcripts involved in human colorectal cancer development, Int. J. Oncol., 37 (2010), 1425–1432. |
[24] | R. Karess, Rod-Zw10-Zwilch: A key player in the spindle checkpoint, Trends Cell Biol., 15 (2005), 386–392. |
[25] | Y. Yu, Z. Kovacevicand D. R. Richardson, Tuning cell cycle regulation with an iron key, Cell Cycle, 6 (2007), 1982–1994. |
[26] | A. Fernandez-Vidal, A. Mazarsand S. Manenti, CDC25A: a rebel within the CDC25 phosphatases family?, Anticancer Agents Med. Chem., 8 (2008), 825–831. |
[27] | M. P. Sacristan, S. Ovejeroand and A. Bueno, Human Cdc14A becomes a cell cycle gene in controlling Cdk1 activity at the G(2)/M transition, Cell Cycle, 10 (2011), 387–391. |
[28] | M. T. Paulsen, A. M. Starks, F. A. Derheimer, et al., The p53-targeting human phosphatase hCdc14A interacts with the Cdk1/cyclin B complex and is differentially expressed in human cancers, Mol. Cancer, 5 (2006), 25. |
[29] | F. M. Schmid, K. B. Schou, M. J. Vilhelm, et al., IFT20 modulates ciliary PDGFRalpha signaling by regulating the stability of Cbl E3 ubiquitin ligases, J. Cell. Biol., 217 (2018), 151–161. |
[30] | H. Dong, H. Guo, L. Xie, et al., The metastasis-associated gene MTA3, a component of the Mi-2/NuRD transcriptional repression complex, predicts prognosis of gastroesophageal junction adenocarcinoma, PLoS One, 8 (2013), e62986. |
[31] | A. M. Houghton, Mechanistic links between COPD and lung cancer, Nat. Rev. Cancer, 13 (2013), 233–245. |
[32] | K. Vierlinger, M. H. Mansfeld, O. Koperek, et al., Identification of SERPINA1 as single marker for papillary thyroid carcinoma through microarray meta analysis and quantification of its discriminatory power in independent validation, BMC. Med. Genomics, 4 (2011), 30. |
[33] | H. J. Chan, H. Li, Z. Liu, et al., SERPINA1 is a direct estrogen receptor target gene and a predictor of survival in breast cancer patients, Oncotarget, 6 (2015), 25815–25827. |
[34] | C. H. Kwon, H. J. Park, J. H. Choi, et al., Snail and serpinA1 promote tumor progression and predict prognosis in colorectal cancer, Oncotarget, 6 (2015), 20312–20326. |
[35] | K. Thorsen, F. Mansilla, T. Schepeler, et al., Alternative splicing of SLC39A14 in colorectal cancer is regulated by the Wnt pathway, Mol. Cell. Proteomics, 10 (2011), M110002998. |
[36] | M. Nilbert and E. Rambech, Beta-catenin activation through mutation is rare in rectal cancer, Cancer Genet. Cytogenet., 128 (2001), 43–45. |
[37] | S. Yan, C. Zhou, W. Zhang, et al., β-Catenin/TCF pathway upregulates STAT3 expression in human esophageal squamous cell carcinoma, Cancer Lett., 271 (2008), 85–97. |
[38] | M. Takahashi, T. Tsunoda, M. Seiki, et al., Identification of membrane-type matrix metalloproteinase-1 as a target of the beta-catenin/Tcf4 complex in human colorectal cancers, Oncogene, 21 (2002), 5861–5867. |
[39] | L. Beneduce, F. Castaldi, M. Marino, et al., Improvement of liver cancer detection with simultaneous assessment of circulating levels of free alpha-fetoprotein (AFP) and AFP-IgM complexes, Int. J. Biol. Markers, 19 (2004), 155–159. |
[40] | S. X. Li, L. J. Liu, L. W. Dong, et al., CKAP4 inhibited growth and metastasis of hepatocellular carcinoma through regulating EGFR signaling, Tumour Biol., 35 (2014), 7999–8005. |
[41] | B. J. McMahon, L. Bulkow, A. Harpster, et al., Screening for hepatocellular carcinoma in Alaska natives infected with chronic hepatitis B: a 16-year population-based study, Hepatology, 32 (2000), 842–846. |
[42] | M. Abu El Makarem, An overview of biomarkers for the diagnosis of hepatocellular carcinoma, Hepat. Mon., 12 (2012), e6122. |
[43] | W. J. Zheng, M. Yao, M. Fang, et al., Abnormal expression of HMGB-3 is significantly associated with malignant transformation of hepatocytes, World J. Gastroenterol., 24 (2018), 3650–3662. |
[44] | Z. Jiang, X. Zhai, B. Shi, et al., KIAA1199 overexpression is associated with abnormal expression of EMT markers and is a novel independent prognostic biomarker for hepatocellular carcinoma, Onco. Targets Ther., 11 (2018), 8341–8348. |
[45] | P. Han, H. Li, X. Jiang, et al., Dual inhibition of Akt and c-Met as a second-line therapy following acquired resistance to sorafenib in hepatocellular carcinoma cells, Mol. Oncol., 11 (2017), 320–334. |
[46] | C. H. Wu, X. Wu and H. W. Zhang, Inhibition of acquired-resistance hepatocellular carcinoma cell growth by combining sorafenib with phosphoinositide 3-kinase and rat sarcoma inhibitor, J. Surg. Res., 206 (2016), 371–379. |
[47] | X. Tian, D. Zhou, L. Chen, et al., Polo-like kinase 4 mediates epithelial-mesenchymal transition in neuroblastoma via PI3K/Akt signaling pathway, Cell Death Dis., 9 (2018), 54. |
1. | Riccardo Motti, Wild Plants Used as Herbs and Spices in Italy: An Ethnobotanical Review, 2021, 10, 2223-7747, 563, 10.3390/plants10030563 | |
2. | Carla Zarbà, Gaetano Chinnici, Mario D’Amico, Novel Food: The Impact of Innovation on the Paths of the Traditional Food Chain, 2020, 12, 2071-1050, 555, 10.3390/su12020555 | |
3. | Natália Rohenkohl do Canto, Klaus G. Grunert, Marcia Dutra De Barcellos, Circular Food Behaviors: A Literature Review, 2021, 13, 2071-1050, 1872, 10.3390/su13041872 | |
4. | Carla Zarbà, Gaetano Chinnici, Biagio Pecorino, Gioacchino Pappalardo, Mario D'Amico, Recent scenarios in Italy on fresh-cut products in the Covid-19 context, 2022, 7, 2471-2086, 403, 10.3934/agrfood.2022026 | |
5. | Federica Davì, Maria Fernanda Taviano, Rosaria Acquaviva, Giuseppe Antonio Malfa, Emilia Cavò, Paola Arena, Salvatore Ragusa, Francesco Cacciola, Yassine Oulad El Majdoub, Luigi Mondello, Natalizia Miceli, Chemical Profile, Antioxidant and Cytotoxic Activity of a Phenolic-Rich Fraction from the Leaves of Brassica fruticulosa subsp. fruticulosa (Brassicaceae) Growing Wild in Sicily (Italy), 2023, 28, 1420-3049, 2281, 10.3390/molecules28052281 | |
6. | H. Nasser, S. Baydoun, N. Hani, N. Arnold, L. Chalak, Wild greens traded in the open markets of Lebanon, 2023, 0567-7572, 443, 10.17660/ActaHortic.2023.1384.56 | |
7. | Miriam Patti, Carmelo Maria Musarella, Serena Maria Postiglione, Fabiana Papalia, Maria Consuelo Falcone, Giuseppe Mammone, Maria Angela Zappalà, Valentina Lucia Astrid Laface, Giovanni Spampinato, Ethnobotanical studies on the Tyrrhenian side of the Aspromonte Massif (Calabria, Southern Italy), 2024, 158, 1126-3504, 545, 10.1080/11263504.2024.2336601 | |
8. | Mousaab Alrhmoun, Naji Sulaiman, Andrea Pieroni, Shifting Herbal Knowledge: The Ecological and Cultural Dynamics Behind Plant Use Changes in the Southern Occitan Alps, 2025, 14, 2223-7747, 367, 10.3390/plants14030367 | |
9. | Raluca-Florentina Cretu, Viorel-Costin Banta, Mihaela-Diana Oancea-Negescu, Adrian Anica-Popa, Cornel Dumitru Crecana, Challenges in the Food Market and the Agenda of Future Research Directions in the Context of Sustainability: A Bibliometric Exploration , 2025, 27, 15829146, 486, 10.24818/EA/2025/69/486 |
Variable | Parameter |
N | Total effective population size |
β(t) | Transmission rate |
1/κ | Average incubation period |
1/γ | Average time from the onset of symptoms to recovery |
Variable | Experiment 1 | Experiment 2 |
N | 6,000,000 | 55,000 |
1/κ | 8/7 weeks | 2 days |
1/γ | 6/7 weeks | 3 days |
10 Data points | 30 Data points | 50 Data points | |
Experiment 1 | τ0=1010 | τ0=108 | τ0=108 |
h=4 | h=12 | h=20 | |
Experiment 2 | τ0=1012 | τ0=1012 | τ0=1012 |
h=4 | h=12 | h=20 | |
Experiment 3 | τ0=1011 | τ0=1011 | τ0=1013 |
h=4 | h=12 | h=20 |
Variable | Parameter |
N | Total effective population size |
β(t) | Transmission rate |
1/κ | Average incubation period |
1/γ | Average time from the onset of symptoms to recovery |
Variable | Experiment 1 | Experiment 2 |
N | 6,000,000 | 55,000 |
1/κ | 8/7 weeks | 2 days |
1/γ | 6/7 weeks | 3 days |
10 Data points | 30 Data points | 50 Data points | |
Experiment 1 | τ0=1010 | τ0=108 | τ0=108 |
h=4 | h=12 | h=20 | |
Experiment 2 | τ0=1012 | τ0=1012 | τ0=1012 |
h=4 | h=12 | h=20 | |
Experiment 3 | τ0=1011 | τ0=1011 | τ0=1013 |
h=4 | h=12 | h=20 |