
Identifiability investigation of within-host models of acute virus infection


    Uncertainty in parameter estimates from fitting within-host models to empirical data limits a model's ability to uncover mechanisms of infection and disease progression and to guide pharmaceutical interventions. Understanding the effect of model structure and data availability on model predictions is important for informing model development and experimental design. To address sources of uncertainty in parameter estimation, we used four mathematical models of influenza A infection with increasing degrees of biological realism. We tested the ability of each model to reveal its parameters in the presence of unlimited data by performing structural identifiability analyses. We then refined the results by predicting practical identifiability of parameters under daily influenza A virus titers alone or together with daily adaptive immune cell data. Using these approaches, we presented insight into the sources of uncertainty in parameter estimation and provided guidelines for the types of model assumptions, optimal experimental design, and biological information needed for improved predictions.

    Citation: Yuganthi R. Liyanage, Nora Heitzman-Breen, Necibe Tuncer, Stanca M. Ciupe. Identifiability investigation of within-host models of acute virus infection[J]. Mathematical Biosciences and Engineering, 2024, 21(10): 7394-7420. doi: 10.3934/mbe.2024325




    The study of host-virus interactions using dynamical models (within-host models) has improved our understanding of the mechanistic interactions that govern chronic infections caused by pathogens such as human immunodeficiency virus [1,2] and hepatitis B virus [4], and the mechanistic interactions that govern acute infections caused by pathogens such as influenza virus [5,6], dengue virus [7,8], Zika virus [10], and severe acute respiratory syndrome coronavirus 2 [11,12]. Regardless of the virus considered, the most basic within-host model has a general structure that includes the interaction between the cells susceptible to the virus, the cells infected by the virus, and the virus at short (acute) and long (chronic) timescales. The emergence of unexpected dynamics in the virus data, new information about the virus's life cycle, data describing host immunity to the infection, or a combination of some or all of the above may require adding complexity to the within-host model (see [13,14] for a review).

    Data fitting techniques for simple or complex within-host models use (normalized) least-squares approaches, in which the Euclidean distance between the data and the mathematical model is minimized with respect to the unknown parameters. The first step in the parameter estimation algorithm is to provide an initial guess for each parameter based on prior biological knowledge, such as the duration of eclipse stages, the life-span of an infected cell and/or virus in vitro, and knowledge from modeling the dynamics of related viruses. When prior knowledge is unavailable, the user assumes that any positive parameter guess is acceptable. An optimization search algorithm is then employed until a termination criterion is reached. For many within-host mathematical models and corresponding datasets, the optimization is ill-posed due to the structure of the model and/or the frequency of the data [15]. As a result, some parameters may be difficult or impossible to quantify. To determine whether the uncertainty in parameter estimation is due to the model or the data, both structural and practical identifiability questions need to be addressed.

    Structural identifiability investigates the ability of a model to reveal its unknown parameters from a noise-free, unlimited amount of data [16,17,18]. When parameters are not structurally identifiable, it is important to find the source of the non-identifiability, such as correlations between model parameters. This allows the user to propose the additional assumptions needed to make the model structurally identifiable. Only after the structural identifiability of the unknown parameters is guaranteed can one conduct data fitting schemes to estimate parameter values.

    Practical identifiability investigates the ability of a model to reveal its structurally identifiable parameters from scarce and noisy (among-subject) data. It is often examined using Monte Carlo simulations [18,19,20], the Fisher information matrix (FIM) or correlation matrix [16,21,22,23], Bayesian methods [24], and the profile likelihood method [25,26,27]. As with structural identifiability, it is important to identify whether practical identifiability issues are due to model structure. Additionally, it is important to determine whether increased data frequency, availability of data measurements for more than one model variable, and/or relaxing restrictions imposed on the unknown parameters can resolve practical identifiability issues.

    To address these important considerations in model validation, one needs to compare a set of models for the same virus infection system and the same empirical data. Here, we accomplish that by investigating four previously developed models of influenza A virus (IAV) infection in mice [28]. The first three models, all validated with the same virus titer dataset, range from the basic within-host model to models with increased complexity through the addition of nonlinear terms and/or the inclusion of additional variables for the host cell populations infected by the influenza virus. The fourth model is the most complex, due to the addition of both nonlinear terms and variables for the host immune system. This results in a large number of unknown parameters. To compensate for the added complexity, this model is validated with two datasets: the same virus titer data and an additional immune cell population dataset.

    The goal of this study is to determine how model complexity and data availability induce uncertainty in parameter estimates. Using the proposed models as proof of concept, we aim to provide a framework for model validation, from structural to practical identifiability, that can be generalized to other models of virus infections.

    We consider four within-host models of acute infections used to describe influenza A virus infection in mice [5]. They all describe the same influenza A virus titer data, but they account for increased biological complexity, as follows. Model 1 assumes that influenza A virus infects all available susceptible target cells before being cleared, according to first-order infected cell death and viral clearance rates (target cell limitation); Model 2 explains an observed biphasic viral decay in the data by assuming a second-order (density dependent) infected cell killing rate; Model 3 explains an observed delay in viral expansion in the data by assuming the presence of an eclipse phase; and Model 4 utilizes a secondary immune cell dataset by adding a model population that describes immune-mediated antiviral responses. With each model, we include biological realism that describes the dynamics of virus expansion and decay in more detail, while at the same time increasing model complexity through the addition of nonlinearities and increased numbers of model parameters. The flow charts of the four models are presented in Figure 1. Below, we describe all models in detail and address whether the added complexity can be supported by the available data by investigating the structural and practical identifiability of each model considered.

    Figure 1.  Flow charts for Models 1–4.

    Model 1 is the classical target-cell limitation model of viral infection, which considers the interaction between target cells, infected cells, and virus as follows [5,28]. Target cells, T, interact with the virus, V, at rate β to become infected cells I. Infected cells die at per capita rate δ and produce virus at rate π. Virus is cleared at rate c. Model 1 is described by the system of ordinary differential equations (ODE) Eq (2.1) below,

    Model 1: $$\frac{dT}{dt} = -\beta TV, \qquad \frac{dI}{dt} = \beta TV - \delta I, \qquad \frac{dV}{dt} = \pi I - cV, \tag{2.1}$$

    with initial conditions $T(0) = T_0$, $I(0) = I_0$, and $V(0) = 0$.
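
    The sketch below (not the authors' code) shows one way to simulate Model 1 in Julia with DifferentialEquations.jl; the parameter values are the Model 1 best fits reported later in Table 3, and the initial conditions follow Section 4.

```julia
# Minimal sketch: simulate Model 1, Eq (2.1).
using DifferentialEquations

function model1!(du, u, p, t)
    T, I, V = u
    β, δ, π_, c = p              # π_ avoids shadowing Julia's built-in constant pi
    du[1] = -β * T * V           # dT/dt
    du[2] = β * T * V - δ * I    # dI/dt
    du[3] = π_ * I - c * V       # dV/dt
end

u0 = [1e7, 75.0, 0.0]                   # T(0), I(0), V(0)
p  = [4.8e-6, 1.47, 1.18, 1.48]         # β, δ, π, c (Table 3, Model 1)
prob = ODEProblem(model1!, u0, (0.0, 9.0), p)
sol  = solve(prob, Tsit5())
v_log10 = [log10(max(sol(t)[3], eps())) for t in 1.0:9.0]   # daily log10 viral load
```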

    Experimental data have shown that, following peak expansion, the virus decays in a biphasic manner. To capture the dynamics of viral decay, a modified death rate has been considered. It assumes that the rate of infected cell clearance increases as the density of infected cells decreases, as described by δ/(Kδ+I), where δ is the maximum per capita death rate and Kδ is the infected cell population at which the death rate is half-maximal [28]. This leads to the modified target-cell limitation Model 2, given by the ODE system Eq (2.2) below,

    Model 2: $$\frac{dT}{dt} = -\beta TV, \qquad \frac{dI}{dt} = \beta TV - \frac{\delta}{K_\delta + I}\, I, \qquad \frac{dV}{dt} = \pi I - cV, \tag{2.2}$$

    with initial conditions $T(0) = T_0$, $I(0) = I_0$, and $V(0) = 0$.

    It was observed experimentally that, following influenza A virus exposure, there is a delay between the infection of target cells and viral production by infected cells [29]. The delay was accounted for by assuming that, upon infection, cells enter an eclipse phase I1, where cells are infected but do not produce virus. They become productively infected I2 after 1/k days [6], where 1/k is the average time spent in the eclipse phase. This leads to the target-cell limitation model with eclipse phase, Model 3, given by the ODE system Eq (2.3) below,

    Model 3: $$\frac{dT}{dt} = -\beta TV, \qquad \frac{dI_1}{dt} = \beta TV - kI_1, \qquad \frac{dI_2}{dt} = kI_1 - \frac{\delta}{K_\delta + I_2}\, I_2, \qquad \frac{dV}{dt} = \pi I_2 - cV, \tag{2.3}$$

    with initial conditions $T(0) = T_0$, $I_1(0) = I_0$, $I_2(0) = 0$, and $V(0) = 0$.

    The first three models do not explicitly account for any immune responses, but indirectly assume infected cell loss at nonspecific rate δ (or δ/(Kδ+I2)) and viral clearance at nonspecific rate c. The observed biphasic viral decay captured by Models 2 and 3, given by Eqs (2.2) and (2.3), however, has the additional feature that the timing of the second-phase viral decay coincides with the development of adaptive immune cells in the form of CD8+ T cells, which are responsible for killing infected cells and resolving the infection [5]. To account for adaptive immunity (especially in the presence of immune cell data), an additional variable E is considered. It accounts only for the effector CD8+ T cell population (and ignores the memory CD8+ T cell population), as follows. In the absence of infection, a baseline population of influenza A virus-specific effector CD8+ T cells is present, E(0)=E0. Infection results in recruitment of additional effector CD8+ T cells at a rate proportional to the productively infected cells I2. This is modeled in a density dependent manner at rate λ/(KE+E), where λ is the maximum influx and KE is the effector CD8+ T cell population at which the influx is half-maximal. Effector CD8+ T cells proliferate in the presence of infection. This is modeled by a delayed term ηI2(t−τI)E, which assumes that expansion occurs following interaction between effector CD8+ T cells and cells that became productively infected τI days ago. To account for effector CD8+ T cell function, the model assumes that effector CD8+ T cells kill infected cells in a density dependent manner, modeled by the rate δE/(Kδ+I2), where δE is the maximum per capita killing rate and Kδ is the I2 concentration at which the killing is half-maximal. A nonspecific infected cell killing rate δ is still considered. The resulting delay differential equation (DDE) immune model is described by the DDE system Eq (2.4) below,

    $$\frac{dT}{dt} = -\beta TV, \quad \frac{dI_1}{dt} = \beta TV - kI_1, \quad \frac{dI_2}{dt} = kI_1 - \delta I_2 - \frac{\delta_E}{K_\delta + I_2} E I_2, \quad \frac{dV}{dt} = \pi I_2 - cV, \quad \frac{dE}{dt} = \frac{\lambda}{K_E + E} I_2 + \eta E\, I_2(t - \tau_I) - d_E E, \tag{2.4}$$

    with initial conditions $T(0) = T_0$, $I_1(0) = I_0$, $V(0) = 0$, $E(0) = E_0$, and $I_2(t) = 0$ for $-\tau_I \le t \le 0$.

    To unify the goal of investigating uncertainty in parameter estimates when fitting ODE systems of virus dynamics to data, we first approximate the DDE system given by Eq (2.4) with an ODE system, as follows [30]. For a delay of τI days, we incorporate n dummy variables, each spanning τI/n days, into the dynamics of the variable I2. Briefly, we let yi approximate the productively infected cell population at time t − (i/n)τI days post infection, for i=1,...,n, and consider the following ODE system for the dummy variables yi(t),

    $$\frac{dy_1}{dt} = \frac{n}{\tau_I}\bigl(I_2 - y_1\bigr), \qquad \frac{dy_i}{dt} = \frac{n}{\tau_I}\bigl(y_{i-1} - y_i\bigr), \quad i = 2, \dots, n, \tag{2.5}$$

    with $y_i(0) = 0$ for $i = 1, \dots, n$. Then, the delayed productively infected cell population is approximated by

    $$I_2(t - \tau_I) \approx y_n(t). \tag{2.6}$$

    Without loss of generality, we assume n=3. The corresponding immune Model 4 is given by the ODE system Eq (2.7) below,

    Model 4: $$\frac{dT}{dt} = -\beta TV, \quad \frac{dI_1}{dt} = \beta TV - kI_1, \quad \frac{dI_2}{dt} = kI_1 - \delta I_2 - \frac{\delta_E}{K_\delta + I_2} E I_2, \quad \frac{dV}{dt} = \pi I_2 - cV, \quad \frac{dE}{dt} = \frac{\lambda}{K_E + E} I_2 + \eta E y_3 - d_E E, \quad \frac{dy_1}{dt} = \frac{3}{\tau_I}(I_2 - y_1), \quad \frac{dy_2}{dt} = \frac{3}{\tau_I}(y_1 - y_2), \quad \frac{dy_3}{dt} = \frac{3}{\tau_I}(y_2 - y_3), \tag{2.7}$$

    with initial conditions $T(0) = T_0$, $I_1(0) = I_0$, $I_2(0) = 0$, $V(0) = 0$, $E(0) = E_0$, and $y_i(0) = 0$ for $i = 1, 2, 3$.
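
    The sketch below (an assumed implementation, not the authors' code) writes the right-hand side of Eq (2.7) in Julia, with the three chain variables y1, y2, y3 replacing the delayed term; it can be passed to the same ODEProblem/solve workflow shown above for Model 1.

```julia
# Minimal sketch of the Model 4 right-hand side, Eq (2.7), with the n = 3
# linear-chain variables approximating I2(t - τI).
function model4!(du, u, p, t)
    T, I1, I2, V, E, y1, y2, y3 = u
    β, k, δ, δE, Kδ, π_, c, λ, KE, η, dE, τI = p
    du[1] = -β * T * V                                   # dT/dt
    du[2] = β * T * V - k * I1                           # dI1/dt
    du[3] = k * I1 - δ * I2 - δE / (Kδ + I2) * E * I2    # dI2/dt
    du[4] = π_ * I2 - c * V                              # dV/dt
    du[5] = λ / (KE + E) * I2 + η * E * y3 - dE * E      # dE/dt
    du[6] = 3 / τI * (I2 - y1)                           # chain stage 1
    du[7] = 3 / τI * (y1 - y2)                           # chain stage 2
    du[8] = 3 / τI * (y2 - y3)                           # chain stage 3, y3(t) ≈ I2(t - τI)
end
```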

    To study the structural identifiability of Models 1–4, we rewrite them in the following general form

    $$x'(t) = f(x, p), \tag{3.1}$$

    and the observations as

    $$y(t) = g(x, p). \tag{3.2}$$

    Here, x denotes the state variables, p is the parameter vector, and y is the output (given by the empirical data), also called the observations. The generic model given by Eq (3.1) is called structurally identifiable if the parameter vector p can be determined uniquely from the observations given by the smooth curve y(t). Otherwise, it is said to be unidentifiable. The formal definition of structural identifiability is provided below.

    Definition 1. Let $p$ and $\hat{p}$ be two distinct parameter vectors. Model Eq (3.1) is said to be globally (uniquely) structurally identifiable if and only if

    $$g(x, p) = g(x, \hat{p}) \implies p = \hat{p}.$$

    Definition 2. Model Eq (3.1) is said to be locally structurally identifiable if, for any $p$ within an open neighborhood of $\hat{p}$ in the parameter space,

    $$g(x, p) = g(x, \hat{p}) \implies p = \hat{p}.$$

    Various methods have been proposed for analyzing the structural identifiability of ODE models [16,17,31]. In this study, we employ the differential algebra approach. It performs the elimination of unobserved state variables, resulting in equations expressed as functions of the model parameters and the observed state variables. These are referred to as the input-output equations; they are differential-algebraic polynomials in the outputs, y(t), with the model parameters, p, appearing in the coefficients. The formal definition of structural identifiability within the differential algebra approach for model Eq (3.1) is provided below.

    Definition 3. Let c(p) denote the coefficients of the input-output equation corresponding to model Eq (3.1). We say that model Eq (3.1) is structurally identifiable from unlimited observations y(t) if and only if,

    $$c(p) = c(\hat{p}) \implies p = \hat{p}.$$

    Studying the structural identifiability of ODE models using differential algebra methods can be accomplished using several platforms and available open-source software. Here, we present three such platforms: the differential algebra for identifiability of systems software (DAISY) [32], the identifiable combinations web application (COMBOS) [33], and the StructuralIdentifiability.jl package in JULIA [34].

    There are many similarities among the three methods. All of them offer insights into the structural identifiability status of each parameter by categorizing them into locally identifiable, globally identifiable, or non-identifiable. They employ a differential elimination method to calculate input-output equations of the considered system, and test the one-to-one map between the coefficients of the input-output equations and model parameters. COMBOS and the StructuralIdentifiability.jl package in JULIA are superior to DAISY, as they provide globally identifiable parameter correlations in an otherwise non-identifiable system. Even though DAISY does not print parameter correlations, the correlations can be derived using the coefficients of the input-output equations and algebraic manipulations in software such as MATHEMATICA. Of the three software, COMBOS does not print the input-output equations, making for a faster (yet more opaque) platform. Previous studies have shown that COMBOS works best for small to medium-size models and is not assured for models with large parameter vectors [33,35,36]. While highly similar, it is up to the user to determine which software is best suited for studying the identifiability of the models considered.
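
    As an illustration of this workflow, the sketch below (assuming the documented StructuralIdentifiability.jl interface; it is not taken from the paper) encodes Model 1 with the viral load as the only output and requests an identifiability assessment.

```julia
# Minimal sketch: structural identifiability of Model 1 with output y(t) = V(t).
using StructuralIdentifiability

ode = @ODEmodel(
    T'(t) = -beta * T(t) * V(t),
    I'(t) = beta * T(t) * V(t) - delta * I(t),
    V'(t) = rho * I(t) - c * V(t),    # rho stands in for the production rate π
    y(t) = V(t)
)

# Classifies each parameter as globally identifiable, locally identifiable, or non-identifiable.
assess_identifiability(ode)
```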

    To determine whether the considered models can reveal their parameters, we examine the structural identifiability of Models 1–3, given by Eqs (2.1)–(2.3), under unlimited observations of viral load and the structural identifiability of Model 4, given by Eq (2.7), under unlimited combined observations of viral load and effector CD8+ T cell concentrations. We used the differential algebra software DAISY.

    We assume that all Model 1's parameters p={β,δ,π,c} are unknown and that we have unlimited empirical observations of the viral load, y(t)=V(t). Using DAISY [32], we obtain the following input-output equation in variable V and model parameters p,

    $$0 = V'''V - V''V' + V''V^2\beta + V''V(c + \delta) - V'^2(c + \delta) + V'V^2\beta(c + \delta) + V^3\beta c\delta. \tag{3.3}$$

    By Definition 3, we need to examine whether another set of parameters, $\hat{p} = \{\hat{\beta}, \hat{\delta}, \hat{\pi}, \hat{c}\}$, can produce the same empirical observation V(t), i.e., whether the map from the parameter space p to the coefficients of the input-output equation Eq (3.3) is one-to-one. The coefficients of the input-output equation Eq (3.3) are $c(p) = \{\beta,\ c + \delta,\ c\delta\}$. To determine whether the map from the parameter space p to the coefficients c(p) is one-to-one, we set $c(p) = c(\hat{p})$, which gives the following system:

    $$\bigl\{\beta = \hat{\beta}, \quad c + \delta = \hat{c} + \hat{\delta}, \quad c\delta = \hat{c}\hat{\delta}\bigr\}. \tag{3.4}$$

    Solving Eq (3.4) results in the following two sets of solutions:

    $$S_1: \{\beta = \hat{\beta},\ c = \hat{c},\ \delta = \hat{\delta}\}, \qquad S_2: \{\beta = \hat{\beta},\ c = \hat{\delta},\ \delta = \hat{c}\}.$$

    Hence, only the infection rate β is globally structurally identifiable, while the infected cell killing rate δ and the virus clearance rate c are only locally identifiable. Lastly, the virus production rate π does not appear in the input-output equation Eq (3.3). Therefore, it is not structurally identifiable. We summarize the results for Model 1 below (see Table 1).

    Table 1.  Structural identifiability results for Models 1–4 using three software packages: DAISY, the StructuralIdentifiability.jl package in JULIA, and COMBOS.

    Model 1, unknown initial conditions, observed state V(t):
      DAISY: β globally identifiable; {c, δ} locally identifiable. Correlations: $c + \delta = \hat{c} + \hat{\delta}$, $c\delta = \hat{c}\hat{\delta}$.
      JULIA: β globally identifiable; {c, δ} locally identifiable; π nonidentifiable. Correlations: $c + \delta = \hat{c} + \hat{\delta}$, $c\delta = \hat{c}\hat{\delta}$.
      COMBOS: β globally identifiable; {c, δ} locally identifiable with 2 solutions.

    Model 2, unknown initial conditions, observed state V(t):
      DAISY: {β, c} globally identifiable. Correlations: $\delta\pi = \hat{\delta}\hat{\pi}$, $\pi K_\delta = \hat{\pi}\hat{K}_\delta$.
      JULIA: {β, c} globally identifiable; {π, Kδ, δ} nonidentifiable. Correlations: $\delta\pi = \hat{\delta}\hat{\pi}$, $\pi K_\delta = \hat{\pi}\hat{K}_\delta$.
      COMBOS: {β, c, Kδπ, δπ} globally identifiable.

    Model 3, unknown initial conditions, observed state V(t):
      DAISY: {β, c, k} globally identifiable. Correlations: $\delta/K_\delta = \hat{\delta}/\hat{K}_\delta$, $\pi K_\delta = \hat{\pi}\hat{K}_\delta$.
      JULIA: {β, c, k} globally identifiable; {π, Kδ, δ} nonidentifiable. Correlations: $\delta\pi = \hat{\delta}\hat{\pi}$, $\pi K_\delta = \hat{\pi}\hat{K}_\delta$.
      COMBOS: {β, c, Kδπ, δπ} globally identifiable.

    Model 4, unknown initial conditions, observed states V(t), E(t):
      DAISY: {β, c, dE, δ, k, KE, τI} globally identifiable. Correlations: $\delta_E\eta = \hat{\delta}_E\hat{\eta}$, $K_\delta\eta = \hat{K}_\delta\hat{\eta}$, $\lambda/\eta = \hat{\lambda}/\hat{\eta}$, $\pi/\eta = \hat{\pi}/\hat{\eta}$.
      JULIA: {β, c, dE, δ, k, KE, τI} globally identifiable; {δE, Kδ, π, λ, η} nonidentifiable. Correlations: $K_\delta\pi = \hat{K}_\delta\hat{\pi}$, $K_\delta\lambda = \hat{K}_\delta\hat{\lambda}$, $\eta K_\delta = \hat{\eta}\hat{K}_\delta$, $\delta_E\pi = \hat{\delta}_E\hat{\pi}$.
      COMBOS: identifiability unknown.

    All models, known initial conditions except for E0, observed states V(t) (Models 1–3) and V(t), E(t) (Model 4):
      DAISY: Models 1–3 globally identifiable; Model 4 identifiability unknown.
      JULIA: Models 1–4 globally identifiable.
      COMBOS: Models 1–4 identifiability unknown.


    Proposition 1. Model 1 given by Eq (2.1) is not structured to identify all of its parameters from unlimited viral load observations, V(t). More precisely, parameter β is globally structurally identifiable, parameters c and δ are locally structurally identifiable, and parameter π is not structurally identifiable. Moreover, Model 1 is globally structurally identifiable under known initial conditions.

    We assume that all parameters p={β,δ,Kδ,π,c} of Model 2, given by Eq (2.2), are unknown and that we have unlimited empirical observations of the viral load, y(t)=V(t). Using DAISY, we obtain the input-output equation, written here in compact factored form,

    $$0 = (\pi K_\delta + V' + cV)^2\bigl[VV''' - V'V'' + cVV'' - cV'^2 + \beta V^2 V'' + \beta cV^2 V'\bigr] + \pi\delta\bigl[\pi K_\delta\,(VV'' - V'^2) - V'(V' + cV)^2\bigr] + \pi\delta\beta V^2(V' + cV)(\pi K_\delta + V' + cV). \tag{3.5}$$

    As before, we examine whether another set of parameters, $\hat{p}$, can produce the same empirical observation V(t), i.e., whether the map from the parameter space p to the coefficients of the input-output equation Eq (3.5) is one-to-one. Setting $c(p) = c(\hat{p})$, we obtain

    $$\begin{aligned} &c = \hat{c}, \qquad \beta = \hat{\beta}, \qquad K_\delta\pi = \hat{K}_\delta\hat{\pi}, \qquad c\pi(cK_\delta + \delta) = \hat{c}\hat{\pi}(\hat{c}\hat{K}_\delta + \hat{\delta}), \qquad \beta c^2\delta\pi = \hat{\beta}\hat{c}^2\hat{\delta}\hat{\pi}, \\ &\beta cK_\delta\pi + \beta\delta\pi - c^3 = \hat{\beta}\hat{c}\hat{K}_\delta\hat{\pi} + \hat{\beta}\hat{\delta}\hat{\pi} - \hat{c}^3, \qquad K_\delta\pi^2(cK_\delta + \delta) = \hat{K}_\delta\hat{\pi}^2(\hat{c}\hat{K}_\delta + \hat{\delta}), \end{aligned}$$

    with solutions

    $$\bigl\{c = \hat{c}, \quad \beta = \hat{\beta}, \quad \delta\pi = \hat{\delta}\hat{\pi}, \quad \pi K_\delta = \hat{\pi}\hat{K}_\delta\bigr\}.$$

    Hence, Model 2 is not structurally identifiable. In particular, the infection rate β, the viral clearance rate c, and the products δπ and Kδπ (but not the individual parameters δ, π, and Kδ) are globally identifiable. Since the correlations δπ and Kδπ are known, fixing one of the parameters δ, π, or Kδ makes model Eq (2.2) identifiable. We summarize the structural identifiability results for Model 2 below (see Table 1).

    Proposition 2. Model 2 given by Eq (2.2) is not structured to identify all of its parameters from unlimited viral load observations, V(t). More precisely, parameters β and c are globally structurally identifiable. Moreover, the parameter products δπ and Kδπ are globally identifiable. Since the correlations are known, fixing δ, π, or Kδ makes Model 2 globally structurally identifiable from unlimited observations V(t). Moreover, Model 2 is globally structurally identifiable under known initial conditions.

    We assume that all parameters p={β,δ,k,Kδ,π,c} of Model 3, given by Eq (2.3), are unknown and that we have unlimited empirical observations of the viral load, y(t)=V(t). Using DAISY, we can derive the input-output equations (they are lengthy and will not be presented here). As before, we examine whether another set of parameters, $\hat{p}$, can produce the same empirical observation V(t), i.e., whether the map from the parameter space p to the coefficients of the input-output equation (not shown) is one-to-one. Setting $c(p) = c(\hat{p})$, we obtain

    $$\Bigl\{c = \hat{c}, \quad \beta = \hat{\beta}, \quad k = \hat{k}, \quad \frac{\delta}{K_\delta} = \frac{\hat{\delta}}{\hat{K}_\delta}, \quad \pi K_\delta = \hat{\pi}\hat{K}_\delta\Bigr\}.$$

    Hence, Model 3 is not structurally identifiable. In particular, the infection rate β, the eclipse parameter k, the viral clearance rate c, the ratio δ/Kδ, and the product Kδπ (but not the individual parameters δ, π, and Kδ) are globally identifiable. Since the correlations are known, fixing one of these parameters makes the model Eq (2.3) identifiable. We summarize the results for Model 3 below (see Table 1).

    Proposition 3. Model 3 given by Eq (2.3) is not structured to identify all of its parameters from unlimited viral load observations, V(t). More precisely, parameters β, k, and c are globally structurally identifiable. Moreover, the parameter ratio δ/Kδ and the parameter product Kδπ are globally identifiable. Since the correlations are known, fixing the parameter δ, π, or Kδ makes Model 3 globally structurally identifiable from unlimited observations V(t). Moreover, Model 3 is globally structurally identifiable under known initial conditions.

    To study the structural identifiability of Model 4 (given by Eq (2.7)), we assume that all parameters, p={β,δ,k,δE,Kδ,π,c,λ,η,dE,KE,τI,E0}, are unknown and that we have unlimited empirical observations for the viral load y1(t)=V(t) and the effector CD8+ T cell data y2(t)=E(t). Using DAISY, we can obtain the input-output equations (they are lengthy and will not be presented here). As before, we examine whether another set of parameters, $\hat{p}$, can produce the same empirical observations V(t) and E(t), i.e., whether the map from the parameter space p to the coefficients of the input-output equations (not shown) is one-to-one. Setting $c(p) = c(\hat{p})$, we obtain

    $$\Bigl\{c = \hat{c}, \quad \beta = \hat{\beta}, \quad k = \hat{k}, \quad d_E = \hat{d}_E, \quad \delta = \hat{\delta}, \quad K_E = \hat{K}_E, \quad \tau_I = \hat{\tau}_I, \quad \delta_E\eta = \hat{\delta}_E\hat{\eta}, \quad K_\delta\eta = \hat{K}_\delta\hat{\eta}, \quad \frac{\lambda}{\eta} = \frac{\hat{\lambda}}{\hat{\eta}}, \quad \frac{\pi}{\eta} = \frac{\hat{\pi}}{\hat{\eta}}\Bigr\}.$$

    Hence, Model 4 is not structurally identifiable. In particular, the infection rate β, the eclipse parameter k, the viral clearance rate c, the effector cell death rate dE, the generic killing rate δ, the half-maximal level KE, the delay τI, the ratios λ/η and π/η, and the products δEη and Kδη (but not the individual parameters δE, η, Kδ, π, λ) are globally identifiable. If the parameter η is fixed, then model Eq (2.7) becomes identifiable. We summarize the results for Model 4 below (see Table 1).

    Proposition 4. Model 4 given by Eq (2.7) is not structured to identify all of its parameters from unlimited viral load and effector cell observations, V(t) and E(t). More precisely, parameters β, k, c, dE, δ, KE, and τI are globally structurally identifiable. Moreover, the parameter ratios λ/η and π/η and the parameter products δEη and Kδη are globally identifiable. If the parameter η is fixed, then Model 4 becomes globally structurally identifiable from unlimited observations V(t) and E(t).

    We could not determine (from DAISY) whether knowing the initial conditions guarantees global structural identifiability of Model 4 (see Table 1).

    Studying the structural identifiability of ODE models can be achieved using software other than DAISY. To determine how these methods compare, results from the three platforms, DAISY, COMBOS [33], and StructuralIdentifiability.jl in JULIA [34], for Models 1–4 are presented side by side in Table 1.

    We find that all three software packages uncover the same structural identifiability results for Models 1–3. On the other hand, DAISY and StructuralIdentifiability.jl in JULIA uncover the same identifiability results (while COMBOS cannot produce results) for Model 4 under unknown initial conditions. Even though the platforms report the parameter correlations for Models 3 and 4 in different forms, simple algebraic manipulations show that the obtained correlations are equivalent. Given the similarity of the results for Models 1–3, it is up to the user to decide which of the three software packages is best suited for their analysis. Similarly, given the similarity of the results from DAISY and StructuralIdentifiability.jl in JULIA for Model 4 with unknown initial conditions, it is up to the user to decide which of the two packages is best suited for their analysis. However, only StructuralIdentifiability.jl in JULIA can be used to determine the structural identifiability of Model 4 with unknown E0 and known other initial conditions. Hence, for larger systems with nonlinear interaction terms, this method should be employed.

    We use previously published longitudinal influenza A infectious virus titer and CD8+ T cell data in mice from Smith et al. [5]. Adult mice were inoculated intranasally with 75 TCID50 of mouse adapted influenza A/Puerto Rico/8/34 (H1N1) (PR8) virus.

    Total infectious virus (log10 TCID50 per lung) was measured for ten mice each day. Nine days after inoculation, the infectious virus was no longer detectable in any of the mice. Therefore, we only consider infectious virus titer data from the first nine days post inoculation in our analyses. We let E(Vdata(i)) be the mean infectious virus titer data at day i={1,...,9} and Var(Vdata(i)) be the infectious virus titer variance at day i={1,...,9} among the ten mice.

    Moreover, total effector CD8+ T cells (cells per lung) were measured daily for five mice. Since influenza A-specific effector CD8+ T cells were detectable for all twelve days of the study, we consider effector CD8+ T cells data from the first twelve days post inoculation in our analyses. We let E(Edata(j)) be the mean CD8+ T cell data (per lung) at day j={1,...,12} and Var(Edata(j)) be the CD8+ T cell data variance at days j={1,...,12} among the five mice.

    For all models, we assume known initial conditions T(0)=107 cells/ml, I(0)=75 cells/ml, and V(0)=0 virus/ml as in [5]. For Models 3 and 4, we additionally assume that I2(0)=0, and for Model 4, we assume yi(0)=0, for i=1,2,3. For Model 4, we assume E(0)=E0 is unknown, therefore adding E0 to the parameter vector to be estimated from the data. Lastly, we assume all parameters are unknown. When parameters are either very large or very small, we estimate their value on natural logarithmic scale. In particular, we estimate p1={ln(β),δ,π,c} for Model 1, p2={ln(β),ln(δ),ln(Kδ),π,c} for Model 2, p3={ln(β),ln(δ),ln(Kδ),π,c,k} for Model 3, and p4={ln(β),δ,ln(Kδ),π,c,k,δE,ln(η),ln(λ),dE,ln(KE),τI,ln(E0)} for Model 4.

    To estimate parameters p1–p3, we fit the predicted viral load log10Vwmodel(i) given by Models 1–3 to the longitudinal mean (among the ten mice) infectious virus titer (log10 TCID50 per lung) E(Vdata(i)), knowing that the variance in the data at day i is Var(Vdata(i)), for days i={1,...,9}. We assume that the data satisfies the following statistical model [19,37]

    $$E(V_{data}(i)) = \log_{10}V_{model}^{w}(i, p_w) + \epsilon_i\sqrt{\mathrm{Var}(V_{data}(i))}, \tag{4.1}$$

    where Vwmodel(i) is the predicted virus trajectory given by Model w at days i={1,...,9} post infection; p1={ln(β),δ,π,c}, p2={ln(β),ln(δ),ln(Kδ),π,c}, and p3={ln(β),ln(δ),ln(Kδ),π,c,k}; and ϵi are independent and identically distributed with mean zero and standard deviation σ. Given the statistical model Eq (4.1), we assume that the measured data, E(Vdata(i)), follows a normal distribution with a mean equal to the model prediction log10Vwmodel(i) and with variance equal to σ2Var(Vdata(i)). Moreover, the availability of measurements from several animals that vary with time allows us to account for the change in data variance over time, Var(Vdata(i)). Therefore, we consider the following functional (weighted residual sum of squares), to estimate the model parameters,

    $$\mathrm{RSS}_w(p_w) = \sum_{i=1}^{9} \frac{\bigl(\log_{10}V_{model}^{w}(i, p_w) - E(V_{data}(i))\bigr)^2}{\mathrm{Var}(V_{data}(i))}. \tag{4.2}$$

    Consequently, parameters of Models 1–3 are estimated by minimizing the weighted least-squares given by

    $$J_w(p_w) = \min_{p_w} \mathrm{RSS}_w(p_w). \tag{4.3}$$

    Moreover, to estimate parameters p4, we fit both the predicted viral load log10V4model(i), given by Model 4, to the longitudinal mean (among ten mice) infectious virus titer E(Vdata(i)) (knowing that the variance in the data at days i={1,...,9} is Var(Vdata(i))) and the predicted effector cell population log10E4model(j), given by Model 4, to the longitudinal mean (among five mice) CD8+ T cell data E(Edata(j)) (knowing that the variance in the data at days j={1,...,12} is Var(Edata(j))). We assume that the data satisfies the following statistical model [19,37]

    $$E(V_{data}(i)) = \log_{10}V_{model}^{4}(i, p_4) + \epsilon_i\sqrt{\mathrm{Var}(V_{data}(i))}, \tag{4.4}$$
    $$E(E_{data}(j)) = \log_{10}E_{model}^{4}(j, p_4) + \eta_j\sqrt{\mathrm{Var}(E_{data}(j))}, \tag{4.5}$$

    where V4model(i) is the predicted virus trajectory given by Model 4 at days i={1,...,9} post infection, E4model(j) is the predicted CD8+ T cell population given by Model 4 at days j={1,...,12} post infection, and p4={ln(β),δ,ln(Kδ),π,c,k,δE,ln(η),ln(λ),dE,ln(KE),τI,ln(E0)}. Here, ϵi and ηj are independent and identically distributed with mean zero and standard deviations σV and σE, respectively. As before, the measured data E(Edata(j)) follows a normal distribution whose mean is the model prediction log10E4model(j) and whose variance is $\sigma_E^2\mathrm{Var}(E_{data}(j))$. We consider the following functional (weighted residual sum of squares),

    $$\mathrm{RSS}_4(p_4) = u_1\sum_{i=1}^{9} \frac{\bigl(\log_{10}V_{model}^{4}(i, p_4) - E(V_{data}(i))\bigr)^2}{\mathrm{Var}(V_{data}(i))} + u_2\sum_{j=1}^{12} \frac{\bigl(\log_{10}E_{model}^{4}(j, p_4) - E(E_{data}(j))\bigr)^2}{\mathrm{Var}(E_{data}(j))}. \tag{4.6}$$

    Consequently, parameters of Model 4 are estimated by minimizing the weighted least-squares given by,

    $$J_4(p_4) = \min_{p_4} \mathrm{RSS}_4(p_4). \tag{4.7}$$

    Note that we weighted the virus and effector cell contributions with weights $u_1 = 1$ and $u_2 = \max_j \mathrm{Var}(E_{data}(j)) / \max_i \mathrm{Var}(V_{data}(i))$. We minimize all least-squares functionals RSSw using the fmincon function in MATLAB, with ranges for the parameters pw given in Table 2.
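
    The fitting in this paper uses fmincon in MATLAB; the sketch below shows an analogous setup in Julia with Optim.jl, reusing the model1! function from the earlier Model 1 sketch. The arrays vdata_mean and vdata_var are hypothetical placeholders for the daily mean and variance of the log10 virus titer, and the bounds are in the spirit of Table 2 but are not the authors' exact settings.

```julia
# Minimal sketch of the weighted least-squares fit of Model 1, Eqs (4.2)-(4.3).
using DifferentialEquations, Optim

function rss_model1(p, vdata_mean, vdata_var)
    prob  = ODEProblem(model1!, [1e7, 75.0, 0.0], (0.0, 9.0), p)
    sol   = solve(prob, Tsit5())
    vpred = [log10(max(sol(t)[3], eps())) for t in 1.0:9.0]   # predicted log10 titer, days 1-9
    return sum((vpred .- vdata_mean) .^ 2 ./ vdata_var)       # Eq (4.2)
end

lower = [1e-7, 0.0, 0.0, 0.0]      # bounds on β, δ, π, c (illustrative)
upper = [1e-4, 25.0, 100.0, 25.0]
p0    = [1e-5, 1.0, 1.0, 1.0]

res = optimize(p -> rss_model1(p, vdata_mean, vdata_var),
               lower, upper, p0, Fminbox(LBFGS()))            # Eq (4.3)
best_p, J1 = Optim.minimizer(res), Optim.minimum(res)
```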

    Table 2.  Upper and lower bounds for parameters estimated by fitting Models 1–3 to influenza A virus titer and by fitting Model 4 to both influenza A virus titer and effector CD8+ T cell data from infected mice.
    Parameter | Model 1 | Model 2 | Model 3 | Model 4
    β×10⁵ | 0.1–10 | 0.1–10 | 0.1–10 | 0.1–10
    δ | 0–25 | 10²–10⁸ | 10²–10⁸ | 0–15
    Kδ | – | 10³–10⁷ | 10³–10⁷ | 10¹–10⁵
    π | 0–10² | 0–10² | 0–10² | 0–10²
    c | 0–25 | 0–25 | 0–25 | 0–25
    k | – | – | 4–6 | 4–6
    δE | – | – | – | 0–25
    η×10⁷ | – | – | – | 10⁻²–10⁰
    λ×10³ | – | – | – | 10⁻²–10²
    dE | – | – | – | 0–25
    KE×10⁶ | – | – | – | 0.1–10³
    τI | – | – | – | 0–10
    E0×10³ | – | – | – | 0.1–10


    To compare Models 1–4, we calculate the corrected Akaike information criteria (AICc), given below

    $$\mathrm{AIC}_c = n\ln\!\left(\frac{J_w}{n}\right) + 2(M + 1) + \frac{2(M + 1)(M + 2)}{n - M - 2}, \tag{4.8}$$

    where n is the number of data points used to estimate parameters pw and M is the number of estimated parameters. In Models 1–3, n=9 and M=4,5, and 6, respectively. In Model 4, n=21 and M=13.
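
    As a small worked check of Eq (4.8) (a sketch assuming the formula as written above), the helper below reproduces the Model 1 value reported later in Table 3.

```julia
# Corrected AIC from the minimized weighted RSS J, n data points, and M fitted parameters.
aicc(J, n, M) = n * log(J / n) + 2 * (M + 1) + 2 * (M + 1) * (M + 2) / (n - M - 2)

aicc(21.12, 9, 4)   # Model 1: ≈ 37.7 (cf. Table 3)
```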

    To quantify the uncertainty associated with predicted solutions of each model, we perform parametric bootstrapping. It is a simulation-based method which assumes that data comes from a known distribution with unknown parameters. For Models 1–3, we assume that the predicted viral population for best parameter estimates, log10Vwmodel(i,pw), is the mean of the data's normal distribution and σ2Var(Vdata(i)) is its variance (see Eq (4.1)). Then, σ can be approximated as follows,

    $$\sigma^2 \approx \frac{1}{n - M}\sum_{i=1}^{n} \frac{\bigl(\log_{10}V_{model}^{w}(i, p_w) - E(V_{data}(i))\bigr)^2}{\mathrm{Var}(V_{data}(i))},$$

    (see Banks et al. for a full derivation [37]). Here, n=9 is the number of viral samples and M is the number of parameters (M=4, M=5, and M=6 for Models 1–3, respectively). To find a confidence region for our model predictions, we generate 1000 simulated datasets using the distribution given by Eq (4.1) and fit Models 1–3 to each dataset.
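
    A compact sketch of this parametric bootstrap for Model 1 is shown below (assumptions: model1! and the weighted RSS from the earlier sketches; fit_model1 is a hypothetical helper that reruns the box-constrained fit of Eq (4.3) on a supplied dataset).

```julia
# Minimal sketch of the parametric bootstrap for Model 1 under the noise model of Eq (4.1).
using DifferentialEquations, Distributions

function bootstrap_model1(best_p, vdata_var, sigma2; B = 1000)
    prob  = ODEProblem(model1!, [1e7, 75.0, 0.0], (0.0, 9.0), best_p)
    sol   = solve(prob, Tsit5())
    vbest = [log10(max(sol(t)[3], eps())) for t in 1.0:9.0]   # mean of the assumed normal distribution
    boot_fits = Vector{Vector{Float64}}(undef, B)
    for b in 1:B
        sim = [rand(Normal(vbest[i], sqrt(sigma2 * vdata_var[i]))) for i in 1:9]
        boot_fits[b] = fit_model1(sim, vdata_var)              # hypothetical refitting helper
    end
    return boot_fits   # B refitted parameter sets -> pointwise confidence bands for V(t)
end
```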

    Similarly, for Model 4, assuming that the viral data and the effector cell data come from distributions with means log10V4model(i,p4) and log10E4model(j,p4) (the predicted variables at the best parameter fits) and that $\sigma_V^2\mathrm{Var}(V_{data}(i))$ and $\sigma_E^2\mathrm{Var}(E_{data}(j))$ are the respective variances, then

    $$\sigma_V^2 \approx \frac{1}{n_{tot} - M}\sum_{i=1}^{n_V} \frac{\bigl(\log_{10}V_{model}^{4}(i, p_4) - E(V_{data}(i))\bigr)^2}{\mathrm{Var}(V_{data}(i))},$$

    and

    $$\sigma_E^2 \approx \frac{1}{n_{tot} - M}\sum_{j=1}^{n_E} \frac{\bigl(\log_{10}E_{model}^{4}(j, p_4) - E(E_{data}(j))\bigr)^2}{\mathrm{Var}(E_{data}(j))},$$

    as before. Here, nV=9 is the number of viral samples, nE=12 is the number of CD8+ T cell samples, ntot=nV+nE=21 is the number of total data samples, and M=13 is the number of parameters fitted.

    We fitted Models 1–3 to previously published longitudinal influenza A infectious virus titer data and we fitted Model 4 to both longitudinal influenza A infectious virus titer data and longitudinal CD8+ T cell data in infected mice [5], using a normalized least-squares optimization algorithm (see Section 4). The results from fitting V(t) given by Models 1–3 to viral load data are shown in Figure 2A–C and the best parameter fits are given in Table 3. The results from fitting both V(t) and E(t) given by Model 4 to viral titer and effector cell data are shown in Figure 2D and the best parameter fits are given in Table 3. Model selection, using the corrected AICc, predicts that Model 4 best describes the data (see Table 3). To quantify the uncertainty associated with predicted solutions of each model, we find a 90% confidence region in our model predictions (see Section 4.5), illustrated by shaded gray areas in Figure 2A–C for Models 1–3 and by gray and blue shaded regions in Figure 2D for Model 4. We see large error regions in virus population predictions for all models during the decay phase (gray regions in Figure 2A–D). Moreover, Models 2–4 better capture the virus population expansion phase compared to Model 1 (gray regions in Figure 2B–D versus gray region in Figure 2A). Lastly, the largest error in CD8+ T cell prediction in Model 4 occurs in the second week of infection (blue region in Figure 2D).

    Table 3.  Parameter estimates found by fitting Model 1, Model 2, and Model 3 to virus titer data and Model 4 to virus titer and effector CD8+ T cell data from mice infected with influenza A virus, using the fmincon routine in MATLAB.
    Parameter | Model 1 | Model 2 | Model 3 | Model 4
    β×10⁵ | 0.48 | 6.88 | 6.20 | 8.39
    δ | 1.47 | 1.59×10⁶ | 1.72×10⁶ | 0.196
    Kδ | – | 4.69×10⁴ | 2.37×10⁵ | 5.49×10³
    π | 1.18 | 0.86 | 1.69 | 0.69
    c | 1.48 | 6.49 | 12.34 | 6.19
    k | – | – | 4.82 | 5.02
    δE | – | – | – | 14.20
    η×10⁷ | – | – | – | 3.59
    λ×10³ | – | – | – | 1.31
    dE | – | – | – | 0.20
    KE×10⁶ | – | – | – | 9.68
    τI | – | – | – | 1.69
    E0×10³ | – | – | – | 1.21
    Jw | 21.12 | 2.09 | 2.39 | 2.14
    AICc | 37.67 | 40.88 | 114.1 | 36.78

    Figure 2.  Model predictions (solid lines) and 95% model confidence regions (shaded areas) obtained by fitting V(t) (black lines) given by A: Model 1, B: Model 2, C: Model 3 to virus load data (black circles) and by fitting V(t) (black line) and E(t) (blue line) given by D: Model 4 to virus load data (black circles) and CD8+ T cell data (blue circles). Model parameters are given in Table 3.

    While structural identifiability investigates whether parameters can be uniquely determined from a model given unlimited data in the absence of measurement error or noise, practical identifiability determines whether parameters can be accurately identified in real-world scenarios, where the observed data are discrete, vary among subjects, and are contaminated with measurement error. We and others have employed several methods to study the practical identifiability of within-host and infectious disease models, such as Monte Carlo simulations [18,19,20], the Fisher information matrix or correlation matrix [16,21,22,23], and Bayesian methods [24]. In this study, we use the profile likelihood method [25,38], described in detail below.

    Consider that the vector of parameters p is partitioned into p = (r, s), where r represents the parameter whose practical identifiability we are investigating and s represents the vector of remaining parameters. The profile likelihood of r is given by,

    $$PL(r) = \min_{s} \mathrm{RSS}(r, s), \tag{6.1}$$

    where RSS is the objective functional used for data fitting (in our case, Eq (4.2) for Models 1–3 and Eq (4.6) for Model 4). In other words, PL(r) finds the minimum of the objective functional RSS(r,s) for an array of fixed r values over the space of the remaining parameters s. The shape of PL(r) informs the identifiability of r: a u-shaped PL(r) that rises above a threshold (corresponding to a chosen confidence level) on both sides of its minimum indicates that r is practically identifiable, while a flat PL(r) indicates that r is not practically identifiable.

    We estimate PL(r) over a mesh of equally spaced values of r, centered at the best-fit estimate $\hat{r}$, with the number of mesh points chosen to have enough data to generate a confidence interval for $\hat{r}$, as follows. If we consider a model with parameters p unknown and a model with parameters s unknown, we obtain two nested models that differ by the parameter r. It has been shown that the likelihood ratio of the nested models converges to a χ2 distribution with one degree of freedom, df=1 (see Corollary 2 in [38] for more detail). This helps us define the Δ-level likelihood-based confidence interval for parameter r to be

    $$CI = \{r \mid PL(r) < J + \Delta\}, \tag{6.2}$$

    where Δ is the chosen percentile of the χ2 distribution with df=1 and J is the weighted least-squares functional at the best parameter estimate [38]. This can be summarized as follows.

    Definition 4. Let CI={r|PL(r)<J+Δ} be the likelihood-based confidence interval for parameter r.

    i. If CI = [r1, r2], where r1 and r2 are finite, then parameter r is practically identifiable.

    ii. If either r1 or r2 is infinite, then parameter r is not practically identifiable.

    A model is practically identifiable if all parameters are practically identifiable.

    For Models 1–3, we generated the profile likelihoods PL(r) for each parameter r ∈ pw, w={1,2,3}, by fitting the functional RSSw(pw) given by Eq (4.2) and Model w to the mean population viral load data. We obtained best estimates for the remaining parameters s over a mesh of equally spaced, fixed r values. Similarly, for Model 4, we generated the profile likelihood PL(r) for each unknown parameter r ∈ p4 by fitting the functional RSS4(p4) given by Eq (4.6) simultaneously to the mean population viral titer and the mean population effector cell data. We obtained best estimates for the remaining parameters s over a mesh of equally spaced, fixed r values.

    Additionally, to further explore the relationship between data availability and practical identifiability, we generated profile likelihoods for parameters pw and Model w using simulated, noise-free, high-frequency datasets. In particular, we assumed that the virus titer data was collected every fourteen minutes (for a total of 899 evenly spaced points) for Models 1–3 and that both the virus titer data and the effector cell data were collected every fourteen minutes (for a total of 899 evenly spaced points of each data type) for Model 4. We fitted each model to this high-frequency data and generated profile likelihoods of the resulting parameter values. In all of our models, we chose Δ to be the 90th percentile of the χ2 distribution with one degree of freedom [38]. This guaranteed a 90% confidence interval for the estimated parameter r.
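
    A sketch of this procedure for a single parameter of Model 1 (reusing rss_model1 and the box bounds from the earlier fitting sketch; the mesh and threshold choices are illustrative rather than the authors' exact settings) is given below.

```julia
# Minimal sketch: profile likelihood PL(r) of Eq (6.1) for the parameter with index ridx,
# together with the 90% threshold of Eq (6.2) from the chi-square distribution (df = 1).
using Optim, Distributions

function profile_likelihood(ridx, rmesh, best_p, lower, upper, vdata_mean, vdata_var)
    keep = setdiff(1:length(best_p), ridx)     # indices of the remaining parameters s
    PL = zeros(length(rmesh))
    for (m, r) in enumerate(rmesh)
        obj = s -> begin
            p = copy(best_p)
            p[keep] .= s
            p[ridx] = r                        # r is held fixed at a mesh value
            rss_model1(p, vdata_mean, vdata_var)
        end
        res = optimize(obj, lower[keep], upper[keep], best_p[keep], Fminbox(LBFGS()))
        PL[m] = Optim.minimum(res)             # re-minimized over s, Eq (6.1)
    end
    return PL
end

Δ = quantile(Chisq(1), 0.90)   # threshold offset; CI = {r : PL(r) < J + Δ}, cf. Eq (6.2)
```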

    Since we determined that Models 1–4 are structurally identifiable under known initial conditions and unlimited data, we were allowed to search for best estimates for all models' parameters. The resulting fitting routine may still be ill-posed, given that the data consisted of discrete (daily) datasets that varied among the infected mice, rather than the unlimited data required by the structural identifiability theory. Therefore, we performed practical identifiability for Models 1–4 under the discrete subject data in [28]. We used the Profile Likelihood practical identifiability method [23,25,26,27], which has the advantage of not only determining whether a parameter is practically identifiable, but also of determining a 90% confidence interval for the parameter space where practical identifiability occurs.

    When using the empirical population mean virus titer data in [5] for Model 1, we found that β is practically identifiable with 90% confidence interval CI ≈ [1.64×10⁻⁶, 2.94×10⁻⁵] and π is practically identifiable with 90% confidence interval CI ≈ [0.31, 3.18]. On the other hand, both δ and c are not practically identifiable, with identical 90% confidence intervals CI ≈ [0.87, ∞) (see Figure 3A). Adding high frequency data and rerunning the profile likelihood analysis resulted in practical identifiability of all four parameters, consistent with the structural identifiability results for Model 1 (see Figure 3B).

    Figure 3.  Profile likelihood curves generated using empirical data for A: Model 1, C: Model 2, E: Model 3; and profile likelihood curves generated using high frequency simulated data for B: Model 1, D: Model 2, F: Model 3. The red circles indicate best parameter estimates given in Table 3 and the dashed lines represent a threshold equivalent to 90% confidence level in the parameter estimate.

    Similar to Model 1, when using the empirical population mean virus titer data in [5] for Model 2, we found that δ, Kδ, and π are practically identifiable with 90% confidence intervals CI ≈ [1.41×10⁶, 2.06×10⁶], CI ≈ [1.74×10⁴, 4.19×10⁵], and CI ≈ [0.44, 5.24], respectively. On the other hand, both β and c are not practically identifiable, with 90% confidence intervals CI ≈ [6.62×10⁻⁶, ∞) and CI ≈ [3.98, ∞), respectively (see Figure 3C). Adding high frequency data and rerunning the profile likelihood analysis resulted in practical identifiability of all five parameters, consistent with the structural identifiability results for Model 2 (see Figure 3D).

    For Model 3, when using the empirical population mean virus titer data in [5], we found that β, δ, Kδ, and π are practically identifiable with 90% confidence intervals CI ≈ [2.17×10⁻⁵, 1.60×10⁻⁴], CI ≈ [1.47×10⁶, 2.03×10⁶], CI ≈ [1.59×10⁵, 4.32×10⁵], and CI ≈ [1.07, 4.49], respectively. On the other hand, both c and k are not practically identifiable, with 90% confidence intervals CI ≈ [3.98, ∞) and CI = (−∞, ∞), respectively (see Figure 3E). Adding high frequency data and rerunning the profile likelihood analysis did not result in practical identifiability of all six parameters (see Figure 3F). However, when we additionally relaxed the constraints on parameters c and k to range in the [0, 1000] and [0, 50] intervals, compared to the constraints chosen in Table 2, we observed practical identifiability of all six parameters, consistent with the structural identifiability results for Model 3 (see Figure 4).

    Figure 4.  Profile likelihood curves generated using empirical data for Model 3 when we relax the constraints, such that c ∈ [0, 1000] and k ∈ [0, 50]. The red circles indicate the best parameter estimates given in Table 3 and the dashed line represents a threshold equivalent to the 90% confidence level in the parameter estimate.

    For Model 4, when using the discrete empirical population mean virus titer data and the empirical population mean effector cell data in [5] simultaneously, we found that π and E0 are practically identifiable with 90% confidence intervals CI ≈ [0.29, 2.49] and CI ≈ [66, 5.41×10⁴]. Parameters k, λ, and KE are not practically identifiable, with the same 90% confidence interval CI = (−∞, ∞). Parameters β, δE, Kδ, η, and c are also not practically identifiable, with 90% confidence intervals CI ≈ [8.00×10⁻⁶, ∞), CI ≈ [1.10, ∞), CI ≈ [42.98, ∞), CI ≈ [5.37×10⁻⁸, ∞), and CI ≈ [3.89, ∞), respectively. Lastly, parameters δ, dE, and τI are not practically identifiable on the positive domain, with an undefined lower bound (ULB) for the 90% confidence interval and a finite upper bound. In particular, CI ≈ [ULB, 0.63], CI ≈ [ULB, 3.20], and CI ≈ [ULB, 6.99] for δ, dE, and τI, respectively (see Figure 5A). Adding high frequency data and rerunning the profile likelihood analysis resulted in practical identifiability of all thirteen parameters, consistent with the structural identifiability results for Model 4 (see Figure 5B).

    Figure 5.  Profile likelihood curves generated using empirical data for A: Model 4, and profile likelihood curves generated using high frequency simulated data for B: Model 4. The red circles indicate best parameter estimates given in Table 3 and the dashed lines represent a threshold equivalent to 90% confidence level in the parameter estimate.

    In this study, we investigated the conditions needed to ensure model identifiability in four models of influenza A virus dynamics in infected mice. To apply the same methodology and software, all considered models were either formulated as systems of ODEs (Models 1–3, given by Eqs (2.1)–(2.3)) or approximated by a system of ODEs (Model 4, given by Eq (2.7)). The considered models differ in the number of equations (corresponding to the number of variables), from three in Models 1 and 2, to four in Model 3, to eight in Model 4. Consequently, the number of unknown parameters is a maximum of four in Model 1, a maximum of five in Model 2, a maximum of six in Model 3, and a maximum of thirteen in Model 4. Lastly, the terms of interaction include only mass-action and linear terms for Model 1, and mass-action, linear, and density dependence terms for Models 2–4.

    We found that the increased complexity needed to capture biological realism comes with a cost. It resulted in increased uncertainty in parameter estimates not only when discrete and noisy virus and immune cell empirical data is used for validation but also when we assumed (hypothetically) that unlimited data is available. This means that data fitting should not be conducted until it is established that parameters can be revealed from unlimited data under the considered model structure. In other words, the first step in the model validation is determining whether all unknown parameters are structurally identifiable (see Figure 6).

    Figure 6.  Flow chart of performing identifiability theory to ODE models.

    When it comes to investigating the structural identifiability of systems of ODEs, several software platforms are available. Here, we compared results from three of them: DAISY [32], COMBOS [33], and StructuralIdentifiability.jl in JULIA [34]. For Models 1–3 and unlimited virus titer data, we found the same classification of the structurally identifiable parameters and the same (or equivalent) correlations between the nonstructurally identifiable parameters, regardless of the software used (Table 1). For the more complex Model 4 and unlimited virus titer and effector CD8+ T cell data, however, only StructuralIdentifiability.jl in JULIA found that the model is structurally identifiable under known (with the exception of the initial effector population, E0) initial conditions (Table 1). When initial conditions are unknown, we found identical classifications of the structurally identifiable parameters and equivalent correlations between the nonstructurally identifiable parameters from StructuralIdentifiability.jl in JULIA and DAISY (Table 1). COMBOS cannot handle the structural identifiability analysis for Model 4, regardless of whether the initial conditions are known or not (Table 1). While increased difficulty in analyzing the structural identifiability of Model 4 is not surprising given its increased dimensionality (eight equations), multiple parameters (thirteen), and complex interaction terms, this model is validated with two datasets (virus titer load and effector CD8+ T cells). Our analysis showed that the addition of data for another model variable did not compensate for the size of the model and the number of unknown parameters.

    Interestingly, we found that all parameters (for all models) are structurally identifiable under known (with the exception of initial effector population, E0) initial conditions. Given that this is an inoculation study, the assumption of known viral inoculum (set by the experiment) and of known initial target cell population (approximated based on the animal body weight) is not unreasonable. When Models 1–4 are used to model natural infection, however, such initial conditions would be unknown due to differences in individual routes of infection, heterogeneity in individual immune responses, and variability in patient susceptibility. Under such unknowns, Models 1–4 would become structurally unidentifiable. Hence, it would be impossible to estimate all parameters even in the presence of unlimited data. A reduction of the unknown parameter space (based on the reported parameter correlations) would be needed before model validation can be attempted.

    We next validated Models 1–3 with discrete (daily) virus titer data (for the first nine days) and validated Model 4 with discrete (daily) virus titer data (for the first nine days) and discrete (daily) CD8+ T cell data (for the first twelve days). Model selection (based on AICc) favored Model 4 as the best model (Table 3). Interestingly, Model 1 was the second best model, even though it had the largest 90% error region around the predicted mean virus fit (Figure 2, gray shaded regions). All models perform worst during the contraction phase (Figure 2, gray and blue shaded regions), suggesting uncertainty in death rate estimates (for the virus and infected cells).

    We used the best parameter estimates obtained through data fitting for Models 1–4 to further investigate their practical identifiability. Knowing that the data used for validation were collected daily and that there was variability among the subjects at each time point, we wanted to determine whether there is uncertainty in the estimated parameters. When it comes to practical identifiability, several methods are available, such as Monte Carlo simulations [18,19,20], the Fisher information matrix or correlation matrix [16,21,22,23], and Bayesian methods [24]. In this study, we used the profile likelihood method [25,38] for two main reasons. First, it allowed us not just to classify the models as practically or non-practically identifiable, but to determine a 90% confidence interval for each practically identifiable parameter. Second, it allowed us to determine the assumptions required to improve practical identifiability, while maintaining biological realism (for example, by imposing positivity for all parameters).

    We found that none of the models are practically identifiable for the daily empirical data collected in [28] and the parameter range restrictions imposed in Table 2 (see Figures 3A, C, E, and 5A). While Model 1, Model 2, and Model 4 become practically identifiable if we assume data is collected every fourteen minutes (see Figures 3B, D, and 5B), Model 3 does not (see Figure 3F). For this model, we can achieve practical identifiability only when we assume that the viral clearance rate can reach values as high as c=500 per day (corresponding to a life-span for the influenza virus of 2.9 minutes), and that the epithelial cells spend 1/k=1.2 hours in the eclipse phase before they become productively infected (see Figure 4). While large influenza clearance rates have been reported before [28], the eclipse phase 1/k is assumed to be conserved in a tight interval of 4–6 hours in most respiratory infections [6,11,28]. Therefore, this parameter is not practically identifiable from Model 3 even when data is collected at high frequency. This is a situation where a parameter should be removed from the data fitting routine in order to reduce the uncertainty in the estimates of the remaining parameters (see Figure 6).

    Our study has several limitations. First, Model 4 was originally expressed as a five-dimensional system of delay differential equations (DDEs). Given the lack of methods for determining the structural identifiability of DDEs, we approximated it with an eight-dimensional system of ODEs (see the generic transit-compartment construction below). More work is needed to determine whether the practical identifiability results are maintained (or improved) when the DDE system is used in place of the ODE system. Second, we assumed that all model parameters are unknown. It is possible that practical identifiability would improve if certain parameters (such as the eclipse phase) were assumed known. Lastly, all of our practical identifiability results are given in the context of daily data collection. It would be interesting to see how data collected at irregular intervals (especially when measurements before peak viral load are unavailable) change the results. In conclusion, we investigated the structural and practical identifiability of four nested ODE models of influenza A virus infection in mice. We determined the tradeoff between model complexity (defined by the combined system dimension, the number of unknown parameters, and the nonlinearity of model interactions), data availability, and our ability to reliably estimate model parameters. We presented solutions for improving model identifiability. While our results depend on the structure of the models considered and on the available data, the methods are generalizable, and their use is needed to improve the reliability and reproducibility of parameter estimates in other systems of ODEs applied to discrete biological data.
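    Regarding the first limitation, the transit-compartment (linear chain) construction of [30], which underlies DDE-to-ODE approximations of this type, replaces a discretely delayed signal $x(t-\tau)$ with a chain of $n$ auxiliary compartments; the generic form shown here is illustrative and does not restate Model 4's exact equations:

$$ z_1' = \frac{n}{\tau}\bigl(x(t) - z_1\bigr), \qquad z_j' = \frac{n}{\tau}\bigl(z_{j-1} - z_j\bigr), \quad j = 2, \dots, n, $$

    so that $z_n(t)$ approximates $x(t-\tau)$ with an Erlang-distributed delay of mean $\tau$ and variance $\tau^2/n$. The approximation sharpens toward the discrete delay as $n$ grows, at the cost of $n$ additional state variables and hence a higher-dimensional ODE system.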

    Identifiability analysis has critical implications for experimental design, particularly for ensuring that the data obtained from an experiment allow accurate parameter estimation. For instance, if a model is not structurally identifiable even with noise-free, unlimited data, then no experimental design will allow certain parameters to be estimated. Furthermore, structural identifiability analysis indicates which variables need to be measured in order to obtain reliable parameter estimates, so experiments can be designed with structurally identifiable models in mind. Practical identifiability, in turn, reveals the data sampling frequencies and times at which the data are most informative for particular parameters. Such results can refine the experimental design so that data are collected at those times, reducing the uncertainty in parameter estimates.
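    As a toy illustration of how the sampling schedule changes the information content of the data, the sketch below compares the Fisher information matrix (via its determinant, the D-optimality criterion) for a simple exponential viral-decay model $V(t) = V_0 e^{-ct}$ observed on the log10 scale with Gaussian noise; the model, schedules, and noise level are assumptions made for illustration, not quantities from this study.

```python
import numpy as np

# Toy design comparison for V(t) = V0 * exp(-c t), observed as log10 V(t)
# with additive Gaussian noise (sd = sigma). Parameters: log10(V0) and c.
def fisher_information(t_obs, sigma=0.3):
    # log10 V(t) = log10(V0) - c * t / ln(10); rows of S are the sensitivities
    S = np.column_stack([np.ones_like(t_obs), -t_obs / np.log(10)])
    return S.T @ S / sigma**2

schedules = {
    "daily, days 1-9": np.arange(1.0, 10.0),
    "late only, days 6-9": np.arange(6.0, 10.0),
}
for name, t_obs in schedules.items():
    fim = fisher_information(t_obs)
    # A larger determinant means the two parameters are jointly better constrained
    print(f"{name}: det(FIM) = {np.linalg.det(fim):.1f}")
```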

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    Conceptualization: NT and SMC; analysis: YRL and NHB; funding acquisition: SMC and NT; writing - original draft preparation: YRL, NHB, NT and SMC; writing - review and editing: YRL, NHB, NT and SMC.

    SMC and NHB acknowledge partial support from National Science Foundation grant No. 2051820 and NIH NIGMS 1R01GM152743-01. This research was enabled in part through the Virginia Tech Center for the Mathematics of Biosystems (VTCMB-033500). NT acknowledges partial support from National Science Foundation grant DMS 1951626. NT and YRL acknowledge support from NIH NIGMS 1R01GM152743-01.

    Necibe Tuncer is one of the special issue editors for Mathematical Biosciences and Engineering (MBE) and was not involved in the editorial review or the decision to publish this article. All authors declare no conflict of interest.



    [1] M. A. Stafford, L. Corey, Y. Cao, E. S. Daar, D. D. Ho, A. S. Perelson, Modeling plasma virus concentration during primary HIV infection, J. Theor. Biol., 203 (2000), 285–301. https://doi.org/10.1006/jtbi.2000.1076
    [2] A. S. Perelson, A. U. Neumann, M. Markowitz, J. M. Leonard, D. D. Ho, HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time, Science, 271 (1996), 1582–1586. https://doi.org/10.1126/science.271.5255.1582
    [3] S. M. Ciupe, R. M. Ribeiro, P. W. Nelson, A. S. Perelson, Modeling the mechanisms of acute hepatitis B virus infection, J. Theor. Biol., 247 (2007), 23–35. https://doi.org/10.1016/j.jtbi.2007.02.017
    [4] S. M. Ciupe, H. Dahari, A. Ploss, Mathematical models of early hepatitis B virus dynamics in humanized mice, Bull. Math. Biol., 86 (2024), 53. https://doi.org/10.1007/s11538-024-01284-2
    [5] M. A. Myers, A. P. Smith, L. C. Lane, D. J. Moquin, R. Aogo, S. Woolard, et al., Dynamically linking influenza virus infection kinetics, lung injury, inflammation, and disease severity, eLife, 10 (2021), e68864. https://doi.org/10.7554/eLife.68864
    [6] P. Baccam, C. Beauchemin, C. A. Macken, F. G. Hayden, A. S. Perelson, Kinetics of influenza A virus infection in humans, J. Virol., 80 (2006), 7590–7599. https://doi.org/10.1128/jvi.01623-05
    [7] R. Ben-Shachar, K. Koelle, Minimal within-host dengue models highlight the specific roles of the immune response in primary and secondary dengue infections, J. R. Soc. Interface, 12 (2015), 20140886. https://doi.org/10.1098/rsif.2014.0886
    [8] R. Nikin-Beers, S. M. Ciupe, Modelling original antigenic sin in dengue viral infection, Math. Med. Biol., 35 (2018), 257–272. https://doi.org/10.1093/imammb/dqx002
    [9] R. Nikin-Beers, S. M. Ciupe, The role of antibody in enhancing dengue virus infection, Math. Biosci., 263 (2015), 83–92. https://doi.org/10.1016/j.mbs.2015.02.004
    [10] K. Best, J. Guedj, V. Madelain, X. de Lamballerie, S. Lim, C. E. Osuna, et al., Zika plasma viral dynamics in nonhuman primates provides insights into early infection and antiviral strategies, Proc. Natl. Acad. Sci., 114 (2017), 8847–8852. https://doi.org/10.1073/pnas.1704011114
    [11] R. Ke, C. Zitzmann, D. D. Ho, R. M. Ribeiro, A. S. Perelson, In vivo kinetics of SARS-CoV-2 infection and its relationship with a person's infectiousness, Proc. Natl. Acad. Sci., 118 (2021), e2111477118. https://doi.org/10.1073/pnas.2111477118
    [12] N. Heitzman-Breen, S. M. Ciupe, Modeling within-host and aerosol dynamics of SARS-CoV-2: The relationship with infectiousness, PLoS Comput. Biol., 18 (2022), e1009997. https://doi.org/10.1371/journal.pcbi.1009997
    [13] S. M. Ciupe, J. M. Heffernan, In-host modeling, Infect. Dis. Model., 2 (2017), 188–202. https://doi.org/10.1016/j.idm.2017.04.002
    [14] S. M. Ciupe, J. M. Conway, Incorporating intracellular processes in virus dynamics models, Microorganisms, 12 (2024), 900. https://doi.org/10.3390/microorganisms12050900
    [15] M. Chung, M. Binois, R. B. Gramacy, J. M. Bardsley, D. J. Moquin, A. P. Smith, et al., Parameter and uncertainty estimation for dynamical systems using surrogate stochastic processes, SIAM J. Sci. Comput., 41 (2019), A2212–A2238. https://doi.org/10.1137/18M1213403
    [16] H. Miao, C. Dykes, L. M. Demeter, J. Cavenaugh, S. Y. Park, A. S. Perelson, et al., Modeling and estimation of kinetic parameters and replicative fitness of HIV-1 from flow-cytometry-based growth competition experiments, Bull. Math. Biol., 70 (2008), 1749–1771. https://doi.org/10.1007/s11538-008-9323-4
    [17] M. C. Eisenberg, S. L. Robertson, J. H. Tien, Identifiability and estimation of multiple transmission pathways in Cholera and waterborne disease, J. Theor. Biol., 324 (2013), 84–102. https://doi.org/10.1016/j.jtbi.2012.12.021
    [18] N. Tuncer, M. Martcheva, Determining reliable parameter estimates for within-host and within-vector models of Zika virus, J. Biol. Dyn., 15 (2021), 430–454. https://doi.org/10.1080/17513758.2021.1970261
    [19] N. Tuncer, H. Gulbudak, V. L. Cannataro, M. Martcheva, Structural and practical identifiability issues of immuno-epidemiological vector–host models with application to rift valley fever, Bull. Math. Biol., 78 (2016), 1796–1827. https://doi.org/10.1007/s11538-016-0200-2
    [20] N. Heitzman-Breen, Y. R. Liyanage, N. Duggal, N. Tuncer, S. M. Ciupe, The effect of model structure and data availability on Usutu virus dynamics at three biological scales, R. Soc. Open Sci., 11 (2024), 231146. https://doi.org/10.1098/rsos.231146
    [21] N. Tuncer, T. T. Le, Structural and practical identifiability analysis of outbreak models, Math. Biosci., 299 (2018), 1–18. https://doi.org/10.1016/j.mbs.2018.02.004
    [22] P. Das, M. Igoe, A. Lacy, T. Farthing, A. Timsina, C. Lanzas, et al., Modeling county level COVID-19 transmission in the greater St. Louis area: Challenges of uncertainty and identifiability when fitting mechanistic models to time-varying processes, Math. Biosci., 371 (2024), 109181. https://doi.org/10.1016/j.mbs.2024.109181
    [23] Y. Kao, M. C. Eisenberg, Practical unidentifiability of a simple vector-borne disease model: Implications for parameter estimation and intervention assessment, Epidemics, 25 (2018), 89–100. https://doi.org/10.1016/j.epidem.2018.05.010
    [24] S. M. Ciupe, N. Tuncer, Identifiability of parameters in mathematical models of SARS-CoV-2 infections in humans, Sci. Rep., 12 (2022), 14637. https://doi.org/10.1038/s41598-022-18683-x
    [25] A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingmüller, et al., Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood, Bioinformatics, 25 (2009), 1923–1929. https://doi.org/10.1093/bioinformatics/btp358
    [26] A. Boianelli, V. K. Nguyen, T. Ebensen, K. Schulze, E. Wilk, N. Sharma, et al., Modeling influenza virus infection: A roadmap for influenza research, Viruses, 7 (2015), 5274–5304. https://doi.org/10.3390/v7102875
    [27] M. J. Simpson, A. P. Browning, D. J. Warne, O. J. Maclaren, R. E. Baker, Parameter identifiability and model selection for sigmoid population growth models, J. Theor. Biol., 535 (2022), 110998. https://doi.org/10.1016/j.jtbi.2021.110998
    [28] A. P. Smith, D. J. Moquin, V. Bernhauerova, A. M. Smith, Influenza virus infection model with density dependence supports biphasic viral decay, Front. Microbiol., 9 (2018), 1554. https://doi.org/10.3389/fmicb.2018.01554
    [29] J. J. Sedmak, S. E. Grossberg, Interferon bioassay: Reduction in yield of myxovirus neuraminidases, J. Gen. Virol., 21 (1973), 1–7. https://doi.org/10.1099/0022-1317-21-1-1
    [30] Y. Sun, W. J. Jusko, Transit compartments versus gamma distribution function to model signal transduction processes in pharmacodynamics, J. Pharm. Sci., 87 (1998), 732–737. https://doi.org/10.1021/js970414z
    [31] A. S. Perelson, P. W. Nelson, Modeling viral infections, in Proceedings of Symposia in Applied Mathematics, 59 (2002), 139–172.
    [32] G. Bellu, M. P. Saccomani, S. Audoly, L. D'Angiò, DAISY: A new software tool to test global identifiability of biological and physiological systems, Comput. Methods Programs Biomed., 88 (2007), 52–61. https://doi.org/10.1016/j.cmpb.2007.07.002
    [33] N. Meshkat, C. E. Kuo, J. DiStefano III, On finding and using identifiable parameter combinations in nonlinear dynamic systems biology models and COMBOS: A novel web implementation, PLoS One, 9 (2014), e110261. https://doi.org/10.1371/journal.pone.0110261
    [34] R. Dong, C. Goodbrake, H. Harrington, G. Pogudin, Differential elimination for dynamical models via projections with applications to structural identifiability, SIAM J. Appl. Algebra Geom., 7 (2023), 194–235. https://doi.org/10.1137/22M1469067
    [35] H. Hong, A. Ovchinnikov, G. Pogudin, C. Yap, SIAN: Software for structural identifiability analysis of ODE models, Bioinformatics, 35 (2019), 2873–2874. https://doi.org/10.1093/bioinformatics/bty1069
    [36] X. R. Barreiro, A. F. Villaverde, Benchmarking tools for a priori identifiability analysis, Bioinformatics, 39 (2023), btad065. https://doi.org/10.1093/bioinformatics/btad065
    [37] H. T. Banks, S. Hu, W. C. Thompson, Modeling and Inverse Problems in the Presence of Uncertainty, Chapman and Hall/CRC, 2014. https://doi.org/10.1201/b16760
    [38] S. A. Murphy, A. W. Van der Vaart, On profile likelihood, J. Am. Stat. Assoc., 95 (2000), 449–465. https://doi.org/10.1080/01621459.2000.10474219
    © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0).