
Genome-wide maps of nucleosomes of the trichostatin A treated and untreated archiascomycetous yeast Saitoella complicata

  • Received: 28 January 2016 Accepted: 11 March 2016 Published: 15 March 2016
  • We investigated the effects of trichostatin A (TSA) on gene expression and nucleosome position in the archiascomycetous yeast Saitoella complicata. The expression levels of 154 genes increased in a TSA-concentration-dependent manner, while the levels of 131 genes decreased. Conserved genes between S. complicata and Schizosaccharomyces pombe were more commonly TSA-concentration-dependent downregulated genes than upregulated genes. We calculated the correlation coefficients of nucleosome position profiles within 300 nucleotides (nt) upstream of a translational start of S. complicata grown in the absence and the presence of TSA (3 μg/mL). We found that 20 (13.0%) of the 154 TSA-concentration-dependent upregulated genes and 22 (16.8%) of the 131 downregulated genes had different profiles (r < 0.4) between TSA-free and TSA-treated. Additionally, 59 (38.3%) of the 154 upregulated genes and 58 (44.3%) of the 131 downregulated genes had similar profiles (r > 0.8). We did not observe a GC content bias between the 300 nt upstream of the translational start of the TSA-concentration-dependent genes with conserved nucleosome positioning and the genes with different nucleosome positioning, suggesting that TSA-induced nucleosome position change is likely not related to DNA sequence. Most gene promoters maintained their nucleosome positioning even after TSA treatment, which may be related to DNA sequence. Enriched and depleted dinucleotides distribution of S. complicata around the midpoints of highly positioned nucleosome dyads was not similar to that of the phylogenetically close yeast Schizosaccharomyces pombe but similar to the basidiomycete Mixia osmundae, which has similar genomic GC content to that of S. complicata.

    Citation: Kenta Yamauchi, Shinji Kondo, Makiko Hamamoto, Yutaka Suzuki, Hiromi Nishida. Genome-wide maps of nucleosomes of the trichostatin A treated and untreated archiascomycetous yeast Saitoella complicata[J]. AIMS Microbiology, 2016, 2(1): 69-91. doi: 10.3934/microbiol.2016.1.69



In the mathematical modeling of physical and biological systems, situations often arise in which some facet of the underlying dynamics (in the form of a parameter) is not constant but rather may be distributed probabilistically within the system or across the population under study. While traditional inverse problems involve the estimation, given a set of data/observations, of a fixed set of parameters contained within some finite dimensional admissible set, models with parameters distributed across the population require the estimation of a probability measure or distribution over the set of admissible 'parameters'. The techniques for such measure estimation (along with their theoretical justifications) are widely scattered throughout the literature in applied mathematics and statistics, often with few cross references to related ideas; reviews are given in [1,2]. Of course, it is highly likely that individual parameters might vary from one individual to the next within the sampled population. Thus, our goal in this case is to use the sample of individuals to estimate the probability measure describing the distribution of certain parameters in the full population. In some situations (e.g., in certain pharmacokinetics and/or pharmacodynamics examples), one is able to follow each individual separately and collect longitudinal time course data for each individual. In other investigations, situations arise in which one is only able to collect aggregate data. Such might be the case, for example, in marine or insect catch and release experiments in which one cannot be certain of measuring the same individual multiple times. In other situations, one might be required to sacrifice each individual in the process of collecting the data (precisely the situation in the data we present below).
In these cases, one does not have individual longitudinal data, but rather histograms showing the aggregate number of individuals sampled from the population at a given time, having a given size/weight or other characteristic of interest ([3] and Chapter 9 of [4]). The goal then is to estimate the probability distributions describing the variability of the parameters across the population.

In our problem below, while one again has a proposed individual model, the data collected cannot be identified with individuals and is considered to be sampled longitudinally from the aggregate population. It is worth noting that special care must be taken in this case when identifying a model, such as the one introduced below, as an individual model in the sense that it describes an individual subpopulation. That is, we have all 'individuals' (i.e., shrimp [5] or mosquitofish [3,6,7] or cancerous mice [8,9]) described by the model sharing a common growth/birth/death rate function. Mathematically, here we define the 'individual' in terms of the underlying parameters, using 'individual' to describe the unit characterized by a single parameter set (tumor expansion rate, excretion rate, growth/birth/death rate, damping rate, relaxation times, etc.).

One can also distinguish two generic estimation problems. In the example presented in this manuscript, we consider the case, as in a structured density model, that one has a mathematical model for individual dynamics but only aggregate data (we refer to this as the individual dynamics/aggregate data problem). The second possibility, in which one has only an aggregate model (i.e., the dynamics depend explicitly on a distribution of parameters across the population) together with aggregate data, is not examined in this manuscript (we refer to this as the aggregate dynamics/aggregate data problem). Such examples arise in electromagnetic models with a distribution of polarization relaxation times for molecules (e.g., [10,11]); in biology with HIV cellular models [12,13,14]; and in wave propagation in viscoelastic materials such as biotissue [15,16,17,18]. The measure estimation problem for such examples is sufficiently similar to the individual dynamics/aggregate data situation, and accordingly we do not discuss aggregate dynamics models here.

In the generic estimation problems mentioned above, the underlying goal is the determination of the probability measure which describes the distribution of parameters across all members of the population. Thus, two main issues are of interest. First, a sensible framework (the Prohorov Metric Framework as we have developed it; see Chapter 14 of [19] and Chapter 5 of [2]) must be established for each situation so that the estimation problem is mathematically meaningful. Thus we must decide what type of information and/or estimates are desired (e.g., mean, variance, complete distribution function, etc. for the tumor size) and determine how these decisions will depend on the type of data available. Second, we must examine what mathematical techniques are available for the computation of such estimates. Because the space of probability measures is an infinite dimensional space (again a mathematical notion), we must make some type of finite dimensional approximations so that the estimation problem is amenable to convergent computations. A thorough discussion of associated mathematical, statistical and computational issues can be found in [2].

We now turn to two important questions: the first is how to deal with inverse problems (parameter estimation) for dynamical systems when longitudinal data do not exist. A second question that we shall consider below entails how to efficiently design experiments to collect data (i.e., how much data and when to collect it) that is necessary to validate models in such aggregate data situations. Our efforts here are motivated by a set of data collected with NSG (NOD/SCID/GAMMA), or NOD.Cg-PrkdcscidIl2rgtm1Wjl/SzJ, mice injected with cancer and then sequentially sacrificed to collect the needed data about T-cell counts. The resulting aggregate data is common in biological experiments where data collection requires sacrifice of the subjects. Colleagues at Moffitt Cancer Center have carried out preliminary trial experiments (data collection already completed) as follows: At $ t = -14 $ (14 days before the trial begins), NSG mice are injected with cancer, which should take 12–14 days to begin growing. At $ t = 0 $ (as determined by tumor volume) the mice are further injected with chimeric antigen receptor (CAR) T-cells [8] (T-cells engineered to specifically target cancer cells). Mice are divided into four groups with different treatments. Autopsies are performed on each of 5 mice sacrificed at $ t = 5 $, $ 10 $, and $ 15 $ days, respectively, to determine the concentration of the engineered T-cells within the blood, spleen, and tumor. Because of the nature of data collection, longitudinal data is not possible, as mice must be killed for data to be collected.

In the present investigation we begin by describing an individual model (as opposed to an aggregate model) to describe T-cell counts in an individual cancerous mouse. We first perform stability analysis on the model to better understand its behavior, and carry out sensitivity analysis to identify the most 'important' parameters. Next, we describe the aggregate model associated with the individual model, and some techniques to estimate the probability distribution over a chosen parameter. Finally, we simulate the Moffitt Cancer Center data set and demonstrate that parameters cannot be properly estimated with only a handful of time points.

    We use the system of ordinary differential equations,

    $ \frac{dT}{dt} = \rho\beta_{Ext}B - \beta_{TB}T $ (2.1)
    $ \frac{dB}{dt} = (\beta_{TB}T - \rho\beta_{Ext}B) + (\beta_{SB}S - \beta_{Ext}B) $ (2.2)
    $ \frac{dS}{dt} = \beta_{Ext}B - \beta_{SB}S, $ (2.3)

based on simple mass balance as described in [4], [20], and [21] and depicted in Figure 1 to model the flow of T-cells in the tumor, $ T $, T-cells in the blood, $ B $, and T-cells in the spleen, $ S $, in a cancerous body. T-cells in the blood, $ B $, travel to the tumor, $ T $, at a rate $ \rho_{Tx} = \rho \beta_{Ext} $. For each T-cell that leaves the blood, there may be a transient expansion, $ \rho $, in the tumor due to antigen recognition. If there is no antigen recognition, $ \rho_{Tx} = \beta_{Ext} $ and $ \rho = 1 $, and if there is any antigen recognition, $ \rho_{Tx} > \beta_{Ext} $ and $ \rho > 1 $. The T-cells in the tumor, $ T $, then flow back to the blood, $ B $, at a rate $ \beta_{TB} $. T-cells in the blood flow to the spleen, $ S $, at a rate $ \beta_{BS} = \beta_{Ext} $. That is, the flow rate of T-cells out of the blood to the spleen is the same as the exit rate of T-cells from the blood to the tumor if $ \rho = 1 $ and there is no antigen recognition. Then the T-cells flow from the spleen back to the blood at a rate $ \beta_{SB} $. Our four parameters of interest are thus $ \rho $, $ \beta_{Ext} $, $ \beta_{TB} $, and $ \beta_{SB} $, which are all strictly positive. In order to solve the system of ODEs, the initial number of T-cells in the tumor, $ T_0 = T(t_1) $, the blood $ B_0 = B(t_1) $, and the spleen $ S_0 = S(t_1) $ also need to be specified.
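As an illustration, the compartment system (2.1)–(2.3) can be integrated numerically with the Table 1 values. The sketch below is ours: the time span, the choice $ \rho = 15 $, and the solver tolerances are illustrative assumptions, not values prescribed by the experiment.

```python
# Sketch: integrate the compartment model (2.1)-(2.3) with the Table 1
# values. The time span, rho = 15, and solver tolerances are illustrative
# assumptions, not values prescribed by the study.
import numpy as np
from scipy.integrate import solve_ivp

def tcell_rhs(t, x, beta_Ext, beta_TB, beta_SB, rho):
    T, B, S = x  # T-cells in the tumor, blood, and spleen
    dT = rho * beta_Ext * B - beta_TB * T                                   # (2.1)
    dB = (beta_TB * T - rho * beta_Ext * B) + (beta_SB * S - beta_Ext * B)  # (2.2)
    dS = beta_Ext * B - beta_SB * S                                         # (2.3)
    return [dT, dB, dS]

params = (0.01, 0.001, 0.0001, 15.0)   # beta_Ext, beta_TB, beta_SB, rho
x0 = [0.0, 1.0e6, 0.0]                 # T0 = S0 = 0, B0 = 10^6 (Table 1)
sol = solve_ivp(tcell_rhs, (0.0, 30.0), x0, args=params, rtol=1e-8)

# The three right-hand sides sum to zero, so the total T-cell count
# is conserved along the solution:
assert np.allclose(sol.y.sum(axis=0), sum(x0), rtol=1e-6)
```

Since the equations only move cells between compartments, the conserved total provides a quick sanity check on any numerical solution.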

    Figure 1.  Schematic of the simple compartmental model described in (2.1), (2.2), and (2.3).

In the experiment of interest, cancerous mice are separated into four treatment categories: untransduced T-cells (UT), chimeric antigen receptor therapy (CAR), CAR treatment with added CXCR1 chemokine receptors (CAR+CXCR1), and CAR treatment with added CXCR2 chemokine receptors (CAR+CXCR2). Each treatment has a different effect on antigen recognition in the tumor, or parameter $ \rho $. The parameters and their values are listed in Table 1 and are partially motivated by reference to other tumor-related experiments [9]. The $ \rho $ values for each treatment are estimations around which we can study the behavior for each system. The T-cell movement rates, $ \beta_{Ext} $, $ \beta_{TB} $, and $ \beta_{SB} $, are estimated based on knowledge that the exit spleen rate is considerably lower than the exit tumor rate, such that $ \beta_{Ext} > \beta_{TB} > \beta_{SB} $. It is also known that the initial T-cell counts in the tumor and spleen, $ T_0 $ and $ S_0 $, respectively, are zero, and the initial T-cell count in the blood, $ B_0 $, ranges from 0 to 10 million. Thus, $ B_{0} = 10^{6} $ is chosen, and we set $ T_0 = S_0 = 0 $.

    Table 1.  Parameters, initial conditions, and their descriptions in equations (2.1)–(2.3).
    | $ {\boldsymbol{\theta}} $ | Definition | Chosen Values | Units |
    |---|---|---|---|
    | $ \beta_{Ext} $ | Rate at which T-cells exit blood | $ 0.01 $ | 1/day |
    | $ \beta_{TB} $ | Rate at which T-cells exit tumor and enter blood | $ 0.001 $ | 1/day |
    | $ \beta_{SB} $ | Rate at which T-cells exit spleen and enter blood | $ 0.0001 $ | 1/day |
    | $ {\boldsymbol{\rho}} $ | Transient expansion factor of T-cells in the tumor | | 1 |
    | $ \rho_{UT} $ | With no treatment | 14 | 1 |
    | $ \rho_{CAR} $ | With CAR treatment | 15 | 1 |
    | $ \rho_{CXCR1} $ | With CAR+CXCR1 treatment | 20 | 1 |
    | $ \rho_{CXCR2} $ | With CAR+CXCR2 treatment | 30 | 1 |
    | $ T_0 $ | Initial condition of $ T $ at the first time point ($ T_0 = T(t_1) $) | 0 | T-cells |
    | $ B_0 $ | Initial condition of $ B $ at the first time point ($ B_0 = B(t_1) $) | $ 10^{6} $ | T-cells |
    | $ S_0 $ | Initial condition of $ S $ at the first time point ($ S_0 = S(t_1) $) | 0 | T-cells |


    Before comparing data to a mathematical model, it is important to understand the behavior of the mathematical model, especially the limiting behavior. Since the values in Table 1 are set arbitrarily for simulation purposes, ideally we want to understand the behavior of the mathematical model for any realistic parameter values (i.e., values of $ \beta_{Ext} $, $ \beta_{TB} $, etc.). Although in reality we will not observe the number of T-cells for longer than a few days, it is important to understand what happens to these state values in the limit as time becomes large. Equations (2.1), (2.2), and (2.3) make up the first-order linear homogeneous system of differential equations

    $ \frac{d{\boldsymbol{X}}}{dt} = J{\boldsymbol{X}}(t) $ (2.4)

    where $ {\boldsymbol{X}}(t) = [T(t), B(t), S(t)]^T $ is a $ 3\times1 $ vector and

    $ J = \begin{bmatrix} -\beta_{TB} & \rho\beta_{Ext} & 0 \\ \beta_{TB} & -(\rho+1)\beta_{Ext} & \beta_{SB} \\ 0 & \beta_{Ext} & -\beta_{SB} \end{bmatrix} $ (2.5)

    is a $ 3\times3 $ matrix sometimes referred to as the Jacobian matrix [22]. Theoretically, the solution of this system can be found from the eigenvalues and eigenvectors of $ J $ (see Chapter 3 of [23]), but for arbitrary parameter values ($ \beta_{TB} $, $ \beta_{SB} $, etc.), these may be difficult or impossible to find. However, by examining the eigenvalues, we learn about the stability of the system.

    The characteristic equation of $ J $ is

    $ \lambda^3 + m\lambda^2 + q\lambda = 0 $ (2.6)

    where

    $ m = \beta_{TB} + \beta_{SB} + \beta_{Ext}(\rho+1), \qquad q = \beta_{TB}\beta_{SB} + \beta_{TB}\beta_{Ext} + \beta_{SB}\beta_{Ext}\rho. $

    Thus, one of the eigenvalues is $ \lambda_1 = 0 $, and by Descartes' Rule of Signs [24], the other two eigenvalues $ \lambda_2 $ and $ \lambda_3 $ are either complex or negative (all parameter values defined in Table 1 are strictly positive). However, the discriminant

    $ m^2 - 4q = (\beta_{TB} - \beta_{SB} + \beta_{Ext}(\rho - 1))^2 + 4\beta_{Ext}^2\rho $

    is strictly positive, so $ \lambda_2 $ and $ \lambda_3 $ are both real and negative. Hence, the solution to the system takes the form

    $ {\boldsymbol{X}}(t) = c_1{\boldsymbol{v}}_1 + c_2{\boldsymbol{v}}_2e^{\lambda_2 t} + c_3{\boldsymbol{v}}_3e^{\lambda_3 t} $

    where $ c_1 $, $ c_2 $, and $ c_3 $ are constants of integration determined by the initial conditions $ T_0 $, $ B_0 $, and $ S_0 $, and $ {\boldsymbol{v}}_1 $, $ {\boldsymbol{v}}_2 $, and $ {\boldsymbol{v}}_3 $ are the eigenvectors corresponding to $ \lambda_1 = 0 $, $ \lambda_2 $, and $ \lambda_3 $. Hence, as $ t\longrightarrow \infty $, the solution of the mathematical model approaches the positive steady state $ {\boldsymbol{X}}(t) \longrightarrow c_1{\boldsymbol{v}}_1 $. The eigenvector $ {\boldsymbol{v}}_1 = [\beta_{SB}\beta_{Ext}\rho, \; \beta_{TB}\beta_{SB}, \; \beta_{TB}\beta_{Ext}]^T $ is easily found, but $ \lambda_2 $, $ \lambda_3 $, $ {\boldsymbol{v}}_2 $, $ {\boldsymbol{v}}_3 $, and the constants $ c_1 $, $ c_2 $, and $ c_3 $ are complicated algebraically and are not necessary to examine steady state behavior.
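These conclusions are easy to confirm numerically. The following sketch (Table 1 values with an assumed $ \rho = 15 $) checks that $ J $ has a zero eigenvalue with null eigenvector $ {\boldsymbol{v}}_1 $ and that the remaining eigenvalues are real and negative:

```python
# Numerical check of the stability analysis for J at the Table 1 values
# (rho = 15 is an assumed treatment value). We verify: one zero eigenvalue,
# two negative real eigenvalues, and J v1 = 0 for the stated eigenvector.
import numpy as np

beta_Ext, beta_TB, beta_SB, rho = 0.01, 0.001, 0.0001, 15.0
J = np.array([
    [-beta_TB,  rho * beta_Ext,            0.0     ],
    [ beta_TB, -(rho + 1.0) * beta_Ext,    beta_SB ],
    [ 0.0,      beta_Ext,                 -beta_SB ],
])

eigvals = np.sort(np.linalg.eigvals(J).real)  # ascending order
assert eigvals[0] < 0 and eigvals[1] < 0      # lambda_2, lambda_3 < 0
assert abs(eigvals[2]) < 1e-12                # lambda_1 = 0

v1 = np.array([beta_SB * beta_Ext * rho, beta_TB * beta_SB, beta_TB * beta_Ext])
assert np.allclose(J @ v1, 0.0)               # v1 spans the null space of J
```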

    Under the biologically realistic conditions that $ \beta_{Ext} > \beta_{TB} > \beta_{SB} > 0 $ and $ \rho \geq 1 $, the model has two different cases of long term behavior, found through $ {\boldsymbol{v}}_1 $:

    $ \text{Tumor-Ignoring Case: if } \rho\beta_{SB} \leq \beta_{TB}, \text{ then } \bar{B} < \bar{T} \leq \bar{S}. $
    $ \text{Tumor-Targeting Case: if } \rho\beta_{SB} > \beta_{TB}, \text{ then } \bar{B} < \bar{S} < \bar{T}, $

    where $ \lim_{t\rightarrow \infty} {\boldsymbol{X}}(t) = \bar{{\boldsymbol{X}}} = [\bar{T}, \bar{B}, \bar{S}]^T $ is the steady state of the mathematical model. In the Tumor-Targeting Case, more T-cells are going to the site of the tumor, which is ideal for the patient, but in the Tumor-Ignoring Case, T-cells are either favoring the spleen or treating the spleen and tumor equally. At the current parameter values in Table 1, the Tumor-Ignoring Case occurs when $ \rho \leq 10 $, and the Tumor-Targeting Case occurs when $ \rho > 10 $. Thus, at all hypothesized values of the transient expansion factor $ \rho $, T-cells should target the tumor in the long term. Using values from Table 1 with $ \rho = 2 $ for the Tumor-Ignoring Case and $ \rho = 15 $ for the Tumor-Targeting Case, Figure 2 below captures this behavior.

    Figure 2.  Numerical short-term and long-term solutions for our model, where the state variables, $ T $, $ B $, and $ S $ are the number of T-cells in the tumor, blood, and spleen, respectively. The other parameter values are fixed at values from Table 1. We explore both the Tumor-Ignoring cases (when $ \rho \leq 10 $) in (a) and (b) and Tumor-Targeting cases (when $ \rho > 10 $) in (c) and (d).

    Figures 2a and 2b display short and long-term solution behavior for the solutions of the system in the Tumor-Ignoring case, when $ \rho = 2 $. In this situation, the expansion parameter $ \rho $ is less than ideal, which means that the T-cells are not being shuttled to the tumor in a significant way. Figure 2b shows that in the long-term, T-cells in the blood will decrease to zero, and although they increase to an extent in the tumor, they eventually lose numbers and go to the spleen. Figure 2a shows this short-term behavior, where we can see the relatively slow decay of T-cells in the blood and relatively slow growth of T-cells in the tumor and spleen.

    If we look at short and long-term cases in which $ \rho > 10 $, the Tumor-Targeting case, we see a notable change in behavior of the system. In Figure 2c, we see that T-cells leave the blood and are shuttled to the tumor at a much higher rate than in the Tumor-Ignoring case, while T-cells in the spleen grow very slowly. In the long-term, as seen in Figure 2d, the behavior is indeed very dramatic initially, and then levels out. T-cells in the blood, again, go to zero, while T-cells in the tumor increase significantly and level out, with T-cells in the spleen remaining low over time. This behavior corresponds with a value of $ \rho $ that reinforces the Tumor-Targeting behavior: when $ \rho > 10 $, the body sends the T-cells to the unwanted tumor, which is what we would expect in a cancer treatment. Note that since T-cell movement in the body occurs relatively slowly, T, B, and S take a long time to reach their steady states at biologically relevant parameter values. Thus, using parameter values listed in Table 1, T, B, and S do not reach their steady state until $ t > 10,000 $ days.
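Because the right-hand sides of (2.1)–(2.3) sum to zero, the total T-cell count is conserved, and the steady state can be computed directly by scaling $ {\boldsymbol{v}}_1 $ so its components sum to $ T_0 + B_0 + S_0 $. A sketch with the Table 1 values and an assumed Tumor-Targeting value $ \rho = 15 $:

```python
# Sketch: the limiting state is c1*v1; since the total count is conserved,
# c1 scales v1 so the components sum to T0 + B0 + S0. Table 1 values;
# rho = 15 is an assumed Tumor-Targeting value.
import numpy as np

beta_Ext, beta_TB, beta_SB, rho = 0.01, 0.001, 0.0001, 15.0
v1 = np.array([beta_SB * beta_Ext * rho,   # tumor component
               beta_TB * beta_SB,          # blood component
               beta_TB * beta_Ext])        # spleen component
total = 0.0 + 1.0e6 + 0.0                  # T0 + B0 + S0
Tbar, Bbar, Sbar = total * v1 / v1.sum()

# rho*beta_SB > beta_TB here, so the Tumor-Targeting ordering must hold:
assert rho * beta_SB > beta_TB
assert Bbar < Sbar < Tbar
```

This gives the steady state without computing $ \lambda_2 $, $ \lambda_3 $, or the remaining eigenvectors.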

    Statistical significance in an inverse problem (fitting data to a mathematical model) depends largely on the sensitivity of the parameter chosen to be estimated. Utilizing sensitivity analysis, we determine which parameters are most significant in affecting the behavior of the model. That is, parameters with high sensitivities dramatically affect the solution, as the observations (T-cell concentrations in the blood, $ B $, tumor, $ T $, and spleen, $ S $) are most sensitive to those parameters, while parameters with low sensitivities have little influence on the model. The sensitivity of observation $ f $ to parameter estimates $ \theta $ is

    $ \chi(\theta, t) = \frac{\partial f(t;\theta)}{\partial \theta} $ (2.7)

    where $ t $ is time, and $ f = T $, $ f = B $, or $ f = S $, and $ \theta $ is one of the parameters defined in Table 1 ($ \theta = {{\beta _{Ext}}} $ or $ \theta = {{\beta _{TB}}} $ or etc.). Since many of the parameters have different orders of magnitude (for example, $ \rho \approx 10^1 $ while $ \beta_{SB} \approx 10^{-4} $), it is useful to observe the normalized sensitivities,

    $ \chi_n(\theta, t) = \frac{\partial f(t;\theta)}{\partial \theta}\,\frac{\theta}{f(t;\theta)}. $ (2.8)

    While general sensitivities, $ \chi $, measure how the observations react to the parameters by taking the partial derivative of the output with respect to the input, normalized sensitivities, $ \chi_n $, are scaled by the factor $ \theta/f(t;\theta) $ so that sensitivities to parameters of different magnitudes can be compared against one another.

    The sensitivities and normalized sensitivities depend on both time $ t $ and the value of the parameter to which we calculate the observation's sensitivity, $ \theta $. Thus, for the calculation of sensitivities all parameters are fixed at the values listed in Table 1, and time $ t $ is allowed to vary. Since all of the parameter values are fixed, this is local (as opposed to global) sensitivity analysis. We use the complex-step method [25] to numerically evaluate the sensitivities in (2.7) and (2.8).
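A brief illustration of the complex-step idea, shown here on a simple scalar function rather than the ODE solution (applying it to the model works the same way once the solver is run with a complex-valued parameter): for an analytic $ f $, $ \mathrm{Im}\,f(\theta + ih)/h $ approximates $ \partial f/\partial\theta $ to $ O(h^2) $ with no subtractive cancellation.

```python
# Sketch of the complex-step derivative: for analytic f,
# Im[f(theta + i*h)] / h = f'(theta) + O(h^2), with no subtractive
# cancellation, so h can be taken extremely small (e.g., 1e-20).
import numpy as np

def complex_step_derivative(f, theta, h=1e-20):
    return np.imag(f(theta + 1j * h)) / h

f = lambda x: np.exp(x) * np.sin(x)          # a stand-in for f(t; theta)
theta = 0.7
exact = np.exp(theta) * (np.sin(theta) + np.cos(theta))
approx = complex_step_derivative(f, theta)
assert abs(approx - exact) < 1e-13           # accurate to machine precision
```

The normalized sensitivity (2.8) is then obtained by multiplying by $ \theta/f(\theta) $.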

    For all four categories (UT, CAR, CAR+CXCR1, and CAR+CXCR2) the concentrations of T-cells in the tumor, blood, and spleen are most sensitive to the parameters $ \beta_{Ext} $ and $ \rho $. Since graphs of the sensitivities look very similar for the different treatment categories, we only show graphs of the sensitivities for the last treatment, CAR+CXCR2, where $ \rho = 30 $. In order to better compare the sensitivities to each parameter, for each observation ($ T $, $ B $, and $ S $) and parameter (see Table 1) we find the maximum sensitivity and maximum normalized sensitivity over time and plot these results in Figure 3. The full time-dependent sensitivities are plotted in Figures 4 and 5. Since the observations $ T $, $ B $, and $ S $ are consistently not sensitive to the initial conditions, sensitivities to $ T_0 $, $ B_0 $, and $ S_0 $ are not plotted in Figures 4 and 5.

    Figure 3.  Maximum sensitivities (a) and maximum normalized sensitivities (b) of each observation, $ T $, $ B $, and $ S $, to each of the parameters over a time period of 30 days. The expansion factor of T-cells in the tumor $ \rho = 30 $, and the initial number T-cells in the tumor, blood, and spleen $ [T_0, B_0, S_0] = [0, 10^6, 0] $, since this is data from the CAR+CXCR2 treatment group.
    Figure 4.  Sensitivities, $ \chi(t) $, of observation $ T $, $ B $, and $ S $ to parameters $ \beta_{Ext} $, $ \beta_{TB} $, $ \beta_{SB} $, and $ \rho $ over a time period of 30 days. These sensitivities are calculated at parameter values from Table 1 with $ \rho = 30 $ to represent the CAR+CXCR2 treatment.
    Figure 5.  Normalized sensitivities, $ \chi_n(t) $, of observation $ T $, $ B $, and $ S $ to parameters $ \beta_{Ext} $, $ \beta_{TB} $, $ \beta_{SB} $, and $ \rho $ over a time period of 30 days. These sensitivities are calculated at parameter values from Table 1 with $ \rho = 30 $ to represent the CAR+CXCR2 treatment.

    The observations of T-cells in the tumor, blood, and spleen in Figure 3a are most sensitive to the T-cell movement rates, $ \beta_{Ext} $, $ \beta_{TB} $, and $ \beta_{SB} $, followed by antigen recognition in the tumor, $ \rho $. T-cell counts are not very sensitive to the initial conditions, $ B_0 $, $ T_0 $, and $ S_0 $. These results are consistent with our model design. When sensitivities are normalized, the T-cell counts in Figure 3b are most sensitive to the rate at which T-cells exit the blood, $ \beta_{Ext} $, and antigen recognition in the tumor, $ \rho $, followed by the initial T-cell count in the blood, $ B_0 $. Thus, according to the normalized sensitivities, T-cell counts are most affected by the parameters $ \beta_{Ext} $ and $ \rho $. Since $ \rho $ is the transient expansion factor of antigen recognition in the tumor and changes depending on the treatment, we choose to estimate this parameter. We also notice that these sensitivities are time-dependent. The normalized sensitivities in Figure 5 show that the observations are most sensitive to $ \rho $ and $ \beta_{Ext} $ initially, while they are sensitive to $ \beta_{TB} $ and $ \beta_{SB} $ later in time. This reflects the behavior of the ODE system: T-cells are initially shuttled out of the blood (at the rate $ \beta_{Ext} $, amplified by $ \rho $) very rapidly, and the numbers of T-cells in the tumor, blood, and spleen are correspondingly sensitive to those rates.

    Equations (2.1), (2.2), and (2.3) model the T-cell counts in an individual mouse, which is assumed to have a single value for T-cell flow from the tumor to the blood, from the blood to the spleen, etc. (i.e., a single value for each of the parameters listed in Table 1). However, this assumption does not apply to the aggregate data set, in which different mice sacrificed and sampled at each time point may have different parameter values (e.g., $ \rho $, $ \beta_{Ext} $, etc.), varying over some range of values. Thus, using the individual model we formulate an aggregate model with which we can compare the aggregate data (see Chapter 5 of [2] for a more complete discussion). Based on the sensitivity analysis and our interest in the different types of treatment, we choose to estimate the probability distribution of $ \rho $, the transient expansion of T-cells in the tumor due to antigen recognition.

    Consider the deterministic T-cell population vector $ {\boldsymbol{x}}(t; \rho) = [T(t; \rho), B(t; \rho), S(t; \rho)]^T $, which is a solution to Eqs (2.1), (2.2), and (2.3) given parameter values in Table 1, where $ t $ is time and $ \rho $ is the transient expansion factor which we choose to estimate (see Section 2.3). There is an aggregate population vector $ {\boldsymbol{u}}(t; P) = [u_T(t; P), u_B(t; P), u_S(t; P)]^T $ corresponding to the individual population vector $ {\boldsymbol{x}} $ and given by

    $ {\boldsymbol{u}}(t;P) = E({\boldsymbol{x}}(t;\cdot)\,|\,P) = \int_{\mathcal{G}} {\boldsymbol{x}}(t;\rho)\,dP(\rho), $

    where $ \rho $ is now a random variable, $ \mathcal{G} $ is the collection of admissible parameter values for $ \rho $, and $ P $ is a probability measure on $ \mathcal{G} $. Note that $ {\boldsymbol{u}} $ is the expected value of $ {\boldsymbol{x}} $, which is also a random vector since it depends on the random variable $ \rho $. Under the assumption that the probability distribution, $ P $, possesses a probability density, $ p $, and assuming that $ \mathcal{G} = [\rho_l, \rho_u] $ is some closed interval, the population count is given by

    $ {\boldsymbol{u}}(t;P) = \int_{\rho_l}^{\rho_u} {\boldsymbol{x}}(t;\rho)\,p(\rho)\,d\rho, $ (3.1)

    where the density $ P' = \frac{dP}{d\rho} = p(\rho) $.
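A minimal numerical sketch of (3.1): solve the individual model over a grid of $ \rho $ values and average against a density. The admissible interval $ \mathcal{G} $, the density shape, and the quadrature rule below are our own illustrative choices.

```python
# Sketch of the aggregate model (3.1): average individual solutions
# x(t; rho) against a density p(rho) on an assumed interval G = [10, 40].
# The density shape and quadrature grid are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

def solve_individual(rho, t_eval,
                     beta_Ext=0.01, beta_TB=0.001, beta_SB=0.0001):
    def rhs(t, x):
        T, B, S = x
        return [rho * beta_Ext * B - beta_TB * T,
                (beta_TB * T - rho * beta_Ext * B)
                + (beta_SB * S - beta_Ext * B),
                beta_Ext * B - beta_SB * S]
    sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), [0.0, 1.0e6, 0.0],
                    t_eval=t_eval, rtol=1e-8)
    return sol.y  # rows are T, B, S

rhos = np.linspace(10.0, 40.0, 41)              # quadrature nodes on G
p = np.exp(-0.5 * ((rhos - 25.0) / 5.0) ** 2)   # assumed density shape
p /= trapezoid(p, rhos)                         # normalize: integral = 1

t_eval = np.linspace(0.0, 15.0, 16)
xs = np.array([solve_individual(r, t_eval) for r in rhos])  # (41, 3, 16)
u = trapezoid(xs * p[:, None, None], x=rhos, axis=0)        # u(t; P)
```

Because each individual solution conserves the total T-cell count, the aggregate $ {\boldsymbol{u}} $ inherits the same conservation property, which again serves as a check.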

    Now that we have an aggregate model which can be compared to the data, we follow techniques from Banks, Hu, and Thompson [2] and Banks, Bekele-Maxwell, Everett, Stephenson, Shao, and Morgenstern [26] to estimate parameters in our mathematical model. For the inverse problem, consider the $ 3 $-dimensional dynamical system defined in (3.1), to be estimated using the data. We are interested in determining the probability density $ P' = p(\rho) $ which gives the best fit of the underlying model to the aggregate data. However, this parameter estimation problem involves an infinite dimensional parameter space (the space $ \mathcal{P}(\mathcal{G}) $ of probability measures defined on the set $ \mathcal{G} $). Instead of using a specific probability density function in the aggregate model, we use a family of finite dimensional approximations $ \mathcal{P}^M(\mathcal{G}) $. Based on [14,27,28], we are guaranteed convergence in the Prohorov metric. In our case the finite dimensional approximation $ \mathcal{P}^M(\mathcal{G}) $ to the probability measure space $ \mathcal{P}(\mathcal{G}) $ is defined using $ M $ linear splines.
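A sketch of one such finite dimensional approximation, assuming hat (linear spline) basis functions on an illustrative interval $ \mathcal{G} = [10, 40] $; nonnegative node coefficients are normalized so the piecewise linear function integrates to one:

```python
# Sketch of the linear-spline density approximation p^M(rho): nonnegative
# coefficients at M spline nodes define a piecewise-linear function, which
# is normalized to integrate to one. M and the interval G are assumptions.
import numpy as np

def spline_density(coeffs, nodes):
    """Turn hat-function coefficients into a normalized density on G."""
    c = np.maximum(np.asarray(coeffs, dtype=float), 0.0)
    # exact integral of a piecewise-linear function (trapezoid on the nodes)
    area = np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(nodes))
    c /= area
    return lambda rho: np.interp(rho, nodes, c)

M = 8
nodes = np.linspace(10.0, 40.0, M)              # spline nodes on G = [10, 40]
coeffs = np.random.default_rng(0).random(M)     # candidate spline weights
p_M = spline_density(coeffs, nodes)             # callable density

# Check normalization on a fine grid:
grid = np.linspace(10.0, 40.0, 3001)
vals = p_M(grid)
assert abs(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)) - 1.0) < 1e-3
```

In the inverse problem, the $ M $ coefficients become the finite set of unknowns optimized to fit the aggregate data.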

    Let $ {\boldsymbol{u}}_j = [T_j, B_j, S_j]^T $ represent the T-cell count data or observations collected from the mice at time $ t_j $ for $ j = 1, ..., N $. We note that there is some uncertainty between the actual phenomenon, which is represented through the data, and the above observation process. This uncertainty is accounted for in the statistical model (again, see Chapter 3 of [2], where it is explained that both a mathematical model and a statistical model are required to carry out an inverse problem properly with uncertainty in observations)

    $ \underbrace{{\boldsymbol{U}}_j}_{\text{data representation}} = \underbrace{{\boldsymbol{u}}(t_j; P^M_0)}_{\text{aggregate model}} + \underbrace{{\boldsymbol{u}}^{\boldsymbol{\gamma}}(t_j; P^M_0) \circ {\boldsymbol{{{\mathcal{E}}}}}_j}_{\text{weighted error}}, $

    where $ P^M_0 $ is the nominal probability distribution approximation. Note that $ P^M_0 $ has a density $ P^{M'} = \frac{dP^M}{d\rho} = p^M(\rho) $ which is defined using linear splines. The values $ {\boldsymbol{\gamma}} = [\gamma_1, \gamma_2, \gamma_3]^T, \gamma_i\geq 0 $ in the error term are weighting factors corresponding to each of the three aggregate observations (T-cells in the tumor, blood, and spleen, respectively). These weighting factors can be calculated from real data before an inverse problem is completed [29], but since this manuscript only deals with simulated data, those techniques are not discussed here. The weights represent the dependency of the error on the dynamics; the choice of $ {\boldsymbol{\gamma}} $ corresponds to assumptions about how the error scales with the observations.

    Note that $ \circ $ is the Hadamard (or Schur) product, i.e., component-wise multiplication of two vectors. The random error vectors $ {\boldsymbol{{{\mathcal{E}}}}}_j = [\mathcal{E}^1_j, \mathcal{E}^2_j, \mathcal{E}^3_j]^T $, corresponding to each of the three aggregate observations (T-cells in the tumor, blood, and spleen, respectively), are assumed to be independent and identically distributed (i.i.d.) with mean zero, $ \text{Var}(\mathcal{E}_j^1) = \sigma^2_{01} $, $ \text{Var}(\mathcal{E}_j^2) = \sigma^2_{02} $, and $ \text{Var}(\mathcal{E}_j^3) = \sigma^2_{03} $. The corresponding realizations (for the random vector $ {\boldsymbol{U}}_j $) are

    $ \underbrace{{\boldsymbol{u}}_j}_{\text{aggregate data}} = \underbrace{{\boldsymbol{u}}(t_j; P^M_0)}_{\text{aggregate model}} + \underbrace{{\boldsymbol{u}}^{\boldsymbol{\gamma}}(t_j; P^M_0) \circ {\boldsymbol{\epsilon}}_j}_{\text{weighted error}}. $

    This multiplicative structure of the observational error in the above statistical model exists because, often in biological models, the size of the resulting observation error is proportional to the size of the observations. A rather thorough discussion of these issues, along with concrete examples, is given in Chapter 3 of [2]. For $ {\boldsymbol{\gamma}} \geq 0 $, a generalized least squares method or an iterative reweighted weighted least squares (IRWLS) method [26] is appropriate to perform the inverse problem. In order to estimate $ \hat{P}^M \approx P^M_0 $ (or the corresponding density $ \hat{p}^M(\rho) \approx p^M(\rho) $), we want to minimize the distance between the collected data and the aggregate mathematical model, where the observables are weighted according to their variability and, for each observable, the observations over time are weighted unequally (once again, we refer the reader to [26] and [2] for a more detailed discussion and relevant examples). A detailed description of the IRWLS method applied to an aggregate model as described above is outlined in Section 4.1 of [30].
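The reweighting idea can be sketched in a few lines. The toy model below is linear in a single scale parameter $ q $ (so each weighted least squares pass has a closed form); it illustrates the iteration only, not the paper's three-state implementation, and the time profile $ f(t) = e^{0.2t} $ is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(0)

def irwls_scale(t, data, gamma=0.5, n_iter=10):
    """Iteratively reweighted least squares for the toy model q * f(t):
    squared weights w_j^2 = model_j^(-2*gamma) are rebuilt from the current
    fit, mirroring a relative-error statistical model with weighting gamma."""
    f = np.exp(0.2 * t)                     # known time profile (illustrative)
    q = 1.0                                 # initial guess
    for _ in range(n_iter):
        w2 = (q * f) ** (-2.0 * gamma)      # weights from current model values
        q = np.sum(w2 * data * f) / np.sum(w2 * f * f)   # weighted LS update
    return q

# Simulate relative-error data around a true scale of 5.0 (5% noise)
t = np.linspace(0.0, 10.0, 20)
data = 5.0 * np.exp(0.2 * t) * (1.0 + 0.05 * rng.standard_normal(t.size))
q_hat = irwls_scale(t, data)
```

The iteration stops changing once the weights computed from the fit stabilize; in practice a convergence tolerance would replace the fixed iteration count.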

    If we assume $ {\boldsymbol{\gamma}} = [\gamma_1, \gamma_2, \gamma_3] = [0, 0, 0] $, then our statistical model is called an absolute error model and an ordinary least squares method is appropriate for parameter estimation. However, we believe it is more biologically realistic to assume the observation error is proportional to the size of the observed quantity, i.e., a relative error model, for the data sets and models investigated here. In the next section, we simulate aggregate data to test our aggregate model and the inverse problem.

    Now that we have ascertained some specific features of the behavior of the mathematical model and its parameters, we want to carry out a series of inverse problems to estimate the approximate probability distribution $ P_0^M $ (or its corresponding density $ p^M(\rho) $) of the desired parameter $ \rho $ for a given number of time observations. We attempt to estimate $ P_0^M $ using observations of the aggregate engineered T-cell concentrations in the tumor $ T $, blood $ B $, and spleen $ S $ compartments. However, since currently available data sets contain data at only three distinct time observations, our inverse problem might not be feasible. Nonetheless, we proceed in our efforts using a rather straightforward if unsophisticated approach to the question of how many mice must be sacrificed in order to reliably estimate a finite approximation $ P^M_0 $ to the desired parameter distribution of $ \rho $.

    In order to see this problem from a broader perspective and determine how many time points are needed to accurately estimate parameters, we now step away from the experimental data and simulate our own data. We wish to mirror the experiment as closely as possible. That is, our simulation will assume aggregate data per time point, and we will focus upon parameter estimation for the CAR treatment, which assumes a parameter value of $ \rho\approx15 $ for the individual model. Thus, we will initialize our problem with an expected or assumed distribution of $ \rho $, where $ 10\leq\rho\leq 20 $.

    Our methods of data simulation are as follows: We first generate a normal distribution for $ \rho $, where $ \rho \sim \mathcal{N}(15, 1) $; that is, we assume that $ \rho $ has a mean value of $ \mu = 15 $ and a standard deviation of $ \sigma = 1 $ to account for the differences in the mice. We call this "actual" distribution $ P_A $ and the corresponding probability density function $ p_A(\rho) $. Since our actual experimental data come from a time frame of 0 to 15 days, we simulate data based on this same time frame and assume the same initial conditions and parameter values from Table 1. Although it may be advantageous to sample more frequently at certain times in the experiment, for simplicity we only generate uniformly spaced time points. In order to simulate the sacrifice of five mice per time point as in the experiments, we generate five data points from the model for each observation time point. To do this, we use the distribution $ P_A $, from which we randomly draw one value of $ \rho $ for each mouse. Thus, if we want to simulate $ n $ time points of data, we will draw $ n\times5 $ values of $ \rho_{ij} $ for $ i = 1, ..., 5 $ mice per $ j = 1, ..., n $ time points. This generates a set of five data points per time observation to which we add noise.

    Since we are using the iterative reweighted weighted least squares approach described in the previous section, we set the variance of the random error vectors to be $ \text{Var}(\mathcal{E}^1_j) = \text{Var}(\mathcal{E}^2_j) = \text{Var}(\mathcal{E}^3_j) = 0.01 $, and we set the weighting factors to be $ {\boldsymbol{\gamma}} = [0.5, 0.5, 0.5]^T $. We then average these simulated data values to give us one aggregate data point per time point for each of the observations $ T $, $ B $, and $ S $. Thus, our simulated data take the form

    $ {\boldsymbol{u}}_j = [T_j, B_j, S_j]^T = \frac{1}{5}\sum_{i=1}^{5}\left[{\boldsymbol{x}}(t_j; \rho_{ij}) + {\boldsymbol{x}}^{\boldsymbol{\gamma}}(t_j; \rho_{ij}) \circ {\boldsymbol{\epsilon}}_{ij}\right] $ (3.2)

    where $ {\boldsymbol{\epsilon}}_{ij} $ are realizations of the normally distributed random vector $ {\boldsymbol{{{\mathcal{E}}}}}_{ij} $ for each of the $ i = 1, ..., 5 $ mice at each of the $ j = 1, ..., n $ time points.
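A sketch of this data-generation scheme follows, using the paper's noise settings ($ \gamma_i = 0.5 $, error standard deviation $ \sqrt{0.01} = 0.1 $). The function `x_individual` is a hypothetical stand-in for the individual ODE solution of Eqs (2.1)–(2.3), which is not reproduced here; only the five-mice-per-time-point averaging structure of (3.2) is the point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the individual-model solution x(t; rho) = [T, B, S];
# the real version would integrate Eqs (2.1)-(2.3) with Table 1 parameters.
def x_individual(t, rho):
    return np.array([rho * t, 100.0 * np.exp(-0.1 * rho * t), 50.0 + t])

def simulate_aggregate(times, mu=15.0, sigma=1.0, gamma=0.5, err_sd=0.1):
    """Per Eq (3.2): draw one rho per mouse (5 mice per time point), add
    weighted (relative) noise x^gamma * eps, and average the 5 noisy counts."""
    data = []
    for t in times:
        rhos = rng.normal(mu, sigma, size=5)          # one rho per mouse
        noisy = [x_individual(t, r)
                 + x_individual(t, r) ** gamma * rng.normal(0.0, err_sd, 3)
                 for r in rhos]
        data.append(np.mean(noisy, axis=0))           # one aggregate data point
    return np.array(data)                             # shape (n, 3)

u = simulate_aggregate(np.linspace(5.0, 15.0, 3))     # n = 3 observation times
```

Each row of `u` is one aggregate observation $ {\boldsymbol{u}}_j = [T_j, B_j, S_j]^T $, mirroring one sacrifice of five mice.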

    A qualitative image of this simulation for a simple time grid of $ n = 4 $ (or $ n = 3 $ data points, since we do not consider the first point at $ t = 0 $ to be a data point), for T-cells in the tumor only, can be seen in Figure 6a–c. For the first time point, $ t = 5 $, 5 mice are simulated, the T-cell counts in their tumors are plotted in Figure 6a, and each of these counts has weighted noise added to it. The same is done for the second ($ t = 10 $ in Figure 6b) and third time points ($ t = 15 $ in Figure 6c) for a total of 15 simulated mice, each with a different value of $ \rho $ and added weighted noise. For each time point, we also plot the average solution (or average T-cell count in the tumors of the five mice) and the average noisy T-cell count in the tumor, which forms the simulated data points $ T_j $ as described in (3.2). Figure 6d shows the density of the "true" distribution of $ \rho $, $ p_A(\rho) $, and the corresponding 15 randomly chosen $ \rho_{ij} $ realizations (for $ i = 1, ..., 5 $ mice per $ j = 1, ..., n $ time points, $ t $, after $ t = 0 $) used to generate our simulated data. Using these simulated data, we investigate the results of the inverse problems assuming availability of different numbers of equally spaced time point observations of the aggregate T-cell concentrations in the tumor, blood, and spleen, and attempt to answer the question: how many time points are necessary for accurate parameter estimation?

    Figure 6.  For $ n = 4 $ time points, data is simulated from different $ \rho $ values per mouse per time point (dotted lines), with an initial time point of $ t = 0 $, and parameter values and initial conditions from Table 1. We add noise to the solution found at each time point (stars). These solutions are averaged (solid line) to show one single data point per time observation (big dot). For the sake of simplicity and illustration, these solutions show the T-cells in the tumor only.

    Before performing inverse problems, we establish the methodology used to determine how accurate an estimated probability distribution of $ \rho $ is for a given simulated data set. The estimated distribution $ \hat{P}^M $ (and its corresponding density $ \hat{p}^M $) will most likely differ from the "actual" distribution $ P_A $ (and its corresponding density $ p_A $). While we can visually compare these two probability density functions, it is useful to quantify their differences. Since the density function $ \hat{p}^M $ is defined using $ M $ linear spline functions, this function has $ M + 1 $ spline nodes $ \rho_k $, $ k = 0, ..., M $. Thus, we use the $ L^2 $ norm to calculate the difference between $ \hat{p}^M $ and the "actual" density $ p_A $ at each of these nodes by measuring the sum of squared differences between the approximated spline nodes and their corresponding values on the normal curve. This $ L^2 $ norm is defined by

    $ \| \Delta p \|_2 = \left(\sum_{k=0}^{M} |p_A(\rho_k) - \hat{p}^M(\rho_k)|^2\right)^{1/2} $ (3.3)

    where $ p_A(\rho) $ is the true PDF (probability density function) of $ P_A $, and $ \hat{p}^M(\rho) $ is the estimated spline approximation of the true PDF, evaluated at each node $ \rho_k $, $ k = 0, ..., M $. Similarly, the infinity norm, $ \| \Delta p \| _{\infty} $, which takes the maximum component-wise error, is defined by

    $ \| \Delta p \|_{\infty} = \max(|\Delta p_0|, |\Delta p_1|, ..., |\Delta p_M|) $ (3.4)

    where $ \Delta p = (p_A(\rho_k)-\hat{p}^M(\rho_k)) $ for each $ \rho_k $ node, $ k = 0, ..., M $.
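These node-wise error norms are simple to compute. In the sketch below, a hypothetical shifted Gaussian stands in for an actual spline estimate $ \hat{p}^M $; only the norm computations of (3.3) and (3.4) are taken from the text.

```python
import numpy as np

def density_error_norms(p_A, p_hat, nodes):
    """L2 and infinity norms of the node-wise density differences,
    per Eqs (3.3) and (3.4)."""
    dp = p_A(nodes) - p_hat(nodes)
    return np.sqrt(np.sum(dp ** 2)), np.max(np.abs(dp))

nodes = np.linspace(10.0, 20.0, 7)   # M = 6 splines -> 7 nodes on [10, 20]
p_A = lambda r: np.exp(-0.5 * (r - 15.0) ** 2) / np.sqrt(2.0 * np.pi)
# Hypothetical estimate: same shape shifted right by 0.5
p_hat = lambda r: np.exp(-0.5 * (r - 15.5) ** 2) / np.sqrt(2.0 * np.pi)
l2, linf = density_error_norms(p_A, p_hat, nodes)
```

Since the infinity norm is the largest single node-wise difference, it can never exceed the $ L^2 $ norm over the same nodes.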

    In Case 1, we simulate as few as 4 time points and attempt to estimate the probability distribution. Figure 7a shows the results of the estimated distribution from the inverse problem graphed against the "actual" distribution of $ \rho $, which is $ \mathcal{N}(15, 1) $. As we see in Figure 7a, the estimated probability density of $ \rho $, $ \hat{p}^M $, which is shaded in, does not overlap with the "actual" probability density of $ \rho $, $ p_A $, which was previously assumed. (Note that the probability densities $ \hat{p}^M $ and $ p_A $ have corresponding probability distributions $ \hat{P}^M $ and $ P_A $, respectively.) To quantify this difference, we can look at the $ L^2 $ norm, $ \| \Delta p \|_2 = 1.27 $. In Figure 7b we plot $ n = 4 $ time points of the simulated aggregate data, which were simulated under the assumption that $ \rho \sim\mathcal{N}(15, 1) $. As can be seen, there is a significant amount of systematic (non-random) error between the simulated data and the estimated solution. Figure 7c plots the residual errors between the approximated solution and the simulated data points for each of the three state solutions, $ T $, $ B $, and $ S $ (tumor, blood, and spleen). We see they are fairly normally distributed and decrease per time point. Errors are quite high for the number of T-cells in the tumor and blood solutions, but quite small for the spleen, which is most likely because the flow of T-cells to and from the spleen does not rely on the $ \rho $ parameter.

    Figure 7.  Case 1: $ n = 4 $ time points of simulated aggregate observations of the number of T-cells in the tumor $ u_T $, blood $ u_B $, and spleen $ u_S $, (b), assuming that $ \rho $ is normally distributed with mean $ \mu = 15 $ and standard deviation $ \sigma = 1 $, used to estimate the probability distribution $ \hat{P}^M $, (a). All parameter values except for $ \rho $ are set at values from Table 1.

    This discrepancy in Figure 7a is most likely due to the fact that 4 time observations are too few to glean sufficient information. As such, we get a skewed distribution of the parameter $ \rho $, which does not provide an accurate result as to the true dynamics of the system. Finally, we investigate the condition number of the Fisher Information Matrix (FIM). This condition number is used to determine the uncertainty in the estimated probability distribution and compare the uncertainty in different cases. The FIM is approximated using the sensitivity and covariance matrices, via the method described in [30]. It is known that if the condition number of the Fisher Information Matrix is very large, then there is more uncertainty regarding the probability density coefficient estimates. For Case 1, the condition number of the FIM is $ 2.93 \times 10^{17} $. We note that we are using $ n = 4 $ data points to estimate $ M = 6 $ linear spline functions (defined using 7 spline nodes), so this is an ill-posed least squares problem (i.e., there are more unknown parameters than known data points) [31], which is unsolvable. Even if we attempt to use $ M = 2 $ spline functions (3 spline nodes), we are left with 0 degrees of freedom. Thus, more data points are needed.
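The rank argument can be seen directly from the shape of the sensitivity matrix. The sketch below uses random matrices as hypothetical sensitivities purely for illustration (the actual sensitivities come from differentiating the aggregate model with respect to the spline coefficients, per [30]): with 4 observations and 7 coefficients the FIM is rank deficient, and its condition number blows up.

```python
import numpy as np

rng = np.random.default_rng(2)

def fim_condition(S, weights=None):
    """FIM ~ S^T W S for sensitivity matrix S (rows: observations,
    columns: spline coefficients); its condition number gauges uncertainty."""
    W = np.eye(S.shape[0]) if weights is None else np.diag(weights)
    F = S.T @ W @ S
    return np.linalg.cond(F)

S_underdetermined = rng.standard_normal((4, 7))    # n = 4 data vs 7 nodes
S_overdetermined = rng.standard_normal((12, 7))    # n = 12 data vs 7 nodes
c_bad = fim_condition(S_underdetermined)           # rank-deficient: huge
c_good = fim_condition(S_overdetermined)           # full rank: moderate
```

The $ 10^{16} $–$ 10^{17} $ condition numbers reported in the text are consistent with a numerically rank-deficient (or nearly so) FIM in double precision.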

    We now consider $ n = 8 $ time points of simulated aggregate data. In this case, the estimated distribution $ \hat{P}^M $ of $ \rho $ is bimodal, encompassing extreme sides of $ P_A $, and still fails to capture the "actual" probability distribution that was assumed, as can be seen in Figure 8a. The $ L^2 $ norm in this case is $ \| \Delta p \|_2 = 0.577 $, however, which shows a significant decrease from when $ n = 4 $, meaning that the approximation is improving. Figure 8b shows considerably less error between the simulated points and the solution based on the estimated probability distribution for $ \rho $. Figure 8c plots the residuals, and we see a lower order of magnitude for each error. The condition number of the FIM is $ 1.18\times 10^{17} $, which is still the same magnitude as in Case 1.

    Figure 8.  Case 2: $ n = 8 $ time points of simulated aggregate observations, assuming that $ \rho $ is normally distributed with mean $ \mu = 15 $ and standard deviation $ \sigma = 1 $, of the number of T-cells in the tumor $ T $, blood $ B $, and spleen $ S $, used to estimate the probability distribution of $ \rho $, and compared to estimated aggregate observations of $ T $, $ B $, and $ S $ given the estimated distribution. All parameter values except for $ \rho $ are set at values from Table 1.

    In Case 3, we slightly increase the number of time points to $ n = 12 $, and already see a dramatic increase in accuracy. Figure 9a demonstrates that the approximated probability density of $ \rho $ has overlapped with the "actual" density, with only a small deviation on the left, which can be quantified by the $ L^2 $ norm of $ \| \Delta p \|_2 = 0.174 $, lower than in the previous cases. It can be seen in Figure 9b that the approximated solutions match the simulated data quite well. The residuals of the tumor, $ T $, and blood, $ B $, observations in Figure 9c are much larger than the residuals of the spleen observation $ S $, because the T-cell counts in the tumor and blood are much larger. The condition number of the FIM is $ 5.2\times 10^{16} $, which is one order of magnitude smaller than in the previous cases.

    Figure 9.  Case 3: $ n = 12 $ time points of simulated aggregate observations, assuming that $ \rho $ is normally distributed with mean $ \mu = 15 $ and standard deviation $ \sigma = 1 $, of the number of T-cells in the tumor $ T $, blood $ B $, and spleen $ S $, used to estimate the probability distribution of $ \rho $, and compared to estimated aggregate observations of $ T $, $ B $, and $ S $ given the estimated distribution. All parameter values except for $ \rho $ are set at values from Table 1.

    In Case 4, we increase the number of time points to 16. From Figure 10a, we see that the estimated probability density of $ \rho $ still matches the "actual" density very well, with a slight leftward difference. The $ L^2 $ norm is $ \| \Delta p \|_2 = 0.186 $, which is slightly larger than in Case 3. We still see a reasonable fit between the simulated data and the solution curve in Figure 10b, while Figure 10c shows that the residuals between the solution and data are similar in magnitude to residuals in Case 3. The condition number of the FIM is $ 3.45 \times 10^{16} $.

    Figure 10.  Case 4: $ n = 16 $ time points of simulated aggregate observations, assuming that $ \rho $ is normally distributed with mean $ \mu = 15 $ and standard deviation $ \sigma = 1 $, of the number of T-cells in the tumor $ T $, blood $ B $, and spleen $ S $, used to estimate the probability distribution of $ \rho $, and compared to estimated aggregate observations of $ T $, $ B $, and $ S $ given the estimated distribution. All parameter values except for $ \rho $ are set at values from Table 1.

    In Case 5, we double the number of simulated time points to $ n = 32 $. As in Case 4, the estimated probability density of $ \rho $ is aligned with the "actual" distribution, as seen in Figure 11a, with an $ L^2 $ norm of $ \| \Delta p \|_2 = 0.197 $, which is slightly larger than in Case 4. Thus, the approximation is becoming slightly worse as the number of time points increases. The condition number of the FIM is $ 1.67 \times 10^{16} $, which is similar in magnitude to that of Case 3. Figure 11b shows that the approximated solution curve fits the simulated data better as time points are increased. The residuals, plotted in Figure 11c, are also similar to the residuals plotted in Case 3.

    Figure 11.  Case 5: $ n = 32 $ time points of simulated aggregate observations, assuming that $ \rho $ is normally distributed with mean $ \mu = 15 $ and standard deviation $ \sigma = 1 $, of the number of T-cells in the tumor $ T $, blood $ B $, and spleen $ S $, used to estimate the probability distribution of $ \rho $, and compared to estimated aggregate observations of $ T $, $ B $, and $ S $ given the estimated distribution. All parameter values except for $ \rho $ are set at values from Table 1.

    Now that we notice a pattern of convergence between the approximated distribution and the actual distribution to which we are comparing, with little change as we increase the time points from $ n = 16 $ to $ n = 32 $, we can assume that our approximation has reached its peak and can produce no better results. Indeed, in Case 6, we look at a small further increase in time points, to $ n = 40 $. Figure 12a shows that the approximated probability distribution for $ \rho $ has multiple modes, and now has outliers on the far left and right, deviating further from the "actual" distribution. Indeed, the $ L^2 $ norm for the probability distribution, $ \| \Delta p \|_2 = 0.227 $, is larger than in the previous cases. Likewise, the condition number of the FIM is $ 1.45 \times 10^{17} $, which is an order of magnitude larger than that for our best $ n $.

    Figure 12.  Case 6: $ n = 40 $ time points of simulated aggregate observations, assuming that $ \rho $ is normally distributed with mean $ \mu = 15 $ and standard deviation $ \sigma = 1 $, of the number of T-cells in the tumor $ T $, blood $ B $, and spleen $ S $, used to estimate the probability distribution of $ \rho $, and compared to estimated aggregate observations of $ T $, $ B $, and $ S $ given the estimated distribution. All parameter values except for $ \rho $ are set at values from Table 1.

    While Figures 12b and 12c show a good line fit, with random residuals, our estimated probability density is clearly inaccurate. This indicates that increasing the number of time points without bound will not consistently result in a better approximation. In fact, an $ n $ that is too large may actually provide an incorrect estimate. This phenomenon may be due to the fact that it is possible to run into trouble with spline-based methods for a very large number of data points. Indeed, this may produce an ill-posedness due to excessive computational error in the inverse problems, and is discussed more in [31,32].

    Sensitivity Analysis Takeaway: For all four categories (UT, CAR, CAR+CXCR1, and CAR+CXCR2) the model observations $ T $, $ B $, and $ S $ (the number of T-cells in the tumor, blood, and spleen, respectively) are most sensitive to parameters $ \rho $ and $ \beta_{Ext} $. Because of this, we can consider these parameters the most important when comparing the data to the model.

    Stability Analysis Takeaway: At different values of $ \rho $, the transient expansion factor of tumor antigen recognition by the immune system, the mathematical model described in (2.1)–(2.3) has different long-term behaviors. Behavior changes when $ \rho = 10 $. For $ \rho \leq 10 $, we see that the transient expansion factor is not enough and the T-cells essentially ignore the tumor in the long term. While T-cells in the blood decrease to zero, they do so slowly with a rise in the T-cells in the spleen. Biologically speaking, a situation in which $ \rho \leq 10 $ indicates that the body is not fighting the foreign object as it should, and the T-cells are not doing their job. For $ \rho > 10 $, we see that there is a large-enough transient expansion term for the T-cells to exit the blood and target the tumor, which is to be expected in a cancer treatment, and is thus biologically relevant. In this case, T-cells in the blood both decrease to zero and increase in the tumor very quickly. It should be noted that in our analysis, we look at both short and long term behavior. In actuality, our specific model will only consider a time-line of, at most, a few years (several hundred days). However, it is important to understand long term behavior of the system.

    Parameter Estimation Takeaway: In order to save on costly experiments, we should use the minimum number of data points necessary to feasibly estimate the probability distribution of our parameter of interest, $ \rho $. Utilizing our estimation methods and the aggregate version of our model with fixed parameters set at biologically relevant values (see Table 1), we find that $ n = 12 $ time points results in the most accurate estimated distribution of $ \rho $. With $ n < 12 $ time points, we have inaccurate results, and with $ n > 32 $, not only do our results not improve, but they become less accurate.

    Since we are simulating data, it is possible to compare the estimated and "actual" probability distributions. By investigating norms of these differences, we can see how the number of time points, $ n $, influences the error between the true probability density function and the approximated spline estimation. Figure 13a shows that for smaller values of $ n $, the error between the true PDF of $ P_A $, $ p_A(\rho) $, and the approximated spline estimation, $ \hat{p}^M(\rho) $, is highest, and it decreases the most at $ n = 12 $. It then slowly starts to rise again, but not dramatically. Still, the $ L^2 $ norm continues to increase as $ n $ increases toward even the largest values, reinforcing the fact that the ideal approximation for this problem does depend on an ideal $ n $. Figure 13b shows the resulting infinity norms as we increase the number of time points taken. Again, we see that the maximum errors are highest for small values of $ n $, and decrease for $ n = 12 $ and $ n = 40 $. We do know from the $ L^2 $ norm that the approximation is not ideal for $ n = 40 $, but its maximum norm is indeed smallest.

    Figure 13.  A comparison of errors and accuracy between the approximated spline estimation solution for the parameter $ \rho $, and the true probability density function of $ P_A $. We use both the $ L^2 $ norm and the infinity norm, as well as the condition number for the Fisher Information Matrix. All results are based on approximated and simulated data, assuming the CAR treatment in which we assume $ 10 \leq \rho \leq 20 $.

    Finally, we see from Figure 13c that the condition number of the Fisher Information Matrix (FIM) for the approximate solutions is highest at both very small and large values of $ n $. As we approach the ideal number of time points for an accurate approximation, the condition number dips down. We can see that for all solutions, the magnitude of the condition number for the FIM is $ 10^{16}-10^{17} $, which is somewhat high. While there is actually no ideal value for the condition number of the FIM, a smaller number indicates that the uncertainty factor in the approximation is improving. Regardless, it is a useful tool to compare conditions of different inverse problems, in a relative rather than absolute fashion.

    Although these methods lead to satisfactory results that inform future data collection, our method of experimental design is still very elementary. (For example, the aggregate time points are assumed to be equally spaced, which limits the possibilities for experimental design.) None of these efforts involve use of the sophisticated optimal experimental design formulations outlined in Section 6 below. An obvious next step would include use of these design ideas to attempt to further refine the number and specific times of the needed observations to successfully carry out the needed distributional estimations with aggregate data.

    We turn to the question of how best to design experiments to collect the data (how much data? and when to collect it?) necessary to validate models when only aggregate data are available. To this point we have discussed various aspects of uncertainty arising in inverse problem techniques. All discussions have been in the context of a given set or sets of data collected under various assumptions on how the data were collected (e.g., independent sampling, absolute measurement error, relative measurement error). For many years now [33,34,35,36,37,38,39,40], scientists (and especially engineers) have been actively involved in designing experimental protocols to best study engineering systems, including parameters describing mechanisms. Recently, with increased involvement of scientists working in collaborative efforts with ecologists, biologists, and quantitative life scientists, renewed interest in the design of "best" experiments to elucidate mechanisms has been seen [33]. Thus, a major question that experimentalists and inverse problem investigators alike often face is how best to collect the data to enable one to efficiently and accurately estimate model parameters. This is the well-known and widely studied optimal design problem. A rather thorough review is given in [2]. Briefly, traditional optimal design methods (D-optimal, E-optimal, c-optimal) [34,35,36,37] use information from the model to find the sampling distribution or mesh for the observation times (and/or locations in spatially distributed problems) that minimizes a design criterion, quite often a function of the Fisher Information Matrix (FIM). Experimental data taken on this optimal mesh are then expected to result in accurate parameter estimates.
We briefly mention a framework based on the FIM for a system of ordinary differential equations (ODEs) to determine when an experimenter should take samples and what variables to measure when collecting information on a physical or biological process modeled by a dynamical system.
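As a minimal illustration of such FIM-based criteria (the $ 2\times2 $ matrices below are toy examples, not taken from this model): a D-optimal design maximizes $ \det(F) $ and an E-optimal design maximizes the smallest eigenvalue of $ F $, both shrinking the parameter-uncertainty ellipsoid.

```python
import numpy as np

def d_criterion(F):
    """D-optimality: larger det(F) means a smaller confidence ellipsoid volume."""
    return np.linalg.det(F)

def e_criterion(F):
    """E-optimality: larger smallest eigenvalue means a shorter worst axis."""
    return np.min(np.linalg.eigvalsh(F))

# Toy FIMs for a 2-parameter problem
F_good = np.array([[4.0, 0.5], [0.5, 3.0]])   # well-conditioned design
F_poor = np.array([[4.0, 3.9], [3.9, 4.0]])   # nearly collinear sensitivities
better = d_criterion(F_good) > d_criterion(F_poor)
```

Both criteria penalize the nearly collinear design, whose FIM is close to singular; in a sampling-time search, each candidate mesh would be scored this way and the best-scoring mesh kept.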

    Inverse problem methodologies are often discussed in the context of a dynamical system or mathematical model where a sufficient number of observations of one or more states (variables) are available. The choice of method depends on assumptions the modeler makes on the form of the error between the model and the observations (the statistical model). The most prevalent source of error is observation error, which is made when collecting data. (One can also consider model error, which originates from the differences between the model and the underlying process that the model describes. However, this is often quite difficult to quantify.) Measurement error is most readily discussed in the context of statistical models. The three techniques commonly addressed are maximum likelihood estimation (MLE), used when the probability distribution form of the error is known; ordinary least squares (OLS), for error with constant variance across observations; and generalized least squares (GLS), used when the variance of the data can be expressed as a non-constant function. Uncertainty quantification is also described for optimization problems of this type, namely in the form of observation error covariances, standard errors, residual plots, and sensitivity matrices. Techniques to approximate the variance of the error are also included in these discussions. In [41], the authors develop an experimental design theory using the FIM to identify optimal sampling times for experiments on physical processes (modeled by an ODE system) in which scalar or vector data is taken.

    In addition to when to take samples, the question of what variables to measure is also very important in designing effective experiments, especially when the number of state variables is large. Use of such a methodology to optimize what to measure would further reduce testing costs by eliminating extra experiments to measure variables neglected in previous trials [42]. In the CAR-T therapy example presented in this paper, it may only be necessary to measure, for example, the T-cells in the blood and the tumor, or the tumor and the spleen, etc., in order to obtain accurate estimations of $ \rho $. In [43], the best set of variables for an ODE system modeling the Calvin cycle is identified using two methods. The first, an ad hoc statistical method, determines which variables directly influence an output of interest at any one particular time. Such a method does not utilize the information on the underlying time-varying processes given by the dynamical system model. The second method is based on optimal design ideas. Extensions of this method are developed in [44,45]. Specifically, in [44] the authors compare the SE-optimal design introduced in [46] and [41] with the well-known methods of D-optimal and E-optimal design on a six-compartment HIV model [47] and a thirty-one-dimensional model of the Calvin cycle. Such models, in which a wide range of possible observational variables exist, are not only ideal testbeds for the proposed methodology, but are also widely encountered in applications.

    This research was supported in part by the Air Force Office of Scientific Research under grant number AFOSR FA9550-18-1-0457.

    The authors declare there is no conflict of interest.

  • © 2016 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
