Research article Special Issues

Generalized Richards model for predicting COVID-19 dynamics in Saudi Arabia based on particle swarm optimization Algorithm

  • COVID-19 pandemic is spreading around the world becoming thus a serious concern for health, economic and social systems worldwide. In such situation, predicting as accurately as possible the future dynamics of the virus is a challenging problem for scientists and decision-makers. In this paper, four phenomenological epidemic models as well as Suspected-Infected-Recovered (SIR) model are investigated for predicting the cumulative number of infected cases in Saudi Arabia in addition to the probable end-date of the outbreak. The prediction problem is formulated as an optimization framework and solved using a Particle Swarm Optimization (PSO) algorithm. The Generalized Richards Model (GRM) has been found to be the best one in achieving two objectives: first, fitting the collected data (covering 223 days between March 2nd and October 10, 2020) with the lowest mean absolute percentage error (MAPE = 3.2889%), the highest coefficient of determination (R2 = 0.9953) and the lowest root mean squared error (RMSE = 8827); and second, predicting a probable end date found to be around the end of December 2020 with a projected number of 378,299 at the end of the outbreak. The obtained results may help the decision-makers to take suitable decisions related to the pandemic mitigation and containment and provide clear understanding of the virus dynamics in Saudi Arabia.

    Citation: Rafat Zreiq, Souad Kamel, Sahbi Boubaker, Asma A Al-Shammary, Fahad D Algahtani, Fares Alshammari. Generalized Richards model for predicting COVID-19 dynamics in Saudi Arabia based on particle swarm optimization Algorithm[J]. AIMS Public Health, 2020, 7(4): 828-843. doi: 10.3934/publichealth.2020064

    Related Papers:

    [1] Huanhai Yang, Shue Liu . A prediction model of aquaculture water quality based on multiscale decomposition. Mathematical Biosciences and Engineering, 2021, 18(6): 7561-7579. doi: 10.3934/mbe.2021374
    [2] Edwin Duque-Marín, Alejandro Rojas-Palma, Marcos Carrasco-Benavides . A soil water indicator for a dynamic model of crop and soil water interaction. Mathematical Biosciences and Engineering, 2023, 20(8): 13881-13899. doi: 10.3934/mbe.2023618
    [3] Anna Adamczak Bugno, Aleksandra Krampikowska . The basics of a system for evaluation of fiber-cement materials based on acoustic emission and time-frequency analysis. Mathematical Biosciences and Engineering, 2020, 17(3): 2218-2235. doi: 10.3934/mbe.2020118
    [4] Lemuel Clark Velasco, John Frail Bongat, Ched Castillon, Jezreil Laurente, Emily Tabanao . Days-ahead water level forecasting using artificial neural networks for watersheds. Mathematical Biosciences and Engineering, 2023, 20(1): 758-774. doi: 10.3934/mbe.2023035
    [5] Shu Xu, Yichang Wei, Abdul Hafeez Laghari, Xianming Yang, Tongchao Wang . Modelling effect of different irrigation methods on spring maize yield, water and nitrogen use efficiencies in the North China Plain. Mathematical Biosciences and Engineering, 2021, 18(6): 9651-9668. doi: 10.3934/mbe.2021472
    [6] Daniel Cervantes, Miguel angel Moreles, Joaquin Peña, Alonso Ramirez-Manzanares . A computational method for the covariance matrix associated with extracellular diffusivity on disordered models of cylindrical brain axons. Mathematical Biosciences and Engineering, 2021, 18(5): 4961-4970. doi: 10.3934/mbe.2021252
    [7] Kangsen Huang, Zimin Wang . Research on robust fuzzy logic sliding mode control of Two-DOF intelligent underwater manipulators. Mathematical Biosciences and Engineering, 2023, 20(9): 16279-16303. doi: 10.3934/mbe.2023727
    [8] Yangyan Zeng, Yidong Zhou, Wenzhi Cao, Dongbin Hu, Yueping Luo, Haiting Pan . Big data analysis of water quality monitoring results from the Xiang River and an impact analysis of pollution management policies. Mathematical Biosciences and Engineering, 2023, 20(5): 9443-9469. doi: 10.3934/mbe.2023415
    [9] Yikang Xu, Zhaohua Sun, Wei Gu, Wangping Qian, Qiangru Shen, Jian Gong . Three-dimensional inversion analysis of transient electromagnetic response signals of water-bearing abnormal bodies in tunnels based on numerical characteristic parameters. Mathematical Biosciences and Engineering, 2023, 20(1): 1106-1121. doi: 10.3934/mbe.2023051
    [10] Ming Chen, Meng Fan, Xing Yuan, Huaiping Zhu . Effect of seasonal changing temperature on the growth of phytoplankton. Mathematical Biosciences and Engineering, 2017, 14(5&6): 1091-1117. doi: 10.3934/mbe.2017057
  • COVID-19 pandemic is spreading around the world becoming thus a serious concern for health, economic and social systems worldwide. In such situation, predicting as accurately as possible the future dynamics of the virus is a challenging problem for scientists and decision-makers. In this paper, four phenomenological epidemic models as well as Suspected-Infected-Recovered (SIR) model are investigated for predicting the cumulative number of infected cases in Saudi Arabia in addition to the probable end-date of the outbreak. The prediction problem is formulated as an optimization framework and solved using a Particle Swarm Optimization (PSO) algorithm. The Generalized Richards Model (GRM) has been found to be the best one in achieving two objectives: first, fitting the collected data (covering 223 days between March 2nd and October 10, 2020) with the lowest mean absolute percentage error (MAPE = 3.2889%), the highest coefficient of determination (R2 = 0.9953) and the lowest root mean squared error (RMSE = 8827); and second, predicting a probable end date found to be around the end of December 2020 with a projected number of 378,299 at the end of the outbreak. The obtained results may help the decision-makers to take suitable decisions related to the pandemic mitigation and containment and provide clear understanding of the virus dynamics in Saudi Arabia.


    1. Introduction

    Eukaryotic DNA is spatially organized in a hierarchical manner, the first level being the chain of nucleosomes connected by DNA linkers (“beads on a string”). Stereochemical details of the second level of DNA organization, the so-called 30-nm fiber, still remain a subject of debate [1,2,3,4], despite the fact that the structure of the nucleosome was determined at a high resolution more than 10 years ago [5].

    During the past decade there has been a significant progress in the structural studies of chromatin, starting with the seminal studies of Richmond and co-authors, who first observed the two-start fibers (Dorigo et al. [6] and Schalch et al. [7]). Their results, together with the high-resolution Cryo-EM data obtained by Song et al. [8] strongly support the two-start organization of chromatin fibers for relatively short linkers L = 20, 30 and 40 bp. However, the electron microscopy (EM) images presented by Rhodes and co-authors [9,10] suggest that for L = 50 bp and longer, the one-start solenoid (or interdigitaded structure) is more stable, especially in the presence of linker histones.

    All these structures were obtained for arrays of strongly positioned “601” nucleosomes [11], with the nucleosome repeat length (NRL) varying from 167 to 237 bp in increments of 10 bp. Assuming that the nucleosome core is 147 bp [5], this means that the linker length varies from 20 to 90 bp—that is, L belongs to the {10n} series. On the other hand, it is known that in vivo, the linker sizes are close to {10n + 5} values, at least for yeast and mouse [12,13]; in particular, L = 15 bp is predominant in the baker’s yeast chromatin [12,14,15]. Thus, the structural data mentioned above correspond to linker lengths with occurrences in vivo that are relatively small.

    Only recently, Grigoryev and co-authors analyzed nucleosomal arrays with linker lengths belonging both to the series {10n} and {10n + 5} (Correll et al. [16]). Using sedimentation and EM visualization, they demonstrated that fibers with L = 25 bp (NRL = 172 bp) have less propensity to fold in a compact state compared to L = 20 or 30 bp. The increased fiber “plasticity” observed for L = 10n + 5 may be functionally significant because the {10n + 5} values are frequently found in vivo.

    The distinct folding pathways observed for fibers with L = 10n and L = 10n + 5 [16] indicate that these fibers may have different configurations, and in particular, different spatial arrangements of DNA linkers. This polymorphism may be related to the early studies by Worcel et al. [17], Woodcock et al. [18] and Williams et al. [19] whose space-filling models of the chromatin fiber were presented with the DNA linking number ΔLk varying from −1 to −2 per nucleosome, depending on the DNA trajectory. (At that time, the ΔLk values were not calculated precisely, but rather were estimated qualitatively.)

    Generally speaking, this variability of ΔLk can be attributed to the existence of different fiber topoisomers corresponding to different levels of DNA supecoiling, which may have far-reaching biological implications, from local protein-DNA interactions [20] to large-scale regulation of transcription [21]. However, the topological properties have not been analyzed in computational studies of regular chromatin fibers published recently [22,23]. Therefore, in this study, we performed a thorough computational analysis of regular two-start fibers, focusing on their topology.

    Earlier, we presented results of computations for relatively short linkers L < 40 bp [24]. Here, we analyze chromatin fibers with the linker lengths varying from 10 to 70 bp, and confirm the main conclusions made earlier. First, we describe two families of conformations—one (denoted T2) represents the structures visualized in the X-ray and Cryo-EM studies [7,8], and the other (denoted T1) comprises “novel” structures not observed experimentally so far. The fibers from the two families are characterized by different ΔLk values—that is, they are topologically different. Second, there is a strict relationship between the type of topoisomer and the linker length: for linkers L = 10n, the “old” topoisomer T2 is energetically optimal, whereas for linkers L = 10n + 5 the “novel”topoisomer T1 is more favorable. These features are valid for the whole interval of the linker lengths. There are certain differences between the “short” and “long” linkers, however. In particular, for linkers L > 50 bp the energy minima become more shallow and the energy barriers between the T1 and T2 minima less pronounced. This tendency is consistent with the experimental data mentioned above [9,10].

    Next, we analyze how the DNA linking number ΔLk depends on the finite size of chromatin fiber. The obtained results are critical for quantitative interpretation of the experimental results on the level of DNA supercoiling in circular closed DNA (ccDNA) with reconstructed nucleosomal arrays [25,26,27].

    Finally, we present convincing evidence that the NRL observed in vivo correlates with the level of gene expression in yeast. Since the NRL (and the inter-nucleosome linker L) define the DNA supercoiling density, we consider this particular result as a reflection of the more general interrelationship between the “local” topological polymorphism of chromatin fibers and the “global” topological changes in DNA occurring during transcription.

    2. Methods

    2.1. Geometry of chromatin fibers and energy calculation

    Positions of nucleosomes in a regular symmetrical fiber can be described by four parameters (Figure 1). (The chromatin fiber is called regular if all inter-nucleosome linkers have the same conformation and symmetrical if its structure does not change when it is turned upside down.) The three cylindricalcoordinates define positions of the nucleosome centers: radius r, rise h and polar angle φ. This angle determines the handedness of the fiber and number of stacks. For instance, the left-handed two-stack fibers correspond to φ between 150° and 180°. The fourth parameter, ρ, defines inclination of nucleosomes relative to the fiber axis. This corresponds to a rotation around the dyad axis X (Figure 1). This parameter plays a special role in our analysis because it determines the twist in linker DNA and the topological properties of the fiber in general [24].

    Figure 1. Definition of the four parameters describing the conformation of a regular symmetrical chromatin fiber [24]. These are three cylindrical parameters h (rise), r (radius) and polar angle φ, and the internal inclination angle ρ. Each nucleosome is associated with a right-handed coordinate frame in which axis Z represents the superhelical axis of the nucleosome (calculated as described earlier [62]), axis X points toward the nucleosome dyad, and axis Y is perpendicular to X and Z. The inclination angle ρ defines rotation of a nucleosome around the dyad axis X. The entry point of nucleosome ν1 is shown as a red ball. The image was prepared with the VMD software package [63].

    The nucleosome core particles (DNA and histones) are considered to be rigid in our computations, with the coordinates taken from the high-resolution crystal structure [5].

    To find the optimal conformation of the DNA linker connecting two nucleosomes (Figure 1) we use a “mesoscopic”approach [28] in which DNA is modeled at the level of dimeric steps, and its trajectory is described by the six base-pair step parameters Twist, Roll, Slide, etc. [29]. The elastic energy of the linker DNA deformation is calculated using the knowledge-based potential functions introduced by Olson et al. [30]. The stiffness constants, including the cross correlations (such as Twist-Roll) are taken as averages for all 16 dinucleotides. For rest-state values, we use the average helical parameters of B-DNA: Twist = 34.5° and Rise = 3.35 Å; the other rest-state values, such as Slide, are taken to be zero. The DNA linker minimization is nested in the outer cycle in which the total energy of the nucleosome fiber is minimized as a function of the four parameters defining the fiber configuration (Figure 1).

    Four energy terms are considered in our calculations: the elastic energy of linker DNA (see above), the repulsive and electrostatic interactions (calculated using the Coulomb potential with 30 Å distance cutoff), and the H3 tail—acidic patch interactions between the adjacent nucleosomes, which are modeled phenomenologically as described earlier [24].

    2.2. DNA topology

    The DNA topology is described by three parameters: ΔTw (the change in DNA twisting), DNA writhing, Wr, and the change in the linking number, ΔLk (compared to the relaxed state of DNA) [31,32,33]. They are related by the well-known equation: ΔLk = Wr + ΔTw. This equation is valid for closed circular DNA; therefore, we need to find an effective way to build the closed DNA trajectory. We add four extra points connecting the ends of DNA in a way that these points remain in one plane with the fiber axis and the closing chain does not pass through nucleosomes (Figure 2). In this case, the DNA fragments connecting these four points do not introduce additional writhing because they are in the same plane. Note that our approach is similar to that used by Fuller to close the ends of a ribbon wound in a regular superhelix (see Figure 4 in [31]).

    Figure 2. Closing the fiber ends to compute the DNA writhing, Wr. Two points are chosen on the fiber axis, 350 Å above and below the fiber periphery. The other two points are obtained by shifting the first two points by 500 Å in the plane defined by the fiber axis and the entry point shown as a red ball. The direction of shift is perpendicular to the fiber axis.

    To calculate the DNA writhing, we use the quadrangle method that was first derived by Levitt [34] for the description of protein folding and was later used for DNA by Klenin and Langowski [35]. The DNA trajectory is represented by a polygonal chain with the vertex points at the centers of base pairs. The DNA twisting is determined using the Euler angle formalism [36,37] implemented in CompDNA [38], 3DNA [39] and SCHNAaP [40] software.

    3. Results

    3.1. Energetically optimal conformations

    As mentioned above, the two series of linker length, L = 10n and 10n + 5, have been the focus of investigators, the former because chromatin fibers with L = 10n are more stable in vitro, and the latter because linkers L = 10n + 5 are observed in vivo. Therefore, we paid most attention to these values of L.

    Our computations demonstrate significant variability of two-start chromatin fibers (Figure 3), which is the result of interplay between the linker DNA twisting (naturally depending on the linker length) and the inclination of nucleosomal disks (Figure 1). For example, note the nearly two-fold increase in diameter of fibers accompanying a linker increase from 10-15 to 60-65 bp. This trend is to be expected because the linker length dictates the diameter. On the other hand, the fiber extension clearly visible for the short linkers, L = 10-20 bp, is apparently favorable, as it helps to avoid steric clashes between nucleosomes when their orientation is “almost horizontal”.

    Figure 3. Two-start chromatin fibers for variable linker sizes. Optimal structures are shown for the two series of linker lengths, i.e. for L = 10n (A) and L = 10n + 5 (B). For the {10n} series, the inclination angle ρ varies from −80°to −30°. For the {10n + 5} series, the angle ρ changes from 80° to 110°. In this case, the nucleosomes are “almost” parallel to the fiber axis. Note that the fiber dimensions vary significantly for both series, but overall the fibers for L = 10n are more extended than for L = 10n + 5. The most compact forms (with the smallest rise) occur for L = 40-45 bp. For the shorter linkers L = 10-25 bp, in order to avoid clashes between linkers and nucleosomes, the rise is higher. In both series the diameter increases linearly with the linker length. The structures for L = 20, 25 and 30 bp were presented earlier [24]. The fibers shown in (A) and in (B) belong to different topological families, denoted T2 and T1, respectively. As shown in Figure 5, the two series of structures are distinguished by the DNA linking number.

    Comparison of our calculated structures with those observed experimentally is possible only for L = 20 to 40 bp (Supplementary Figure S1). Our optimal structures are remarkably similar to those visualized by Cryo-EM [8] for L = 30 and 40 bp in terms of the super-helical parameters shown in Figure 1. For example, the parameter rise is 24 Å and 22.5 Å in the experimental structures, while our computations predict the values 25 Å and 20 Å (for L = 30 and 40 bp, respectively). In the case of L = 20 bp, we cannot directly compare our regular fiber with the irregular dinucleosome crystal structure solved by Schalch et al. [7]. The “direct” model built in that study had a low rise of 17 Å and several “steric overlaps” [7]. Therefore, we used EM images presented by Routh et al. [9] and Correll et al. [16] and estimated the rise to be ~ 25 Å, which is close to our predicted rise = 27 Å.

    For the L = 10n + 5 series, we predicted a family of novel structures that are clearly different from the fibers with L = 10n (Figure 3B). Note the “nearly vertical” orientation of nucleosomes in the novel structures—they are rotated by more than 90° compared to the known structures for L = 20-30 bp, so that the red balls indicating “entry points” are invisible in this projection. Since there are no experimentally observed high-resolution fiber structures for L = 10n + 5 (suitable for comparison with our models), we have only indirect evidence in support of their existence (see below).

    3.2. Energy landscape

    A more detailed presentation of our numeric results is given in a two-dimensional plot (ρ, L) with linker lengths varying continuously from ~ 10 to 70 bp, and the inclination angle r spanning the whole 360° interval (Figure 4). As in our earlier study with limited variation of the linker L [24], we observe two vertical rows of low energy regions (blue areas and white contour lines) arranged with 10-11 bp periodicity, which reflects the helical period of DNA duplex. The energy minima on the left side of the plot (−90° < ρ < −20°) correspond to the L = 10n values, whereas the minima on the right side (80° < ρ < 120° ) are observed for L = 10n + 5. In other words, for L = 10n, the “canonical” structures with negative angle ρ (similar to the fibers observed experimentally [7,8]; Figure 3A) are most favorable. They have the lowest energy values; thus, it is not surprising that they were stabilized and resolved by X-ray [7] and Cryo-EM [8] methods. By contrast, for L = 10n + 5, the “novel” configurations with positive angle ρ ≈ 90° are more preferable (see Figure 3B). Their energy is predicted to be less than that in the “canonical”structures, which is consistent with the observation that nucleosomal arrays with L = 10n + 5 are less prone to fold according to sedimentation measurements [16]. Accordingly, no experiment-based models of these structures are available to date.

    Figure 4. Total energy of the two-start fibers as a function of the inclination angle ρ and the linker length L. Optimization is made in the space of three remaining fiber parameters, i.e., the radius, rise, and polar angle φ (Figure 1). Dark blue regions represent stable structures, whereas dark brown areas are those with energies higher than 40 kT. Energies less than zero are highlighted by white contour lines. White circles show the positions of optimal structures separated by approximately a half pitch of B-DNA (5-6 bp). Note that for the relatively short linkers (up to 45 bp) there are two clearly visible energy minima (bimodality patterns described earlier [24]), which become “blurry” for the longer linkers. The two optimal regions of the inclination angle are marked by the white arrows and denoted T2 and T1 (see the bottom of the plot), corresponding to the two topological families of the fibers presented in Figure 3. Bottom: Different orientations of a nucleosome illustrating variation of the inclination angle ρ from −90°to +90°.

    The difference between the two types of fibers described above (“canonical” fibers for the {L = 10n} series and “novel” configurations for {L = 10n + 5}) is not limited to the inclination of nucleosomes. More importantly, the two families of structures are characterized by different DNA linking numbers (see the next section).

    Finally, note that the increase in the linker length is accompanied by a gradual decrease in fiber stability (Figure 4). In particular, all energy values are positive for L > 57 bp, which generally agrees with the fact that two-start fibers are not formed when linkers are 50 bp or longer [9,10,41]. It is quite possible that the interdigitated one-start helix [9] is more favorable in this case (especially in the presence of linker histones).

    3.3. DNA linking number in the chromatin fibers

    In an early study by Simpson et al. [26] it was found that the number of nucleosomes reconstituted on ccDNA is approximately proportional to the number of superhelical turns formed in DNA. These results allowed considering the linking number DLk normalized per nucleosome, as a topological parameter characterizing spatial organization of the chromatin fiber independent of its length. However, a systematic analysis of the ΔLk value as a function of the fiber length was never published (to the best of our knowledge). Below, we present the results of such an analysis for the two series of linker length, L = 10n and 10n + 5.

    Figure 5A demonstrates the asymptotic behavior of ΔLkdepending on the number of nucleosomes in the fiber, N. We see that the linking number decreases significantly when N increases from 3 to 10, but then it converges rather fast, and ΔLk remains practically the same for N > 30 (with the precision ~ 0.05). Note that this convergence is somewhat faster for the {10n + 5} series (L = 25 and 45 bp).

    Figure 5. Linking number per nucleosome, ΔLk, in regular fibers with various linker lengths. (A) Dependence of ΔLkon the number of nucleosomes for fibers with L = 10n = 20, 40, 60 bp, and L = 10n + 5 = 25, 45 bp. For every L, the energetically optimal fiber conformation was selected (Figure 4). (B) Schematic representation of the topological transition T2-T1 in the two-start fiber with L = 20 bp. Here ΔLk is plotted versus changes in the inclination angle. Note that the linking number remains nearly constant except for a transition point where it abruptly changes by ~ 1. The transition point (here, ρ = 60°) depends on the linker length, L. For details see [24].(C) The limiting ΔLk values (calculated for N = 100) are shown for the two series, L = {10n} (blue squares) and {10n+ 5} (red circles). The color code is the same as in (B): the T1 type is shown in red and T2 in blue. Note that among the optimal fibers, the smallest ΔLk = −1.7 is observed for L = 33 bp (not shown).

    The curves presented in Figure 5A are naturally divided into two groups: for L = 10n + 5 the linking number is higher than −1, but for L = 10n, it is less than −1.2. This is consistent with our earlier observation of the topological polymorphism of two-start fibers [24]. We found that for all linker lengths there are two stereochemically feasible topoisomers (denoted T1 and T2) which have different linking numbers. Importantly, the optimal configurations of the fibers belong to different topological families, depending on the linker size: for L = 10n + 5 this is the topoisomer T1 that is most favorable, and for L = 10n this is the topoisomer T2. The two families of structures differ mostly by the inclination angle ρ.

    The topological transition in the case of L = 20 bp is shown schematically in Figure 5B. This transition (associated with the change in inclination angle) is caused by an abrupt 360°change in the linker DNA twisting. The stereochemical details are presented elsewhere [24]; here we wish to emphasize that this T2-to-T1 transition (in the experimental system) requires the presence of nicking-closing enzyme such as topoisomerase I. (Naturally, this limitation is valid only for the DNA with closed ends.)

    The two topological types, T1 and T2, are clearly separated in Figure 5C,where the limiting ΔLk values (calculated for N = 100) are presented for the optimal configurations of the fibers with L = 10n and 10n + 5 bp. As in Figure 5A,the distinction between the T1 and T2 types is reflected in the linking number values, which are higher than −1.0 and lower than −1.2, respectively. An increase in the linker length is accompanied by a gradual decrease in the absolute value of DLk such that for the “long”linkers L = 60-70 bp, DLk is close to −1.2 (compared to −1.5 predicted for the “short” linkers L = 10-20 bp). At the same time, the difference ΔΔLk between the {10n} and {10n + 5} series remains approximately the same for the “long” and “short” linkers - that is, the difference between L = 20 and 25 bp, and between L = 60 and 65 bp, is expected to be ΔΔLk ≈ 0.5.

    Before proceeding to the next section, we summarize the main results presented above:

    (1) The finite size effect in calculating the DNA linking number proved to be significant if N < 10 (where N is number of nucleosomes in the fiber), but it can be practically neglected if N > 20. This is important for the quantitative interpretation of the topological gel assays used to estimate ΔLk in SV40 minichromosome (N ≈ 20) [25,42,43] and in ccDNA containing 5S DNA repeats (N = 18) [26].

    (2) Our classification of two-start chromatin fibers made for the relatively short linkers

    L < 40 bp [24] still holds for the linkers spanning the interval from ~ 10 to 70 bp. The energetically optimal fibers with L = 10n belong to the topological type T2, with ΔLk varying from −1.2 to −1.4, while the fibers with L = 10n + 5 belong to the family T1, with ΔLk varying from −1.0 to −0.8.

    3.4. Relationship between nucleosome repeat length and gene expression level

    Potentially, the topological polymorphism of chromatin fibers described above can be utilized in vivo for regulation of transcription. Indeed, according to the twin domain model of Liu and Wang [44] the positive torsional stress propagates downstream and the negative stress accumulates upstream of the transcription complex (Figure 6A). The DNA torsional stress modulates the level of negative supercoiling of DNA in vivo: the positive stress decreases absolute level of DNA supercoiling, and negative stress increases it. Initially, this model was suggested for bacteria but later it was confirmed for eukaryotes as well [21,45].

    Figure 6. Chromatin fiber topology may be related to the level of gene expression in yeast. A graphical illustration of the mechanism of transcription based on the twin domain model by Liu and Wang [44]. Top: The RNA polymerase (RNAP) is moving along DNA to the right, increasing the local DNA twisting ahead of the transcription complex and leaving undertwisted DNA behind. Bottom: The DNA torsional stress modulates the level of supercoiling, so that in the DNA regions upstream of RNAP, the DNA linking number becomes less (on average) than downstream from RNAP. The two fiber topoisomers, T1 and T2, are positioned in accord with this ΔLk distribution. This model predicts that the T2 type fibers with the linker length L = 10n are stabilized upstream of RNAP, while the T1 fibers with L = 10n+5 are formed downstream from RNAP.

    Our observation of the two types of fibers (T1 and T2) having different linking numbers may be related to the transient DNA topological changes occurring during transcription. We suggest that the T1 topoisomer with ΔLk −1 is formed predominantly downstream from RNA polymerase (Figure 6A), as opposed to the T2 topoisomer with ΔLk −1.5, which is stabilized in the upstream regions (and more generally, in the regions with a low level of transcription). In this regard, it is important that the T1 and T2 topoisomers are characterized by distinct linker lengths (see above). Therefore, we expect that there might be a difference in the distribution of sizes of inter-nucleosome linkers in highly and lowly expressed genes.

    To test this hypothesis, we analyzed the nucleosome positions [46] in ~ 3,500 baker’s yeast genes that are at least 1,000 bp long [47]. We selected 25% of genes that are highly expressed (with transcription rates from 4 to 200 [48]) and the 25% of genes that are lowly expressed (transcription rates 0.1-0.9). For brevity, we refer to these genes as UP- and DOWN-genes, respectively.

    Distribution of nucleosome occurrences for UP- and DOWN-regulated genes reflects the difference in nucleosome repeat length (NRL) between these two sets of genes (Figure 6B). The peaks in the nucleosome distribution for UP-genes lag behind those for DOWN-genes, especially those that are far from the TSS—e.g., peaks at 841 bp (UP) and 884 bp (DOWN). Moreover, the peaks in the distribution of the nucleosome occupancies for DOWN-genes are higher than those for UP-genes. This result is expected, because the higher number of nucleosomes slows transcription.

    However, the following result is less trivial and thus more interesting: NRL ≈ 161 bp for the highly expressed genes and NRL ≈ 167 bp for the lowly expressed genes. In terms of the linker length, this means that L ≈ 14 bp for the UP-genes and L ≈ 20 bp for the DOWN-genes. Note that the genome-wide average for yeast is NRL = 162 bp, or L = 15 bp [12,14,15]; thus, the UP-regulated genes are more “typical” for yeast. This assessment agrees with the RNA-seq data [48] indicating that overall, the level of transcription in yeast is much more active than in higher eukaryotes. By contrast, DOWN-regulated genes can be considered as an exception in the case of yeast.

    In summary, our results are consistent with the above hypothesis that nucleosomal arrays with L ≈ 10n + 5, which are characterized by a less negative superhelical density, are transcriptionally competent. Note also that the fibers with L ≈ 10n + 5 reveal a greater plasticity [16,24], which may facilitate formation of gene loops [49,50], thereby inducing transcription of the corresponding genes. Accordingly, in the silent (repressed) genes the predominant linker length is L ≈ 10n, which corresponds to a higher superhelical density and a higher stability of the chromatin fiber.

    Figure 7. Nucleosome occurrences for two groups of yeast S. cerevisiae genes. Positions of nucleosome dyads relative to the transcription start site (TSS) are calculated for nucleosome fragments 147-152 bp in length [15,46]. Note the nucleosome-depleted region upstream from TSS [64]. The red curve represents the 25% of genes that are highly expressed (UP-genes) and the blue curve shows the 25% of genes that are lowly expressed (DOWN-genes). The nucleosome occurrences are normalized by the number of genes in each set (i.e., 860 genes). The running averages of 51 bp are shown. Averaging the distances between the peaks, we obtained NRL = 161.2 (±3.5) for UP-genes and NRL = 167.2 (±2.9) for DOWN-genes.

    Naturally, this correlation between gene expression level and NRL raises many questions about the cellular mechanisms responsible for transient stabilization of one of the two fiber topoisomers. In particular, it would be interesting to see if there are any DNA sequence patterns distinguishing the two groups of genes (and the two types of nucleosome packaging). If the tendency observed for baker’s yeast also holds for other species, these intriguing questions should be the subject of a separate investigation.

    4. Conclusion

    We have expanded our earlier analysis of two-start chromatin fibers [24] to those with long linkers, up to L = 70 bp (NRL = 217 bp). The new results corroborate the main conclusion made earlier that fiber topology depends on the linker length L. For the series {L = 10n}, the energetically optimal topoisomers belong to the family T2, with the DNA linking number about −1.5, whereas for {L = 10n + 5}, the T1 topoisomers are more favorable, with ΔLk ≈ −1.0 (Figure 3).

    The calculated T2 forms closely match experimentally observed structures for L = 20, 30 and 40 bp. The energy map (Figure 4) suggests that the two-start fibers for “long” linkers L = 50, 60 and 70 bp are relatively unstable (compared to L = 20-40 bp). This is consistent with Routh et al. [9] data indicating that for L = 50 bp, the chromatin fibers are not formed without linker histone. (When the histone H5 is added, an alternative structure is formed, probably an interdigitized one-start superhelix, computational analysis of which goes beyond the limits of this study.)

    By contrast, the predicted T1 topoisomers are drastically different from the known fiber structures [7,8]. Locally, the T1 and T2 forms are characterized by different inclination of the nucleosomal disks (Figure 3). In terms of “global” organization of DNA, these forms comprise two distinct topological families with different DNA linking numbers (Figure 5). Conformational transition between the T1 and T2 topoisomers is possible only in the presence of nicking-closing enzymes (provided that the DNA ends are restrained). Note that our computations agree with the finding of Correll et al. [16] that the fibers with L = 10n + 5 = 25 bp are less stable than the fibers with L = 10n = 20 and 30 bp. This may be one of the reasons why the “new” topoisomer T1 (with L = 10n + 5) was not observed earlier.

    In addition, we tested our prediction [24] concerning the topological difference between the two types of fibers with L = 20 and 25 bp. Our approach is similar to that described by Simpson et al. [26]; the main difference is that in our case these are tandem arrays of the “601” sequences [11] instead of the 5S DNA repeats used earlier [26]. Our preliminary results support the predicted difference of ΔΔLk ≈0.5 between the two topoisomers (T. Nikitina, D.N., S.A. Grigoryev and V.B.Z., unpublished observation). We consider this as a proof of principle and anticipate that using topological gel assays will allow us to make a detailed quantitative comparison between the two types of chromatin fiber organization.

    For the first time, we have systematically analyzed how the finite size of chromatin fibers affects the calculated DNA linking number. Our results are important for numerical interpretation of the topological gel assays made for strongly positioned nucleosomes such as “601” [11] or 5S DNA [26]. In particular, if the array contains N ≈ 5 positioned nucleosomes, the DNA linking number differs from its “ideal” limiting value by 0.1 or even by 0.2, depending on the linker length (Figure 5A). Therefore, appropriate adjustments are necessary to account for the size of the nucleosomal array.

    The same reasoning is applicable to the nucleosome positioning in vivo. In yeast, a typical nucleosomal organization is represented by relatively short clusters of N £ 5 nucleosomes, separated by gaps [15,51,52]. Therefore, instead of using the “ideal” ΔLk values which are correct only for “long” arrays, one has to consider the finite size effect described in Figure 5A.

    Earlier, we hypothesized [24] that the different topology and flexibility of the two types of fibers (T2 with L = 10n and T1 with L = 10n + 5 bp) may be utilized by cells in functionally distinct parts of the genome. In particular, we suggested that highly and lowly expressed genes may have different L values (on average). Here, we present data for yeast supporting this hypothesis (Figure 6B). It would be interesting to see whether this topological mechanism of transcription regulation is applicable to higher eukaryotes as well. For example, the data on genome-wide nucleosome positioning provided by Teif et al. [53] for three murine cell lines suggests that the average NRL (and the linker length L) are changed upon cell differentiation. Note, however, that the higher eukaryotic chromatin contains significant amounts of linker histone H1, and the NRL strongly depends on this amount [54,55]. Therefore, analysis of a possible interplay between DNA topology and the level of transcription has to take into account the role played by histone H1 in electrostatic neutralization of the linker DNA and stabilization of chromatin fiber [56].

    We consider the topological mechanism accounting for distinct organization of transcriptionally active chromatin not instead of, but rather in addition to the alternative mechanisms proposed earlier, such as partial unwrapping of nucleosomal DNA due to the loss of histones H2A-H2B [57] or formation of “reverse nucleosomes” [58,59] in the course of transcription (see recent review by Teves and Henikoff [60]).

    Our results may reflect a more general tendency of chromosomal domains containing active or repressed genes to retain topologically distinct higher-order structures. Indeed, transcriptionally silent domains were shown to acquire a stable negative supercoiling [61], which agrees with our findings: (i) the computational prediction [24] that {10n} chromatin is more negatively supercoiled than {10n + 5} chromatin; and (ii) the predominant occurrence of linker L ≈ 10n in genes with a low level of expression (Figure 6B).

    In this regard, it is remarkable that histone acetylation reduces the absolute value of DNA linking number from 1.0 to 0.8 [27]. The histone acetylation is one of the epigenetic markers indicating the increased level of transcription. Therefore, the observation made by Norton et al. [27] fits nicely into our hypothesis that the reduced level of DNA supercoiling correlates with the high level of transcription.

    Acknowledgements

    We are grateful to Sergei Grigoryev and Fedor Kouzine for valuable discussions and to George Leiman for text editing. The research was supported by the Intramural Research Program of the National Institutes of Health (Center for Cancer Research, NCI) (V.B.Z.), the Start-Up Fund of the Rochester Institute of Technology and the National Institute of General Medical Sciences under Award Number R15GM116102 (F.C.).

    Conflict of Interest

    The authors declare no conflict of interest.


    Acknowledgments



    This research has been funded by Scientific Research Deanship at University of Ha'il, Saudi Arabia through project number COVID-1936.

    Conflict of interest



    All authors declare no conflicts of interest in this paper.

    [1]  COVID-19 Worldmeters info website (2020) .Available from: https://www.worldometers.info/coronavirus/.
    [2] Lalwani S, Sahni G, Mewara B, et al. (2020) Predicting optimal lockdown period with parametric approach using three-phase maturation SIRD model for COVID-19 pandemic. Chaos, Solitons Fractals 138: 109939. doi: 10.1016/j.chaos.2020.109939
    [3] Alboaneen D, Pranggono B, Alshammari D, et al. (2020) Predicting the Epidemiological Outbreak of the Coronavirus Disease 2019 (COVID-19) in Saudi Arabia. Int J Environ Res Public Health 17: 4568. doi: 10.3390/ijerph17124568
    [4] Almeshal A, Almazrouee A, Alenezi M, et al. (2020) Forecasting the Spread of COVID-19 in Kuwait Using Compartmental and Logistic Regression Models. Appl Sci 10: 3402. doi: 10.3390/app10103402
    [5] Arora P, Kumar H, Panigrahi P (2020) Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India. Chaos, Solitons and Fractals 139: 110017. doi: 10.1016/j.chaos.2020.110017
    [6] Rodriguez O, Conde-Gutierrez R, Hernadez-Gavier A (2020) Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models. Chaos, Solitons Fractals 138: 109946. doi: 10.1016/j.chaos.2020.109946
    [7] Hu Z, Cui Q, Han J, et al. (2020) Evaluation and prediction of the COVID-19 variations at different input population and quarantine strategies, a case study in Guangdong province, China. Int J Infect Dis 95: 231-240. doi: 10.1016/j.ijid.2020.04.010
    [8] Roda W, Varughese M, Han D, et al. (2020) Why is it difficult to accurately predict the COVID-19 epidemic? Infect Dis Modell 5: 271-281. doi: 10.1016/j.idm.2020.03.001
    [9] Postnikov E (2020) Estimation of COVID-19 dynamics “on a back-of-envelope”: Does the simplest SIR model provide quantitative parameters and predictions? Chaos, Solitons Fractals 135: 109841. doi: 10.1016/j.chaos.2020.109841
    [10] Ndaïrou F, Area I, Nieto J, et al. (2020) Mathematical modeling of COVID-19 transmission dynamics with a case study of Wuhan. Chaos, Solitons Fractals 135: 109846. doi: 10.1016/j.chaos.2020.109846
    [11] Alzahrani S, Aljamaan I, Al-fakih E (2020) Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions. J Infect Public Health 13: 914-919. doi: 10.1016/j.jiph.2020.06.001
    [12] Aslam M (2020) Using the Kalman filter with Arima for the COVID-19 pandemic dataset of Pakistan. Data Brief 31: 105854. doi: 10.1016/j.dib.2020.105854
    [13]  COVID-19 Worldmeters info website (Saudi Arabia) (2020) .Available from: https://www.worldometers.info/coronavirus/country/saudi-arabia/.
    [14]  COVID-19 Worldmeters info website (Kuwait) (2020) .Available from: https://www.worldometers.info/coronavirus/country/kuwait/.
    [15] Chen D, Chen X, Chen J (2020) Reconstructing and forecasting the COVID-19 epidemic in the United States using a 5-parameter logistic growth model. Global Health Res Policy 5: 25. doi: 10.1186/s41256-020-00152-5
    [16] Malavika B, Marimuth, Joy M, et al. (2020) Forecasting COVID-19 epidemic in India and high incidence states using SIR and logistic growth models. Clin Epidemiol Global Health .
    [17] Munayco C, Tariq A, Rothenberg R, et al. (2020) Early transmission dynamics of COVID-19 in a southern hemisphere setting: Lima-Peru: February 29 the March 30th, 2020. Infect Dise Modell 5: 338-345. doi: 10.1016/j.idm.2020.05.001
    [18] Wang P, Zheng X, Jiayang L, et al. (2020) Prediction of epidemic trends in COVID-19 with logistic model and machine learning technics. Chaos, Solitons Fractals 139: 110058. doi: 10.1016/j.chaos.2020.110058
    [19] Behnood A, Gholafshan A, Hosseini S (2020) Determinants of the infection rate of the COVID-19 in the U.S. using ANFIS and virus optimization algorithm (VOA). Chaos, Solitons Fractals 139: 110051. doi: 10.1016/j.chaos.2020.110051
    [20] Singhal A, Singh P, Lall B, et al. (2020) Modeling and prediction of COVID-19 pandemic using Gaussian mixture model. Chaos, Solitons Fractals 138: 110023. doi: 10.1016/j.chaos.2020.110023
    [21] Wu K, Darcet D, Wang Q, et al. Generalized logistic growth modeling of the COVID-19 outbreak in 29 provinces in China and in the rest of the world (2020) .Available from: https://arxiv.org/abs/2003.05681.
    [22] Shen C (2020) Logistic growth modelling of COVID-19 proliferation in China and its international implications. Int J Infect Dis 96: 582-589. doi: 10.1016/j.ijid.2020.04.085
    [23] Materassi M (2019) Some fractal thoughts about the COVID-19 infection outbreak. Chaos, Solitons Fractals X4: 100032. doi: 10.1016/j.csfx.2020.100032
    [24] Al-qaness M, Ewees A, Fan H, et al. (2020) Optimization Method for Forecasting Confirmed Cases of COVID-19 in China. J Clin Med 9: 674. doi: 10.3390/jcm9030674
    [25] He S, Peng Y, Sun K (2020) SEIR modeling of the COVID-19 and its dynamics. Nonlinear Dyn 101: 1667-1680. doi: 10.1007/s11071-020-05743-y
    [26] Paggi M (2020) An Analysis of the Italian Lockdown in Retrospective Using Particle Swarm Optimization in Machine Learning Applied to an Epidemiological Model. Physics 2: 368-382. doi: 10.3390/physics2030020
    [27] Boubaker S (2017) Identification of nonlinear Hammerstein system using mixed integer-real coded particle swarm optimization: application to the electric daily peak-load forecasting. Nonlinear Dyn 90: 797-814. doi: 10.1007/s11071-017-3693-9
    [28] Kennedy J, Eberhart RParticle swarm optimization. (1995) .4: 1942-1948.
    [29] Alberti T, Faranda D (2020) On the uncertainty of real-time predictions of epidemic growths: A COVID-19 case study for China and Italy. Commun Nonlinear Sci Numer Simulat 90: 105372. doi: 10.1016/j.cnsns.2020.105372
    [30] Consolini G, Materassi M (2020) A stretched logistic equation for pandemic spreading. Chaos, Solitons Fractals 140: 110113. doi: 10.1016/j.chaos.2020.110113
    [31] Roosa K, Lee Y, Luo R, et al. (2020) Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020. Infect Dis Modell 5: 256-263. doi: 10.1016/j.idm.2020.02.002
    [32] Huang J, Zhang L, Liu X, et al. (2020) Global prediction system for COVID-19 pandemic. Science Bulletin https://doi.org/10.1016/j.scib.2020.08.002.
    [33] Pelinovsky E, Kurkin A, Kurkina O, et al. (2020) Logistic equation and COVID-19. Chaos, Solitons Fractals 140: 110241. doi: 10.1016/j.chaos.2020.110241
    [34] Gillman M, Crokidakis N Dynamics and future of SARS-CoV-2 in the human host (2020) .Available from: https://www.medrxiv.org/content/10.1101/2020.07.14.20153270v2.
    [35] Faranda D, Alberti T Modelling the second wave of COVID-19 infections in France and Italy via a Stochastic SEIR model (2020) .Available from: https://arxiv.org/pdf/2006.05081v1.pdf.
    [36] Mwalili S, Kimathi M, Ojiambo V, et al. (2020) SEIR model for COVID-19 dynamics incorporating the environment and social Distancing. BMC Res Notes 13: 352. doi: 10.1186/s13104-020-05192-1
    [37] Castorina P, Iorio A (2020) Data analysis on Coronavirus spreading by macroscopic growth laws. Int J Mod Phys C 7: 2050103. doi: 10.1142/S012918312050103X
    [38] Vasconcelos G, Macêdo A, Ospina R, et al. (2020) Modelling fatality curves of COVID-19 and the effectiveness of intervention strategies. Peer J 8: e9421. doi: 10.7717/peerj.9421
    [39] Maier B, Brockmann D (2020) Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China. Science 368: 742-746. doi: 10.1126/science.abb4557
    [40] Crokidakis N (2020) COVID-19 spreading in Rio de Janeiro, Brazil: Do the policies of social isolation really work? Chaos, Solitons Fractals 136: 109930. doi: 10.1016/j.chaos.2020.109930
  • This article has been cited by:

    1. Lele Xiao, Fan Li, Chao Niu, Gelian Dai, Qian Qiao, Chengsen Lin, Evaluation of Water Inrush Hazard in Coal Seam Roof Based on the AHP-CRITIC Composite Weighted Method, 2022, 16, 1996-1073, 114, 10.3390/en16010114
    2. Qiushuang Zheng, Changfeng Wang, Weitao Liu, Lifu Pang, Evaluation on Development Height of Water-Conduted Fractures on Overburden Roof Based on Nonlinear Algorithm, 2022, 14, 2073-4441, 3853, 10.3390/w14233853
    3. Yaoshan Bi, Jiwen Wu, Xiaorong Zhai, Quantitative prediction model of water inrush quantities from coal mine roofs based on multi-factor analysis, 2022, 81, 1866-6280, 10.1007/s12665-022-10432-7
    4. Zhu Gao, Guosheng Xu, Huigui Li, Deguo Su, Yuben Liu, Bed Separation Formation Mechanism and Water Inrush Evaluation in Coal Seam Mining under a Karst Cave Landform, 2023, 11, 2227-9717, 3413, 10.3390/pr11123413
    5. Mengcheng Jiang, Zhidong Qiu, Yuanyuan Diao, Yuwen Shi, Weipeng Liu, Na Li, Ailing Jia, Optimization of the extraction process for Shenshou Taiyi powder based on Box-Behnken experimental design, standard relation, and FAHP-CRITIC methods, 2024, 24, 2662-7671, 10.1186/s12906-024-04554-7
    6. Weitao Liu, Yuying Ren, Xiangxi Meng, Bo Tian, Xianghai Lv, Analysis of Potential Water Inflow Rates at an Underground Coal Mine Using a WOA-CNN-SVM Approach, 2024, 16, 2073-4441, 813, 10.3390/w16060813
    7. Jie Xu, Qiqing Wang, Yuguang Zhang, Wenping Li, Xiaoqin Li, Evaluation of Coal-Seam Roof-Water Richness Based on Improved Weight Method: A Case Study in the Dananhu No.7 Coal Mine, China, 2024, 16, 2073-4441, 1847, 10.3390/w16131847
    8. Gangzhu Sun, Xiaoyue Zhang, Yadan Yan, Yao Lu, Xiaoqin Zhang, Evaluation Method for Green Construction Demonstration Projects in China Based on G-TOPSIS, 2023, 15, 2071-1050, 15828, 10.3390/su152215828
    9. Chao Niu, Xin Xu, Gelian Dai, Kai Liu, Lele Xiao, Shoutao Luo, Wanxue Qian, Evaluation of Water-Richness and Risk Level of the Sandstone Aquifer in the Roof of the No. 3 Coal Seam in Hancheng Mining Area, 2025, 17, 2073-4441, 1164, 10.3390/w17081164
  • Reader Comments
  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(5720) PDF downloads(141) Cited by(18)

Article outline

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog