DNase I hypersensitive sites (DHSs) are specific genomic regions that are critical for detecting and understanding cis-regulatory elements. Although many methods have been developed to detect DHSs, a considerable gap remains in practice. We present LangMoDHS, a deep learning-based language model for predicting DHSs. LangMoDHS mainly comprises a convolutional neural network (CNN), a bi-directional long short-term memory (Bi-LSTM) network and feed-forward attention. The CNN and the Bi-LSTM were stacked in a parallel manner, which helps accumulate multiple-view representations from primary DNA sequences. We conducted 5-fold cross-validations and independent tests over 14 tissues and 4 developmental stages. The experiments showed that LangMoDHS is competitive with or slightly better than iDHS-Deep, the latest method for predicting DHSs. They also indicated that the CNN, the Bi-LSTM and the attention each contribute substantially to DHS prediction. We implemented LangMoDHS as a user-friendly web server, accessible at http://www.biolscience.cn/LangMoDHS/. We used indices related to information entropy to explore the sequence motifs of DHSs, and the analysis provided some insight into DHSs.
Citation: Xingyu Tang, Peijie Zheng, Yuewu Liu, Yuhua Yao, Guohua Huang. LangMoDHS: A deep learning language model for predicting DNase I hypersensitive sites in mouse genome[J]. Mathematical Biosciences and Engineering, 2023, 20(1): 1037-1057. doi: 10.3934/mbe.2023048
Since the beginning of the HIV epidemic, it is estimated that 75 million people have been infected with HIV and 36 million have died [40]. HIV spans the globe, affecting every country, although some have fared worse than others. This is especially apparent in sub-Saharan Africa, with an estimated 5% of all adults infected [40]. HIV is not restricted to certain age groups, and, despite prevention programs and awareness campaigns, HIV incidence continues to increase, particularly amongst men who have sex with other men (MSM).
Fortunately, there have been numerous medical advancements since AIDS and HIV were defined in 1982-85. In 1986, the first antiretroviral therapy (ART) was introduced to prolong the lives of individuals infected with HIV and reduce its spread. This changed the perception of HIV to that of a chronic disease. Questions are now raised as to why the incidence rate continues to increase amongst certain population groups. Numerous surveys attempt to explain individuals' reasoning, but the results have been varied, showing evidence of increased risky behaviour as well as decreased or unchanged behaviour [12,22,28,32]. The imminence of prophylactic HIV vaccines (such as one being developed at The University of Western Ontario [39]) would have an enormous impact worldwide, but also raises some interesting new questions, chief among them being how to assess its impact on individuals contemplating whether to engage in casual sex, specifically in unprotected casual sex.
Population models have been useful in understanding and predicting the spread of HIV [20,36,18,31,19]. To better understand how one individual can effect change at the population level, game-theoretical models have been used [30,33]. These models illuminate a feedback mechanism in which individuals' choices may affect the population, which in turn affects the choices an individual is likely to make.
Game theory dates back to the late 1940s, with the works of von Neumann and Morgenstern [37], and later those of Nash [26] in the early 1950s. A game is a mathematical framework describing decision-making by individuals engaged in competitive situations, where they can behave noncooperatively or cooperatively. Noncooperative game theory is nowadays widely used in applied areas such as economics, engineering, operations research, evolutionary biology and the social sciences (psychology and cognitive sciences); see [4,15,16] and the many references therein. The question of existence and computation of Nash strategies for a given game can be tackled with various methods, such as the reaction-curves method, optimization techniques, variational inequalities, computational methods (such as genetic algorithms and evolutionary computation), or replicator-dynamics equilibria [4,10,21,0,11].
Generalized Nash (GN) games with finite-dimensional strategy sets were first studied by Arrow and Debreu in [1], followed by [29,23,34,27], with a recent review in [14]. The formulation of the generalized Nash game as a variational inequality problem dates back to Bensoussan [5], while [34] gives the first equivalence results between finite-dimensional GN games and quasivariational inequalities.
In this paper, we model casual (sexual) encounters as a noncooperative generalized Nash game between 2, 3, or 4 players, where each player's HIV status is known both to themselves and to the player they choose to interact with. We do not model a population-level transmission process in this paper. All players have personal preferences, ranked by utility, over unprotected and protected sex outcomes, and they are assigned expected utilities of the casual encounter depending on the possible outcomes: unprotected sex outcome (
In our previous work on the topic, we modelled a similar setup of casual encounters with an agent-based model of the population [33,35], and we analyzed how groups can emerge from coevolution of HIV spread with partner choices and risk perception, where we also assumed that a player's true HIV status is only known to themselves. In this model, unlike our previous simulation-based work, we construct a theoretical model to identify and analyze Nash equilibria with respect to the decisions players make. This allows us to better understand: a) the impact of personal preferences for unprotected sex; and b) the impact of heterogeneity of players (division in age groups) and initial HIV age composition both in presence and absence of a prophylactic vaccine.
To the best of our knowledge, modelling HIV transmission with age-stratified multiplayer GN game models is novel in the literature. There are only a handful of examples of multiplayer GN games in applied problems (the River Basin problem, electricity markets [25], cap-and-trade agreements [14], voluntary vaccination models [11]), so our work extends the range of applications modelled with GN games.
The structure of the paper is as follows: In Section 2, we present a 2-player game while in Section 3 we formulate the multiplayer games of casual encounters between players in age groups. Throughout, we investigate the sensitivity of players' Nash strategies and of HIV transmission when changes in utility rankings, efficacy of prophylactic treatment and group-specific initial HIV age composition are taken into account. We close with a few conclusions and future work.
In general, a multiplayer game involves a finite number of players, denoted here by
Definition 2.1. Assume each player is rational and wants to maximize their payoff. Then a Nash equilibrium is a vector
$$\forall i,\quad f_i(x_i^*, x_{-i}^*) \ge f_i(x_i, x_{-i}^*), \quad \forall x_i \in S_i$$
where
For several decades, the game-theoretic literature has included the concept of a generalized Nash (GN) game [1,29], which in brief is a game such as the one above in which, however, each player's strategy set
In this paper, we need to use the framework of GN theory, as our player interactions lead to casual sexual encounters in a closed population, thus the encounters have to obey a counting rule from both an
There are results asserting existence of generalized Nash equilibria for the type of games we model (see for instance [13,34,29] and references therein). One such approach finds a small subset of the solutions of a GN game by reformulating the game as a variational inequality (VI) problem, as in Definition 2.2 below. Once the VI is proven to have solutions, computational methods are employed to find its solution set.
Definition 2.2. Given a set K
$$\langle F(x^*),\, y - x^* \rangle \ge 0, \quad \forall y \in K.$$
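As a small numerical illustration of Definition 2.2, the sketch below tests the inequality on sampled points of a toy feasible set. The set K = [0,1]^2, the map F(x) = x - c, the point c and the helper name `satisfies_vi` are all illustrative assumptions; none of them is taken from the game studied in this paper.

```python
import numpy as np

def satisfies_vi(F, x_star, samples, tol=1e-12):
    """Check <F(x*), y - x*> >= 0 for every sampled y in K.

    Passing this test on a finite sample is necessary, not sufficient.
    """
    g = F(x_star)
    return bool(np.all((samples - x_star) @ g >= -tol))

# Toy data (assumed): K is the unit square and F(x) = x - c.
c = np.array([1.5, 0.5])
F = lambda x: x - c

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(1000, 2))   # sampled points y in K

x_candidate = np.clip(c, 0.0, 1.0)   # projection of c onto K, i.e. (1.0, 0.5)
x_interior = np.array([0.5, 0.5])    # an interior point, expected to fail

print(satisfies_vi(F, x_candidate, samples))
print(satisfies_vi(F, x_interior, samples))
```

For this toy map the candidate on the boundary passes because F points out of K there, while the interior point fails because F does not vanish at it.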
The existence of a solution to the VI problem in Definition 2.2 can be shown in many mathematical contexts (see [24]); for our cases here, we use the results in [13]. Specifically, we solve for generalized Nash strategies of players where
To solve the VI problem we compute its solution set as the set of critical points of a projected dynamical system (see [2,8]) given by
$$\frac{dx}{d\tau} = P_{T_K(x(\tau))}\big(F(x(\tau))\big), \quad x(0) \in K, \tag{1}$$
where
It is known that the system (1) is well-defined if
Let us consider now a casual encounter between two individuals from a general population of individuals aged 15 and over. A player can have one of two statuses: HIV negative (
Utility ranking of preferences of each player is given based on the HIV status of the individual they might engage with, which is considered known to them, as well as based on the type of sexual outcome they may find themselves in, namely
Specifically, we assign as a numerical value for the utility of a sexual encounter the range of
Let us define the expected utilities for
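$$E_1^- = \rho\,\big[x_1^-\,\mathrm{USE}(+,-) + (1 - x_1^-)\,\mathrm{PSE}(+,-)\big] \quad \text{and}$$
$$E_1^+ = \rho\,\big[x_1^+\,\mathrm{USE}(+,+) + (1 - x_1^+)\,\mathrm{PSE}(+,+)\big]$$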
where
Then the overall expected utility of the encounter for
$$E_1(x_1^-, x_1^+) = \epsilon^+ E_1^- + (1 - \epsilon^+)\,E_1^+.$$
Similarly, for
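$$E_2^- = \rho\,\big[x_2^-\,\mathrm{USE}(-,-) + (1 - x_2^-)\,\mathrm{PSE}(-,-)\big],$$
$$E_2^+ = \rho\,\big[x_2^+\,\mathrm{USE}(-,+) + (1 - x_2^+)\,\mathrm{PSE}(-,+)\big].$$
Thus
$$E_2(x_2^-, x_2^+) = \epsilon^+ E_2^- + (1 - \epsilon^+)\,E_2^+.$$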
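The expected utilities above share one form, which can be sketched as a single function. The utility numbers below are placeholders chosen only for illustration (the entries of Table 1 are not reproduced in this copy), and the function and argument names are hypothetical.

```python
def expected_utility(x_minus, x_plus, use, pse, rho, eps_plus):
    """Overall expected utility eps+ * E^- + (1 - eps+) * E^+ for one player.

    x_minus, x_plus: probabilities of choosing USE with an HIV- / HIV+ partner.
    use, pse: dicts mapping partner status ('-' or '+') to this player's utility.
    rho: the factor rho that multiplies both bracketed terms in the formulas.
    """
    e_minus = rho * (x_minus * use['-'] + (1 - x_minus) * pse['-'])
    e_plus = rho * (x_plus * use['+'] + (1 - x_plus) * pse['+'])
    return eps_plus * e_minus + (1 - eps_plus) * e_plus

# Placeholder utilities (hypothetical; not the values used in the paper).
use_p1 = {'-': 1.0, '+': 0.5}
pse_p1 = {'-': 0.2, '+': 0.4}

value = expected_utility(x_minus=1.0, x_plus=0.0, use=use_p1, pse=pse_p1,
                         rho=1.0, eps_plus=0.05)
print(round(value, 4))  # 0.05 * 1.0 + 0.95 * 0.4 = 0.43
```

The same function covers both players: only the utility dictionaries differ, with the first status in USE(., .) and PSE(., .) fixed to the player's own status.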
Now let us recall that
$$\epsilon^+ = \epsilon^+(0) + \big[x_1^-\,\epsilon^+(0) + x_2^+\,\epsilon^-(0)\big]\,\tau \tag{2}$$
where
Last but not least, we need to make sure that the number of possible sexual encounters that lead to transmission is the same whether counted from the
$$\epsilon^- \sum_{i \in HIV^-} x_i^+ = \epsilon^+ \sum_{i \in HIV^+} x_i^- \iff (1 - \epsilon^+)\,x_2^+ = \epsilon^+ x_1^-,$$
with
Players
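$$K := \{\, S_1 \times S_2 \mid (1 - \epsilon^+)\,x_2^+ = \epsilon^+ x_1^- \,\}$$
where
$$S_i = \{\, x_i = (x_i^-, x_i^+) \in [0,1]^2,\; 0 \le x_i^- + x_i^+ \le 1,\; i = 1,2 \,\}.$$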
Due to expression (2) we see that these utilities have actual dependencies on the other player's choices, so our model is a 2-player game with nonlinear payoffs.
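A minimal sketch of expression (2) and of the shared constraint defining K follows. It assumes that the factor τ in (2) is the transmission probability listed with baseline 0.02 in Table 2 (the symbol column of that table is not legible in this copy); the function names are hypothetical.

```python
def updated_infected_fraction(x1_minus, x2_plus, eps_plus0, eps_minus0, tau):
    """Expression (2): infected fraction after the encounter."""
    return eps_plus0 + (x1_minus * eps_plus0 + x2_plus * eps_minus0) * tau

def shared_constraint_residual(x1_minus, x2_plus, eps_plus):
    """(1 - eps+) * x2+ - eps+ * x1-; zero when the counting rule holds."""
    return (1 - eps_plus) * x2_plus - eps_plus * x1_minus

# Baseline values quoted in Table 2 (tau is assumed to be the 0.02 entry).
eps0_plus, eps0_minus, tau = 0.05, 0.95, 0.02

eps_no_use = updated_infected_fraction(0.0, 0.0, eps0_plus, eps0_minus, tau)
eps_full_use = updated_infected_fraction(1.0, 0.0, eps0_plus, eps0_minus, tau)

print(eps_no_use, eps_full_use)
print(shared_constraint_residual(0.0, 0.0, eps_no_use))
```

Because the infected fraction is itself an affine function of the strategies, both the payoffs and the constraint residual contain products of strategy variables, which is one way to see the nonlinearity noted above.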
At this stage, a discussion of parameter values we define as our ''base case scenario'' throughout the rest of the manuscript is needed. We outlined above our starting assumptions on the utility ranking values for
The base values used in our simulations, unless otherwise noted, are listed in Table 1.
Utility for USE | Utility for PSE | Range | ||
HIV+ | HIV+ | | | |
HIV+ | HIV- | | | |
HIV- | HIV+ | | ||
HIV- | HIV- | |
In general, it is expected that a Nash game will have multiple equilibria. We took a numerical approach to investigate the type of Nash equilibria we get in the game above. We first set the rest of our game parameters as described in Table 2 below. We then vary the initial conditions of the game reformulated as in equation (1), using experiments with uniformly distributed points from
Term | Definition | Baseline value | Range |
Probability of HIV spread from an | 0.02 | - | |
Initial proportion of | 0.05 | 5% of population | |
Initial proportion of | 0.95 | 95% of population |
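The multi-start procedure described above (integrating system (1) from uniformly distributed initial points) can be sketched with a projected Euler iteration. Everything below is a toy stand-in: the feasible set is the unit square rather than the set K of the game, the payoff is an assumed quadratic with a whole segment of maximizers, and the iteration follows the payoff gradient, which matches following -F only under the assumption that F is the negative payoff gradient.

```python
import numpy as np

def payoff_gradient(x):
    """Gradient of the toy payoff f(x) = -(x[0] + x[1] - 0.5)**2 (an assumption)."""
    s = x[0] + x[1] - 0.5
    return np.array([-2.0 * s, -2.0 * s])

def project_box(x):
    """Euclidean projection onto the toy feasible set [0, 1]^2."""
    return np.clip(x, 0.0, 1.0)

def integrate(x0, step=0.1, n_steps=200):
    """Projected Euler iteration x <- P(x + step * gradient)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = project_box(x + step * payoff_gradient(x))
    return x

rng = np.random.default_rng(0)
starts = rng.uniform(0.0, 1.0, size=(5, 2))   # uniformly distributed initial points
ends = np.array([integrate(x0) for x0 in starts])

for x0, x_end in zip(starts, ends):
    print(np.round(x0, 3), "->", np.round(x_end, 3))
```

In this toy problem every run stops on the segment x[0] + x[1] = 0.5, but different starting points stop at different places on it, which mirrors on a small scale why the choice of initial conditions matters when equilibria are not unique.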
In the next section, we investigate the sensitivity of these results with respect to changes in utilities for casual sex with players of opposite status.
In this section we are interested in investigating the effects of varying base utilities on the equilibrium strategies of both players to have unprotected sex in a casual encounter, namely on
We see a change in the equilibrium solution for both
We extend next the 2-player game presented in Section 2.2 to a multiplayer game, to capture interactions between players belonging to different age groups, as a possibly important factor in HIV transmission [7]. We consider here a population with 5 age cohorts, 15-20, 20-30, 30-40, 40-50 and 50+. Age group 1 (
A game is defined by choosing one
Let us denote by
$$P_1 \text{ in game } i \text{ belongs to } G_i.$$
We assume first that
Finally, each age group has a differing activity parameter
$$E_i := \epsilon^+ E_i^+(x) + (1 - \epsilon^+)\,E_i^-(x), \quad \forall i \in \{1,2,3\}.$$
We start by setting up the game concerning age cohort 15-20, where we choose
We define their expected utilities as:
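$$E_1^- = \rho_1\big[(x_{11}^- + x_{12}^-)\,\mathrm{USE}(+,-) + \big(1 - (x_{11}^- + x_{12}^-)\big)\,\mathrm{PSE}(+,-)\big]$$
$$E_1^+ = \rho_1\big[x_{11}^+\,\mathrm{USE}(+,+) + (1 - x_{11}^+)\,\mathrm{PSE}(+,+)\big]$$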
Then
Similarly for
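$$E_j^- = \rho_{j-1}\big[(x_{j1}^- + x_{j2}^-)\,\mathrm{USE}(-,-) + \big(1 - (x_{j1}^- + x_{j2}^-)\big)\,\mathrm{PSE}(-,-)\big]$$
$$E_j^+ = \rho_{j-1}\big[x_{j1}^+\,\mathrm{USE}(-,+) + (1 - x_{j1}^+)\,\mathrm{PSE}(-,+)\big]$$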
Each player's strategies have to satisfy (
As a consequence of the interaction between players, the fractions of
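$$\epsilon_{G_1}^+(\text{game } 1) = \epsilon_{G_1}^+(0) + \tau\big[x_{11}^-\,\epsilon_{G_1}^+(0) + x_{21}^+\,\epsilon_{G_1}^-(0)\big], \quad \text{and}$$
$$\epsilon_{G_2}^+(\text{game } 1) = \epsilon_{G_2}^+(0) + \tau\big[x_{12}^-\,\epsilon_{G_1}^+(0) + x_{31}^+\,\epsilon_{G_2}^-(0)\big]$$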
Then we compute
$$\epsilon^+(\text{game } 1) := \sum_{i=1}^{5} \epsilon_{G_i}^+(\text{game } 1). \tag{3}$$
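The two group updates and the total in (3) can be sketched as follows. The initial group fractions below are hypothetical (an equal split of the 0.05 and 0.95 baseline proportions over the five age groups, taken as fractions of the whole population), as are the function and variable names.

```python
def game1_infected_fractions(x11_m, x12_m, x21_p, x31_p, eps_plus0, eps_minus0, tau):
    """Update the HIV+ fractions of G1 and G2 after game 1; G3..G5 are unchanged.

    eps_plus0, eps_minus0: lists with the initial HIV+ / HIV- fractions of G1..G5.
    """
    eps = list(eps_plus0)
    eps[0] = eps_plus0[0] + tau * (x11_m * eps_plus0[0] + x21_p * eps_minus0[0])
    eps[1] = eps_plus0[1] + tau * (x12_m * eps_plus0[0] + x31_p * eps_minus0[1])
    return eps

# Hypothetical initial composition by age group (not the paper's data).
eps_plus0 = [0.01, 0.01, 0.01, 0.01, 0.01]
eps_minus0 = [0.19, 0.19, 0.19, 0.19, 0.19]
tau = 0.02

eps_after = game1_infected_fractions(0.5, 0.5, 0.0, 0.0, eps_plus0, eps_minus0, tau)
eps_total = sum(eps_after)   # equation (3)
print(eps_after, eps_total)
```

With all opposite-status USE strategies set to zero the function returns the initial fractions unchanged, so the total in (3) stays at its baseline value.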
Last but not least, we need to impose the shared constraint that the number of possible sexual encounters that lead to transmission is the same when counted from each of the
$$\big(1 - \epsilon^+(\text{game } 1)\big)\,(x_{21}^+ + x_{31}^+) = \epsilon^+(\text{game } 1)\,(x_{11}^- + x_{12}^-),$$
with
We investigate uniqueness of solutions for the generalized Nash equilibrium strategies computed as in Section 2.2. Figure 3 shows a non-unique Nash point while varying initial conditions of system (1). We ran 100 simulations, each starting with 50 uniformly distributed initial values, and the compiled results are always as in Figure 3. As expected, the equilibrium strategies are not unique; however, the equilibrium strategies for players engaging in USE with players of opposite status (namely
Similar to studying the 2-player game, we study the effects of varying
We know from our previous section that equilibrium strategies found for this game are not unique, thus initial conditions for the computation of equilibrium strategies under varying parameters are important. In order to derive our analysis, we use as initial conditions one of the equilibrium points computed in the subsection above:
$$\underline{x}^* = \big((0,1,0),\,(0.6369,0,0.3631),\,(0.62,0,0.38)\big), \tag{4}$$
relying on the fact that all equilibrium strategies for players engaging in USE with players of opposite status in the baseline scenario are 0. The new equilibrium strategies pictured above are:
$$\underline{x}^* = \big((0.5,0,0.5),\,(0.62,0.02,0.36),\,(0.62,0,0.38)\big).$$
Figure 4 shows more refined results than those in the 2-player scenario. Choices for
We describe next the 4-player game arising from choosing for instance
We denote
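$$P_1:\ \underline{0} \le (x_{11}^-, x_{12}^-, x_{13}^-, x_{12}^+) \le \underline{1} \ \text{ s.t. } \ x_{11}^- + x_{12}^- + x_{13}^- + x_{12}^+ = 1$$
$$P_2:\ \underline{0} \le (x_{21}^-, x_{22}^-, x_{22}^+) \le \underline{1} \ \text{ s.t. } \ x_{21}^- + x_{22}^- + x_{22}^+ = 1$$
$$P_3:\ \underline{0} \le (x_{31}^-, x_{32}^-, x_{33}^-, x_{32}^+) \le \underline{1} \ \text{ s.t. } \ x_{31}^- + x_{32}^- + x_{33}^- + x_{32}^+ = 1$$
$$P_4:\ \underline{0} \le (x_{42}^-, x_{43}^-, x_{42}^+) \le \underline{1} \ \text{ s.t. } \ x_{42}^- + x_{43}^- + x_{42}^+ = 1$$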
The expected utilities for these individuals are listed below, starting with
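$$E_1^- = \rho_2\big[(x_{11}^- + x_{12}^- + x_{13}^-)\,\mathrm{USE}(+,-) + \big(1 - (x_{11}^- + x_{12}^- + x_{13}^-)\big)\,\mathrm{PSE}(+,-)\big]$$
$$E_1^+ = \rho_2\big[x_{12}^+\,\mathrm{USE}(+,+) + (1 - x_{12}^+)\,\mathrm{PSE}(+,+)\big]$$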
Similarly for
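$$E_2^- = \rho_1\big[(x_{21}^- + x_{22}^-)\,\mathrm{USE}(-,-) + \big(1 - (x_{21}^- + x_{22}^-)\big)\,\mathrm{PSE}(-,-)\big]$$
$$E_2^+ = \rho_1\big[x_{22}^+\,\mathrm{USE}(-,+) + (1 - x_{22}^+)\,\mathrm{PSE}(-,+)\big]$$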
For
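$$E_3^- = \rho_2\big[(x_{31}^- + x_{32}^- + x_{33}^-)\,\mathrm{USE}(-,-) + \big(1 - (x_{31}^- + x_{32}^- + x_{33}^-)\big)\,\mathrm{PSE}(-,-)\big]$$
$$E_3^+ = \rho_2\big[x_{32}^+\,\mathrm{USE}(-,+) + (1 - x_{32}^+)\,\mathrm{PSE}(-,+)\big]$$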
Finally, for
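$$E_4^- = \rho_3\big[(x_{42}^- + x_{43}^-)\,\mathrm{USE}(-,-) + \big(1 - (x_{42}^- + x_{43}^-)\big)\,\mathrm{PSE}(-,-)\big]$$
$$E_4^+ = \rho_3\big[x_{42}^+\,\mathrm{USE}(-,+) + (1 - x_{42}^+)\,\mathrm{PSE}(-,+)\big]$$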
As a result of interactions allowed in this game, the fraction of the infected individuals changes in groups 1, 2, 3 (note that
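$$\epsilon_{G_j}^+(\text{game } 2) = \epsilon_{G_j}^+(0) + \tau\big[x_{1j}^-\,\epsilon_{G_2}^+(0) + x_{j+1,\,2}^+\,\epsilon_{G_j}^-(0)\big], \quad j \in \{1,2,3\}$$
Then we compute
$$\epsilon^+(\text{game } 2) := \sum_{i=1}^{5} \epsilon_{G_i}^+(\text{game } 2). \tag{5}$$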
Last but not least, we need to impose the shared constraint that the number of possible sexual encounters that lead to transmission is the same when counted from each of the
$$\big(1 - \epsilon^+(\text{game } 2)\big)\,(x_{22}^+ + x_{32}^+ + x_{42}^+) = \epsilon^+(\text{game } 2)\,(x_{11}^- + x_{12}^- + x_{13}^-),$$
with
We again investigate uniqueness of solutions as we did in Section 2.2. Figure 5 shows non-unique Nash points while varying initial conditions of (1). We ran 100 simulations, each starting with 50 uniformly distributed initial values, and the compiled results are always as in Figure 5. As expected, the equilibrium strategies are not unique; however, the equilibrium strategies for players engaging in USE with players of opposite status (namely
Similar to studying the 2-player and 3-player games, we vary
$$\underline{x}^* = \big((0,0,0,1),\,(0.2799,0.7201,0),\,(0.3174,0.3651,0.3175,0),\,(0.5339,0.4661,0)\big), \tag{6}$$
relying on the fact that all equilibrium strategies for players engaging in USE with players of opposite status in the baseline scenario are 0.
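As a quick consistency check, the point in (6) can be tested against the strategy constraints of the 4-player game and against its shared constraint. The sketch assumes that the components of (6) are listed in the same order as in the strategy vectors of P1 to P4 given earlier; the helper names are hypothetical.

```python
import math

# Equilibrium point (6), one tuple per player; component order is assumed.
x_star = [
    (0.0, 0.0, 0.0, 1.0),            # P1: (x11-, x12-, x13-, x12+)
    (0.2799, 0.7201, 0.0),           # P2: (x21-, x22-, x22+)
    (0.3174, 0.3651, 0.3175, 0.0),   # P3: (x31-, x32-, x33-, x32+)
    (0.5339, 0.4661, 0.0),           # P4: (x42-, x43-, x42+)
]

def in_strategy_set(x, tol=1e-9):
    """Each component lies in [0, 1] and the components sum to 1."""
    in_bounds = all(-tol <= v <= 1 + tol for v in x)
    return in_bounds and math.isclose(sum(x), 1.0, abs_tol=tol)

def shared_constraint_residual(x, eps_plus):
    """(1 - eps+) * (x22+ + x32+ + x42+) - eps+ * (x11- + x12- + x13-)."""
    from_negative = x[1][-1] + x[2][-1] + x[3][-1]
    from_positive = sum(x[0][:3])
    return (1 - eps_plus) * from_negative - eps_plus * from_positive

print(all(in_strategy_set(x) for x in x_star))
print(shared_constraint_residual(x_star, eps_plus=0.05))
```

Since every opposite-status USE strategy in (6) is zero, the shared constraint holds for any value of the infected fraction, which is why this point is a convenient starting value for the sensitivity runs.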
Figure 6 shows results similar to the 3-player scenario, where
Such a game alone is shown to increase infection levels. The infected fraction in the population (lower left panel of Figure 6) increases from the baseline value of 0.05 (that is, 5%) to
Recall that by
As we have stated in previous sections, in both analyses of
In this section we investigate implementing a theoretical prophylactic vaccine within the population, with efficacy
¹ This is an assumption only; treatment optimism can in fact increase HIV transmission, as we show in [35].
The population sizes are adjusted to fit
We investigate what impact these changes have on HIV transmission if
game 1: $P_2 \in G_1$ is HIV− vaccinated, $P_3 \in G_2$ is HIV− unvaccinated
game 2: $P_2 \in G_1$ is HIV− vaccinated, $P_3, P_4$ are HIV− unvaccinated.
Games 3, 4, 5 do not involve players in
Results presented are dependent on varying the vaccine efficacy
The increase in
The previous sections outlined a one-off casual sexual encounters game for 2, 3 and 4 player variations dependent on status and age. Our most interesting conclusions are that preferences of
We showed that generalized Nash equilibria exist and can be computed for these types of games. Moreover, we demonstrated the sensitivity of GN equilibria with respect to varying players' utilities of unprotected sex with partners of opposite
As we expanded from 2-player to 3- and 4-player games, we in fact refined the interactions among the individuals in a population, previously regarded as one-on-one positive-negative outcomes. Given that the sizes of groups and activity parameter values matter, we saw that the increase in transmission due to a single instance of a 3-player or a 4-player game is smaller than in a 2-player game. However, the biggest contributing factor in changing HIV transmission is found to be due to the
We need to stress again that the value of this modelling framework depends on the initial conditions of the population under observation, especially in the case of multiplayer groups. Moreover, knowing specific equilibrium strategies within the population of players makes it possible to run sensitivity analyses that use these equilibrium strategies as initial values. Last but not least, the fact that a generalized Nash game mathematically has sets of equilibria gives strength to this modelling paradigm, in the sense that differing initial conditions in a population of players can give rise to different outcomes, which seems more appropriate for modelling real-life encounter outcomes.
As future work, it would be interesting to incorporate ART for
This work has been funded by the Natural Sciences and Engineering Research Council of Canada. The authors gratefully acknowledge this support.
The authors would like to thank the anonymous referees for their excellent suggestions that led to a clearer presentation of our work. It is through their suggestions that we have come to realize the naturally arising shared constraint which pushed our models into the generalized Nash game applications presented here.
[1] |
T. Zhang, A. P. Marand, J. Jiang, PlantDHS: A database for DNase I hypersensitive sites in plants, Nucleic. Acids. Res., 44 (2016), D1148–D1153. https://doi.org/10.1093/nar/gkv962 doi: 10.1093/nar/gkv962
![]() |
[2] |
D. S. Gross, W. T. Garrard, Nuclease hypersensitive sites in chromatin, Annu. Rev. Biochem., 57 (1988), 159–197. https://doi.org/10.1146/annurev.bi.57.070188.001111 doi: 10.1146/annurev.bi.57.070188.001111
![]() |
[3] |
G. E. Crawford, I. E. Holt, J. C. Mullikin, D. Tai, E. D. Green, T. G. Wolfsberg, et al., Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites, Proc. Natl. Acad. Sci., 101 (2004), 992–997. https://doi.org/10.1073/pnas.0307540100 doi: 10.1073/pnas.0307540100
![]() |
[4] |
M. M. Carrasquillo, M. Allen, J. D. Burgess, X. Wang, S. L. Strickland, S. Aryal, et al., A candidate regulatory variant at the TREM gene cluster associates with decreased Alzheimer's disease risk and increased TREML1 and TREM2 brain gene expression, Alzheimer's Dementia, 13 (2017), 663–673. https://doi.org/10.1016/j.jalz.2016.10.005 doi: 10.1016/j.jalz.2016.10.005
![]() |
[5] |
W. Meuleman, A. Muratov, E. Rynes, J. Halow, K. Lee, D. Bates, et al., Index and biological spectrum of human DNase I hypersensitive sites, Nature, 584 (2020), 244–251. https://doi.org/10.1038/s41586-020-2559-3 doi: 10.1038/s41586-020-2559-3
![]() |
[6] |
M. T. Maurano, R. Humbert, E. Rynes, R. E. Thurman, E. Haugen, H. Wang, et al., Systematic localization of common disease-associated variation in regulatory DNA, Science, 337 (2012), 1190–1195. https://doi.org/10.1126/science.1222794 doi: 10.1126/science.1222794
![]() |
[7] |
J. Ernst, P. Kheradpour, T. S. Mikkelsen, N. Shoresh, L. D. Ward, C. B. Epstein, et al., Mapping and analysis of chromatin state dynamics in nine human cell types, Nature, 473 (2011), 43–49. https://doi.org/10.1038/nature09906 doi: 10.1038/nature09906
![]() |
[8] |
M. Mokry, M. Harakalova, F. W. Asselbergs, P. I. de Bakker, E. E. Nieuwenhuis, Extensive association of common disease variants with regulatory sequence, PLoS One, 11 (2016), e0165893. https://doi.org/10.1371/journal.pone.0165893 doi: 10.1371/journal.pone.0165893
![]() |
[9] |
D. Weghorn, F. Coulet, K. M. Olson, C. DeBoever, F. Drees, A. Arias, et al., Identifying DNase I hypersensitive sites as driver distal regulatory elements in breast cancer, Nat. Commun., 8 (2017), 1–16. https://doi.org/10.1038/s41467-017-00100-x doi: 10.1038/s41467-017-00100-x
![]() |
[10] |
W. Jin, Q. Tang, M. Wan, K. Cui, Y. Zhang, G. Ren, et al., Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples, Nature, 528 (2015), 142–146. https://doi.org/10.1038/nature15740 doi: 10.1038/nature15740
![]() |
[11] |
G. E. Crawford, S. Davis, P. C. Scacheri, G. Renaud, M. J. Halawi, M. R. Erdos, et al., DNase-chip: A high-resolution method to identify DNase I hypersensitive sites using tiled microarrays, Nat. Methods, 3 (2006), 503–509. https://doi.org/10.1038/nmeth888 doi: 10.1038/nmeth888
![]() |
[12] |
J. Cooper, Y. Ding, J. Song, K. Zhao, Genome-wide mapping of DNase I hypersensitive sites in rare cell populations using single-cell DNase sequencing, Nat. Protoc., 12 (2017), 2342–2354. https://doi.org/10.1038/nprot.2017.099 doi: 10.1038/nprot.2017.099
![]() |
[13] |
G. E. Crawford, I. E. Holt, J. Whittle, B. D. Webb, D. Tai, S. Davis, et al., Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS), Genome Res., 16 (2006), 123–131. https://doi.org/10.1101/gr.4074106 doi: 10.1101/gr.4074106
![]() |
[14] |
L. Song, G. E. Crawford, DNase-seq: A high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells, Cold Spring Harbor Protoc., 2010 (2010), pdb.prot5384. https://doi.org/10.1101/pdb.prot5384 doi: 10.1101/pdb.prot5384
![]() |
[15] | W. Zhang, J. Jiang, Genome-wide mapping of DNase I hypersensitive sites in plants, in Plant Functional Genomics, Humana Press, 1284 (2015), 71–89. https://doi.org/10.1007/978-1-4939-2444-8_4 |
[16] |
Y. Wang, K. Wang, Genome-wide identification of DNase I hypersensitive sites in plants, Curr. Protoc., 1 (2021), e148. https://doi.org/10.1002/cpz1.148 doi: 10.1002/cpz1.148
![]() |
[17] |
S. Wang, Q. Zhang, Z. Shen, Y. He, Z. Chen, J. Li, et al., Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture, Mol. Ther. Nucleic Acids, 24 (2021), 154–163. https://doi.org/10.1016/j.omtn.2021.02.014 doi: 10.1016/j.omtn.2021.02.014
![]() |
[18] |
Q. Zhang, Y. He, S. Wang, Z. Chen, Z. Guo, Z. Cui, et al., Base-resolution prediction of transcription factor binding signals by a deep learning framework, PLoS Comp. Biol., 18 (2022), e1009941. https://doi.org/10.1371/journal.pcbi.1009941 doi: 10.1371/journal.pcbi.1009941
![]() |
[19] |
S. Wang, Y. He, Z. Chen, Q. Zhang, FCNGRU: Locating transcription factor binding sites by combing fully convolutional neural network with gated recurrent unit, IEEE J. Biomed. Health. Inf., 26 (2021), 1883–1890. https://doi.org/10.1109/JBHI.2021.3117616 doi: 10.1109/JBHI.2021.3117616
![]() |
[20] |
Q. Zhang, Z. Shen, D. S. Huang, Predicting in-vitro transcription factor binding sites using DNA sequence+ shape, IEEE/ACM Trans. Comput. Biol. Bioinf., 18 (2019), 667–676. https://doi.org/10.1109/TCBB.2019.2947461 doi: 10.1109/TCBB.2019.2947461
![]() |
[21] |
Q. Zhang, S. Wang, Z. Chen, Y. He, Q. Liu, D. S. Huang, Locating transcription factor binding sites by fully convolutional neural network, Briefings Bioinf., 22 (2021), bbaa435. https://doi.org/10.1093/bib/bbaa435 doi: 10.1093/bib/bbaa435
![]() |
[22] |
Y. Zhang, Z. Wang, Y. Zeng, Y. Liu, S. Xiong, M. Wang, et al., A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape, Briefings Bioinf., 23 (2022), bbab525. https://doi.org/10.1093/bib/bbab525 doi: 10.1093/bib/bbab525
![]() |
[23] |
Y. Zhang, Z. Wang, Y. Zeng, J. Zhou, Q. Zou, High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method, Briefings Bioinf., 22 (2021), bbab273. https://doi.org/10.1093/bib/bbab273 doi: 10.1093/bib/bbab273
![]() |
[24] |
Y. He, Z. Shen, Q. Zhang, S. Wang, D. S. Huang, A survey on deep learning in DNA/RNA motif mining, Briefings Bioinf., 22 (2021), bbaa229. https://doi.org/10.1093/bib/bbaa229 doi: 10.1093/bib/bbaa229
![]() |
[25] |
W. S. Noble, S. Kuehn, R. Thurman, M. Yu, J. Stamatoyannopoulos, Predicting the in vivo signature of human gene regulatory sequences, Bioinformatics, 21 (2005), i338–i343. https://doi.org/10.1093/bioinformatics/bti1047 doi: 10.1093/bioinformatics/bti1047
![]() |
[26] |
B. Manavalan, T. H. Shin, G. Lee, DHSpred: Support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest, Oncotarget, 9 (2018), 1944. https://doi.org/10.18632/oncotarget.23099 doi: 10.18632/oncotarget.23099
![]() |
[27] |
S. Zhang, W. Zhuang, Z. Xu, Prediction of DNase I hypersensitive sites in plant genome using multiple modes of pseudo components, Anal. Biochem., 549 (2018), 149–156. https://doi.org/10.1016/j.ab.2018.03.025 doi: 10.1016/j.ab.2018.03.025
![]() |
[28] |
Y. Liang, S. Zhang, IDHS-DMCAC: Identifying DNase I hypersensitive sites with balanced dinucleotide-based detrending moving-average cross-correlation coefficient, SAR QSAR Environ. Res., 30 (2019), 429–445. https://doi.org/10.1080/1062936X.2019.1615546 doi: 10.1080/1062936X.2019.1615546
![]() |
[29] |
S. Zhang, Z. Duan, W. Yang, C. Qian, Y. You, IDHS-DASTS: Identifying DNase I hypersensitive sites based on LASSO and stacking learning, Mol. Omics, 17 (2021), 130–141. https://doi.org/10.1039/D0MO00115E doi: 10.1039/D0MO00115E
![]() |
[30] |
B. Liu, R. Long, K. C. Chou, IDHS-EL: Identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework, Bioinformatics, 32 (2016), 2411–2418. https://doi.org/10.1093/bioinformatics/btw186 doi: 10.1093/bioinformatics/btw186
![]() |
[31] |
S. Zhang, J. Lin, L. Su, Z. Zhou, PDHS-DSET: Prediction of DNase I hypersensitive sites in plant genome using DS evidence theory, Anal. Biochem., 564 (2019), 54–63. https://doi.org/10.1016/j.ab.2018.10.018 doi: 10.1016/j.ab.2018.10.018
![]() |
[32] |
Y. Zheng, H. Wang, Y. Ding, F. Guo, CEPZ: A novel predictor for identification of DNase I hypersensitive sites, IEEE/ACM Trans. Comput. Biol. Bioinf., 18 (2021), 2768–2774. https://doi.org/10.1109/TCBB.2021.3053661 doi: 10.1109/TCBB.2021.3053661
![]() |
[33] |
S. Zhang, Q. Yu, H. He, F. Zhu, P. Wu, L. Gu, et al., IDHS-DSAMS: Identifying DNase I hypersensitive sites based on the dinucleotide property matrix and ensemble bagged tree, Genomics, 112 (2020), 1282–1289. https://doi.org/10.1016/j.ygeno.2019.07.017 doi: 10.1016/j.ygeno.2019.07.017
![]() |
[34] | S. Zhang, T. Xue, Use Chou's 5-steps rule to identify DNase I hypersensitive sites via dinucleotide property matrix and extreme gradient boosting, Mol. Genet. Genomics, 295 (2020), 1431–1442. https://doi.org/10.1007/s00438-020-01711-8 |
[35] | Z. C. Xu, S. Y. Jiang, W. R. Qiu, Y. C. Liu, X. Xiao, iDHSs-PseTNC: Identifying DNase I hypersensitive sites with pseuo trinucleotide component by deep sparse auto-encoder, Lett. Org. Chem., 14 (2017), 655–664. https://doi.org/10.2174/1570178614666170213102455 |
[36] | C. Lyu, L. Wang, J. Zhang, Deep learning for DNase I hypersensitive sites identification, BMC Genomics, 19 (2018), 155–165. https://doi.org/10.1186/s12864-018-5283-8 |
[37] | P. Feng, N. Jiang, N. Liu, Prediction of DNase I hypersensitive sites by using pseudo nucleotide compositions, Sci. World J., 2014 (2014), 740506. https://doi.org/10.1155/2014/740506 |
[38] | W. Chen, T. Y. Lei, D. C. Jin, H. Lin, K. C. Chou, PseKNC: A flexible web server for generating pseudo K-tuple nucleotide composition, Anal. Biochem., 456 (2014), 53–60. https://doi.org/10.1016/j.ab.2014.04.001 |
[39] | W. Chen, H. Lin, K. C. Chou, Pseudo nucleotide composition or PseKNC: An effective formulation for analyzing genomic sequences, Mol. Biosyst., 11 (2015), 2620–2634. https://doi.org/10.1039/C5MB00155B |
[40] | B. Liu, F. Liu, X. Wang, J. Chen, L. Fang, K. C. Chou, Pse-in-One: A web server for generating various modes of pseudo components of DNA, RNA, and protein sequences, Nucleic Acids Res., 43 (2015), W65–W71. https://doi.org/10.1093/nar/gkv458 |
[41] | S. Zhang, Z. Zhou, X. Chen, Y. Hu, L. Yang, pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine, J. Theor. Biol., 426 (2017), 126–133. https://doi.org/10.1016/j.jtbi.2017.05.030 |
[42] | K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015), 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824 |
[43] | F. Y. Dao, H. Lv, W. Su, Z. J. Sun, Q. L. Huang, H. Lin, iDHS-Deep: An integrated tool for predicting DNase I hypersensitive sites by deep neural network, Briefings Bioinf., 22 (2021), bbab047. https://doi.org/10.1093/bib/bbab047 |
[44] | C. E. Breeze, J. Lazar, T. Mercer, J. Halow, I. Washington, K. Lee, et al., Atlas and developmental dynamics of mouse DNase I hypersensitive sites, bioRxiv, 2020 (2020). https://doi.org/10.1101/2020.06.26.172718 |
[45] | W. Li, A. Godzik, Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22 (2006), 1658–1659. https://doi.org/10.1093/bioinformatics/btl158 |
[46] | L. Fu, B. Niu, Z. Zhu, S. Wu, W. Li, CD-HIT: Accelerated for clustering the next-generation sequencing data, Bioinformatics, 28 (2012), 3150–3152. https://doi.org/10.1093/bioinformatics/bts565 |
[47] | X. Tang, P. Zheng, X. Li, H. Wu, D. Q. Wei, Y. Liu, et al., Deep6mAPred: A CNN and Bi-LSTM-based deep learning method for predicting DNA N6-methyladenosine sites across plant species, Methods, 204 (2022), 142–150. https://doi.org/10.1016/j.ymeth.2022.04.011 |
[48] | T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, preprint, arXiv: 1301.3781. |
[49] | T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in neural information processing systems, 26 (2013), 3111–3119. |
[50] | K. Fukushima, S. Miyake, Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position, Pattern Recognit., 15 (1982), 455–469. https://doi.org/10.1016/0031-3203(82)90024-3 |
[51] | D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, J. Physiol., 160 (1962), 106. https://doi.org/10.1113/jphysiol.1962.sp006837 |
[52] | Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, et al., Handwritten digit recognition with a back-propagation network, in Advances in neural information processing systems, Morgan Kaufmann, 2 (1989), 396–404. |
[53] | S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput., 9 (1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 |
[54] | A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in Advances in neural information processing systems, 30 (2017), 6000–6010. |
[55] | C. Raffel, D. P. Ellis, Feed-forward networks with attention can solve some long-term memory problems, preprint, arXiv: 1512.08756. |