Research article

Identification of DNA-protein binding residues through integration of Transformer encoder and Bi-directional Long Short-Term Memory


  • Received: 12 October 2023 Revised: 18 November 2023 Accepted: 28 November 2023 Published: 11 December 2023
  • DNA-protein binding is crucial for the normal development and function of organisms. The significance of accurately identifying DNA-protein binding sites lies in its role in disease prevention and the development of innovative approaches to disease treatment. In the present study, we introduce a precise and robust identifier for DNA-protein binding residues. In the context of protein representation, we combine the evolutionary information of the protein, represented by its position-specific scoring matrix, with the spatial information of the protein's secondary structure, enriching the overall informational content. This approach initially employs a combination of Bi-directional Long Short-Term Memory and Transformer encoder to jointly extract the interdependencies among residues within the protein sequence. Subsequently, convolutional operations are applied to the resulting feature matrix to capture local features of the residues. Experimental results on the benchmark dataset demonstrate that our method exhibits a higher level of competitiveness when compared to contemporary classifiers. Specifically, our method achieved an MCC of 0.349, SP of 96.50%, SN of 44.03% and ACC of 94.59% on the PDNA-41 dataset.

    Citation: Haipeng Zhao, Baozhong Zhu, Tengsheng Jiang, Zhiming Cui, Hongjie Wu. Identification of DNA-protein binding residues through integration of Transformer encoder and Bi-directional Long Short-Term Memory[J]. Mathematical Biosciences and Engineering, 2024, 21(1): 170-185. doi: 10.3934/mbe.2024008




    This research focuses on the mathematical modeling of the population dynamics of semelparous biological species. A species is called semelparous when it has a single reproductive episode before dying. Semelparity (sometimes called big-bang reproduction) occurs in very diverse biological species, see [1], including amphibians (e.g., Hyla frogs), arachnids (e.g., Pardosa spiders (Lycosidae), the Australian redback spider, the desert spider, or the black widow spider), fish (e.g., Pacific salmon, or sockeye salmon), insects (e.g., some butterflies, cicadas, or mayflies), mammals (e.g., some didelphids or dasyurid marsupials), mollusks (e.g., some squids or octopuses), reptiles (e.g., Labord's chameleon, or some lizards), etc.

    Methodologies for modeling dynamic biological systems, such as those based on population viability analysis (see [2,3]) or compartmental modeling (see [4,5]), usually require information about environmental variables, mortality rates, growth rates, etc. In practice, such information is difficult to obtain. In this work, we shall consider the methodology based on branching processes. These stochastic processes are appropriate mathematical models to describe the evolution of dynamical systems whose components, after a certain life period, reproduce and die in such a way that the transition from one state of the system to another is made according to a certain probability distribution. For theoretical concepts and applications of such processes, we refer the reader to some classical monographs [6,7,8], where several applications to cell kinetics, cell biology, chemotherapy, gene amplification, human evolution, and molecular biology are presented. See also the contributions, based on branching processes, by [9,10], in nuclear physics and complex contagion adoption dynamics, respectively.

    In fact, branching processes are routinely used to describe the population dynamics of biological species with both asexual and sexual reproduction. We are especially interested in the mathematical modeling of the demographic dynamics of biological species with sexual reproduction. To this end, a fairly rich literature has emerged about discrete-time two-sex branching processes, see the surveys in [11,12] and the discussions therein. Most of these branching processes assume that all of the progenitor couples have a similar reproductive behavior, see [13,14,15]. It is also frequently assumed that mating and reproduction depend on the current number of progenitor couples existing in the population, see [16,17,18]. However, it is known that, due to various environmental factors, e.g., weather conditions, food supply, fertility parameters, or predators, in many biological species mating and reproduction occur in a non-predictable environment influenced by the current number of females and males in the population. For stochastic modeling about the demographic evolution of such species, two-sex branching processes had not been sufficiently developed. With such motivation, in [19], a new class of two-sex branching processes, which takes into account the possibility of various mating and reproduction strategies, both depending on the number of females and males in the population, was introduced. This class of processes is appropriate for the description of the demographic dynamics of semelparous species, which are characterized by having diverse behaviors in the mating and reproduction phases. In [20,21], some results about such a class of two-sex processes were established. The main purpose of this work is to continue this research line by providing new methodological (probabilistic and statistical) contributions.

    The paper is organized as follows. In Section 2, the probability model is mathematically described and interpreted. In Section 3, some probabilistic results are provided. The probability distribution associated with the number of generations elapsed before the possible extinction of the population (time to extinction) is determined. The class of processes under study is then used to model mathematically the population or repopulation of habitats with endangered semelparous species. In Section 4, statistical results are derived. By considering maximum likelihood and Bayesian estimation methodologies, approximations for the main reproductive parameters involved in the probability model are proposed. For this purpose, information about the reproduction of the couples is incorporated. As an illustration, a simulated example contextualized with a species of chameleon (Labord's chameleon) is presented. Concluding remarks and some questions for research are included in Section 5.

    Let us consider the discrete-time two-sex branching process introduced in [19], denoted by $\{X_n\}_{n=0}^{\infty}$, with $X_n=(F_n,M_n)$ representing the number of female and male individuals at generation $n$. The underlying probability model is described as follows, with $\mathbb{N}$ and $\mathbb{N}_+$ denoting the non-negative and the positive integers, respectively:

    1) The mating phase is represented by a sequence of $n_m\geq 1$ two-argument integer-valued functions $\{L_l\}_{l\in N_m}$, $N_m:=\{1,\ldots,n_m\}$. Each $L_l$ is assumed to be non-decreasing and such that $L_l(F,0)=L_l(0,M)=0$, $F,M\in\mathbb{N}$. At generation $n$, according to the $l$th mating strategy (function), $L_l(F_n,M_n)$ female-male couples are formed.

    2) The reproduction phase is modeled by a sequence of $n_r\geq 1$ offspring probability distributions $\{P^h\}_{h\in N_r}$, $N_r:=\{1,\ldots,n_r\}$, $P^h:=\{p^h_{k,s}\}_{(k,s)\in S_h}$, with $S_h\subseteq\mathbb{N}^2$ and $p^h_{k,s}$ being the probability for a given couple to produce exactly $k$ females and $s$ males, when $P^h$ is the reproductive strategy.

    3) In each generation, the mating and reproduction strategies are determined through suitable functions $\varphi_m$ and $\varphi_r$, both defined on $\mathbb{N}^2$, taking values in $N_m$ and $N_r$, respectively.

    Initially, $X_0=x_0\in\mathbb{N}_+^2$, and the number of couples originated is $L_{\varphi_m(x_0)}(x_0)>0$. Given that $X_n=x\in\mathbb{N}^2$, it is then derived that, in the $n$th generation, $L_{\varphi_m(x)}$ and $P^{\varphi_r(x)}$ are the corresponding mating and reproductive strategies, respectively. Hence, at generation $n+1$,

    $$X_{n+1}:=\sum_{i=1}^{L_{\varphi_m(x)}(x)}\left(F^h_{n,i},M^h_{n,i}\right),\quad h=\varphi_r(x),\quad n\in\mathbb{N}, \tag{2.1}$$

    with $F^h_{n,i}$ and $M^h_{n,i}$ denoting, respectively, the number of female and male individuals originated by the $i$th couple at generation $n$, provided that the reproductive strategy $h$ has been considered. For each $h\in N_r$, the random vectors $(F^h_{n,i},M^h_{n,i})$, $i=1,\ldots,L_{\varphi_m(x)}(x)$, are assumed to be independent and identically distributed (i.i.d.) with offspring probability distribution $P^h$, i.e.,

    $$P(F^h_{n,1}=k,\ M^h_{n,1}=s)=p^h_{k,s},\quad (k,s)\in S_h.$$
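    The transition described by (2.1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `next_generation`, the dictionary layout of the strategies, and the toy inputs are all assumptions made here for the example.

```python
import random


def next_generation(x, matings, laws, phi_m, phi_r, rng=random):
    """Draw X_{n+1} given X_n = x = (F, M), following recursion (2.1).

    matings: dict l -> L_l(F, M), number of couples under mating strategy l
    laws:    dict h -> {(k, s): probability}, offspring law P^h
    phi_m, phi_r: functions (F, M) -> index of the strategy in force
    """
    females, males = x
    couples = matings[phi_m(females, males)](females, males)
    law = laws[phi_r(females, males)]
    outcomes, weights = list(law.keys()), list(law.values())
    # Each couple reproduces independently according to the selected law.
    litters = rng.choices(outcomes, weights=weights, k=couples)
    return (sum(k for k, _ in litters), sum(s for _, s in litters))


# Toy inputs, purely illustrative (hypothetical, not taken from the paper).
toy_matings = {1: lambda f, m: min(f, m)}
toy_laws = {1: {(0, 0): 0.5, (1, 1): 0.5}}
always_one = lambda f, m: 1

rng = random.Random(1)
state = (3, 2)
for _ in range(5):
    state = next_generation(state, toy_matings, toy_laws,
                            always_one, always_one, rng)
print(state)
```

    Passing the strategies as plain functions keeps the sketch close to the model: any choice of mating functions, offspring laws, and selection functions can be plugged in without changing the transition step.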

    Remark 2.1 Functions $L_l$, $\varphi_m$, and $\varphi_r$ should be flexible enough to fit the main features of the semelparous species we intend to describe. Usually, such functions will depend on certain biological/ecological parameters of interest in the demographic dynamics of the species.

    Remark 2.2 It is verified that $\{X_n\}_{n=0}^{\infty}$ is a homogeneous Markov chain. In fact, given $x_0,\ldots,x_n,x_{n+1}\in\mathbb{N}^2$, taking into account that for each $h\in N_r$, independent of $n$, the random vectors $(F^h_{n,i},M^h_{n,i})$, $i=1,\ldots,L_{\varphi_m(x)}(x)$, are i.i.d.,

    $$P(X_{n+1}=x_{n+1}\mid X_0=x_0,\ldots,X_n=x_n)=P(X_{n+1}=x_{n+1}\mid X_n=x_n)=P\left(\sum_{i=1}^{L_{\varphi_m(x_n)}(x_n)}\left(F^{\varphi_r(x_n)}_{n,i},M^{\varphi_r(x_n)}_{n,i}\right)=x_{n+1}\right).$$

    Note that if, for some $n\geq 1$, $X_n=(0,0)$, then $X_{n+j}=(0,0)$, $j\geq 1$, thus $(0,0)$ is an absorbing state. Consequently, in such a case, the population will become extinct. Let us denote by:

    $$q(x_0):=P\left(\lim_{n\to\infty}X_n=0\mid X_0=x_0\right),\quad x_0\in\mathbb{N}_+^2,$$

    the extinction probability, associated with $\{X_n\}_{n=0}^{\infty}$, when initially $X_0=x_0$. Assuming that, for each $h\in N_r$:

    $$\max\{P(F^h_{1,1}=0),\ P(M^h_{1,1}=0)\}>0, \tag{2.2}$$

    the following extinction-explosion property has been proved in [20]:

    $$q(x_0)+P\left(\lim_{n\to\infty}g(X_n)=\infty\mid X_0=x_0\right)=1, \tag{2.3}$$

    where for $F,M\in\mathbb{N}$,

    $$g(F,M):=\alpha F+\beta M,\quad \alpha\geq 0,\ \beta\geq 0,\ \alpha+\beta>0.$$

    Also, by considering the rate:

    $$m_g(x):=g(x)^{-1}E[g(X_{n+1})\mid X_n=x]=L_l(x)\,g(\mu^h)\,g(x)^{-1},$$

    sufficient conditions for the extinction/survival of a biological population, described through the model (2.1), have been established in [20].

    In particular, the following cases have a special biological significance: $g(F,M)=F$, $g(F,M)=M$, or $g(F,M)=F+M$, namely, the number of females, males, or total individuals in the population, respectively.

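    Remark 2.3 For each $h\in N_r$, let us denote by $\mu^h:=(\mu^h_1,\mu^h_2)$ and $\Sigma^h=(\sigma^h_{ij})_{i,j=1,2}$ the mean vector and the covariance matrix of $(F^h_{1,1},M^h_{1,1})$, respectively, i.e.,

    $$\mu^h_i:=\sum_{(k_1,k_2)\in S_h}k_i\,p^h_{k_1,k_2},\quad i=1,2, \tag{2.4}$$
    $$\sigma^h_{ij}:=\sum_{(k_1,k_2)\in S_h}(k_i-\mu^h_i)(k_j-\mu^h_j)\,p^h_{k_1,k_2},\quad i,j=1,2. \tag{2.5}$$

    Given $x\in\mathbb{N}^2$,

    $$E[X_{n+1}\mid X_n=x]=E\left[\sum_{i=1}^{L_l(x)}\left(F^h_{n,i},M^h_{n,i}\right)\right]=\sum_{i=1}^{L_l(x)}E\left[\left(F^h_{n,i},M^h_{n,i}\right)\right]=L_l(x)\,\mu^h,$$

    and

    $$Var[X_{n+1}\mid X_n=x]=Var\left[\sum_{i=1}^{L_l(x)}\left(F^h_{n,i},M^h_{n,i}\right)\right]=\sum_{i=1}^{L_l(x)}Var\left[\left(F^h_{n,i},M^h_{n,i}\right)\right]=L_l(x)\,\Sigma^h,$$

    where $l=\varphi_m(x)$ and $h=\varphi_r(x)$.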
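    As a worked instance of (2.4) and (2.5), the following Python sketch computes the mean vector and covariance matrix of an offspring law stored as a dictionary. The helper name `offspring_moments` is an assumption of this sketch. The probabilities are those of the first reproductive strategy used in the simulated example later in the paper.

```python
def offspring_moments(law):
    """Return (mu, sigma) for a law given as {(k1, k2): probability}.

    mu[i] follows (2.4); sigma[i][j] follows (2.5).
    """
    mu = [sum(k[i] * p for k, p in law.items()) for i in range(2)]
    sigma = [[sum((k[i] - mu[i]) * (k[j] - mu[j]) * p for k, p in law.items())
              for j in range(2)]
             for i in range(2)]
    return mu, sigma


# Offspring law of the first reproductive strategy in the simulated example.
P1 = {(0, 0): 0.71, (3, 3): 0.08, (4, 3): 0.06, (3, 4): 0.05,
      (4, 4): 0.03, (5, 3): 0.03, (3, 5): 0.04}

mu, sigma = offspring_moments(P1)
print(mu, sigma)
```

    The resulting values should agree, up to the rounding used in the paper, with the means and (co)variances reported for this strategy in the simulated example.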

    The research about this class of two-sex branching processes will now continue investigating new probabilistic and statistical questions of biological/ecological interest.

    From a probabilistic point of view, attention will be focused on two issues of special interest. First, assuming the possible extinction of the population in the habitat, we will look at the determination of the probability distribution associated with the number of generations elapsed before the possible extinction occurs. Second, we will investigate the application of the class of two-sex branching processes under study to the phenomenon of populating or repopulating habitats with endangered semelparous species. We will assume condition (2.2) and $q(x_0)\in(0,1)$. Note that, taking into account (2.3), the cases $q(x_0)=0$ or $q(x_0)=1$ mean the explosion or the extinction of the population, respectively.

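    Let us denote by $Z_n:=L_{\varphi_m(X_n)}(X_n)$ the number of couples formed at generation $n$. Clearly, if for some $n\geq 1$, $Z_n=0$, then the population will be extinct. Given that $X_0=x_0=(F_0,M_0)\in\mathbb{N}_+^2$, let us introduce the random variable:

    $$T(x_0):=\sup\{n\geq 0:\ Z_n>0\},$$

    representing the number of generations elapsed before the extinction of the population occurs, when there were $F_0$ females and $M_0$ males in the population. Assuming that $T(x_0)<\infty$, the next result provides the probability distribution associated with $T(x_0)$, and also its main moments (mean and variance). In the case that the population goes extinct, such probability distribution shows how quickly the extinction will probably occur.

    Let $u_n(s):=E[s^{Z_n}]$, $0\leq s\leq 1$, be the probability generating function of $Z_n$. For simplicity, we write $\Delta u_n(0):=u_{n+1}(0)-u_n(0)$, $n\in\mathbb{N}$.

    Theorem 3.1

    (a) $P(T(x_0)=n\mid T(x_0)<\infty)=q(x_0)^{-1}\Delta u_n(0)$, $n\in\mathbb{N}$.

    (b) $E[T(x_0)\mid T(x_0)<\infty]=\sum_{n=1}^{\infty}\left(1-q(x_0)^{-1}u_n(0)\right)$.

    (c) $Var[T(x_0)\mid T(x_0)<\infty]=q(x_0)^{-1}\left[\sum_{n=1}^{\infty}n^2\Delta u_n(0)-q(x_0)^{-1}\left(\sum_{n=1}^{\infty}\left(q(x_0)-u_n(0)\right)\right)^2\right]$.

    Proof. First, note that $P(T(x_0)<\infty)=q(x_0)$.

    (a) Using that $Z_0:=L_{\varphi_m(x_0)}(x_0)>0$, it is derived that $u_0(0)=P(Z_0=0)=0$. Therefore,

    $$\begin{aligned}P(T(x_0)=0\mid T(x_0)<\infty)&=P(T(x_0)<\infty)^{-1}P(T(x_0)=0)=q(x_0)^{-1}P(Z_1=0)\\&=q(x_0)^{-1}u_1(0)=q(x_0)^{-1}\left(u_1(0)-u_0(0)\right)=q(x_0)^{-1}\Delta u_0(0).\end{aligned}$$

    Now, for $n\in\mathbb{N}_+$,

    $$\begin{aligned}P(T(x_0)=n\mid T(x_0)<\infty)&=q(x_0)^{-1}P(T(x_0)=n)=q(x_0)^{-1}\left(P(T(x_0)\leq n)-P(T(x_0)\leq n-1)\right)\\&=q(x_0)^{-1}\left(P(Z_{n+1}=0)-P(Z_n=0)\right)=q(x_0)^{-1}\left(u_{n+1}(0)-u_n(0)\right)=q(x_0)^{-1}\Delta u_n(0).\end{aligned}$$

    (b) Using that $T(x_0)$ is a non-negative random variable, it is derived, see [22, p. 84], that

    $$E[T(x_0)\mid T(x_0)<\infty]=\sum_{n=0}^{\infty}\left(1-P(T(x_0)\leq n\mid T(x_0)<\infty)\right).$$

    Hence, using that $P(T(x_0)\leq n)=P(Z_{n+1}=0)=u_{n+1}(0)$,

    $$\begin{aligned}E[T(x_0)\mid T(x_0)<\infty]&=\sum_{n=0}^{\infty}\left(1-P(T(x_0)\leq n\mid T(x_0)<\infty)\right)=\sum_{n=0}^{\infty}\left(1-q(x_0)^{-1}P(T(x_0)\leq n)\right)\\&=\sum_{n=0}^{\infty}\left(1-q(x_0)^{-1}u_{n+1}(0)\right)=\sum_{n=1}^{\infty}\left(1-q(x_0)^{-1}u_n(0)\right).\end{aligned}$$

    (c) The result is derived, from Theorem 3.1(b), using that:

    $$Var[T(x_0)\mid T(x_0)<\infty]=E[T(x_0)^2\mid T(x_0)<\infty]-E[T(x_0)\mid T(x_0)<\infty]^2,$$

    and taking into account that:

    $$\begin{aligned}E[T(x_0)^2\mid T(x_0)<\infty]&=\sum_{n=1}^{\infty}n^2P(T(x_0)=n\mid T(x_0)<\infty)=q(x_0)^{-1}\sum_{n=1}^{\infty}n^2\left(P(T(x_0)\leq n)-P(T(x_0)\leq n-1)\right)\\&=q(x_0)^{-1}\sum_{n=1}^{\infty}n^2\left(P(Z_{n+1}=0)-P(Z_n=0)\right)=q(x_0)^{-1}\sum_{n=1}^{\infty}n^2\left(u_{n+1}(0)-u_n(0)\right)=q(x_0)^{-1}\sum_{n=1}^{\infty}n^2\Delta u_n(0).\end{aligned} \tag{3.1}$$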
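    Theorem 3.1 can be evaluated numerically once the values $u_n(0)=P(Z_n=0)$ are available, for instance from simulation. The Python sketch below truncates the series at the last available index; the function name `extinction_time_law` and the input sequence, chosen here so that the answer is known in closed form, are hypothetical and serve only as a check.

```python
def extinction_time_law(u, q):
    """Conditional law of the time to extinction, following Theorem 3.1.

    u: list with u[n] = u_n(0) = P(Z_n = 0) for n = 0, ..., K (truncated)
    q: extinction probability q(x_0), assumed to lie in (0, 1)
    """
    # (a) P(T = n | T < inf) = (u_{n+1}(0) - u_n(0)) / q
    pmf = [(u[n + 1] - u[n]) / q for n in range(len(u) - 1)]
    # (b) E[T | T < inf] = sum_{n >= 1} (1 - u_n(0) / q), truncated at K
    mean = sum(1.0 - u[n] / q for n in range(1, len(u)))
    # (c) variance = conditional second moment (3.1) minus squared mean
    second_moment = sum(n * n * p for n, p in enumerate(pmf))
    return pmf, mean, second_moment - mean ** 2


# Hypothetical input: u_n(0) = q (1 - r^n), which makes T geometric given extinction.
q_demo, r_demo = 0.6, 0.5
u_demo = [q_demo * (1.0 - r_demo ** n) for n in range(61)]
pmf, mean, var = extinction_time_law(u_demo, q_demo)
print(pmf[:4], mean, var)
```

    The truncation error is negligible here because $u_n(0)$ approaches $q(x_0)$ geometrically; with real data, the cut-off would need to be chosen so that $q(x_0)-u_K(0)$ is small.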

    Let us consider a certain habitat in which a semelparous species has become extinct, or is in serious danger of extinction. The purpose is to populate or repopulate the habitat with such species. To achieve this ecologically significant goal, several attempts to repopulate the habitat will probably be necessary. Next, the class of two-sex branching processes $\{X_n\}_{n=0}^{\infty}$, defined in (2.1), will be used as the mathematical model.

    In fact, let $\{X^{(j)}_n\}_{n=0}^{\infty}$, $j\in\mathbb{N}_+$, be independent processes where, for each $j\in\mathbb{N}_+$, $\{X^{(j)}_n\}_{n=0}^{\infty}$ is a two-sex branching process, like the one defined in (2.1), which describes the population dynamics concerning the $j$th attempt of repopulating. Thus, vector $X^{(j)}_n=(F^{(j)}_n,M^{(j)}_n)$ represents the number of female and male individuals in the habitat at generation $n$, in the $j$th attempt of repopulating. All processes $\{X^{(j)}_n\}_{n=0}^{\infty}$, $j\in\mathbb{N}_+$, have the same mating and reproductive strategies, i.e., they have the same sequences $\{L_l\}_{l\in N_m}$ and $\{P^h\}_{h\in N_r}$. Assume condition (2.2) holds and $q(x_0)\in(0,1)$.

    Initially (attempt $j=1$), it is assumed that $X^{(1)}_0=x_0=(F_0,M_0)\in\mathbb{N}_+^2$, i.e., $F_0$ females and $M_0$ males of the species under consideration are introduced in the habitat. If, after a certain number of generations, the population becomes extinct, then it will be necessary to restart the repopulating, by introducing again (attempt $j=2$) $F_0$ females and $M_0$ males in the habitat, i.e., $X^{(2)}_0=x_0$, and so on. This iterative procedure continues until a sufficient number of females and males are achieved in the habitat, so that the risk of extinction disappears (for simplicity, this fact will be referred to as implementation of the species in the habitat). It is deduced that

    $$P\left(\lim_{n\to\infty}X^{(j)}_n=0\mid X^{(j)}_0=x_0\right)=q(x_0),\quad j\in\mathbb{N}_+.$$

    Let $T^{(j)}(x_0):=\sup\{n\geq 0:\ Z^{(j)}_n>0\}$, $Z^{(j)}_n:=L_{\varphi_m(X^{(j)}_n)}(X^{(j)}_n)$, $j\in\mathbb{N}_+$, namely, the time to extinction at the $j$th attempt of repopulation. It is derived that $\{T^{(j)}(x_0)\}_{j=1}^{\infty}$ is a sequence of i.i.d. random variables with the probability distribution given in Theorem 3.1(a).

    Let us denote by $N(x_0)$ the number of attempts until the implementation of the species occurs. It is then verified that $N(x_0)$ is distributed according to a geometric law with parameter $1-q(x_0)$. Hence,

    $$P(N(x_0)=n)=q(x_0)^n\left(1-q(x_0)\right),\quad n\in\mathbb{N}, \tag{3.2}$$

    and

    $$E[N(x_0)]=q(x_0)\left(1-q(x_0)\right)^{-1}\quad\text{and}\quad Var[N(x_0)]=q(x_0)\left(1-q(x_0)\right)^{-2}. \tag{3.3}$$
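    The geometric law (3.2) and its moments (3.3) are easy to check numerically. The Python sketch below is only an illustration: the function name `attempts_law` and the extinction probability 0.4 are hypothetical, not taken from the paper. Since (3.2) places mass on $n=0,1,2,\ldots$, the variable counts the attempts that end in extinction before the species is implemented.

```python
def attempts_law(q, n_max):
    """Geometric law (3.2) truncated at n_max, with the moments (3.3).

    q: extinction probability q(x_0) of a single attempt, in (0, 1)
    """
    pmf = [q ** n * (1.0 - q) for n in range(n_max + 1)]
    mean = q / (1.0 - q)
    var = q / (1.0 - q) ** 2
    return pmf, mean, var


# Hypothetical extinction probability, for illustration only.
pmf_n, mean_n, var_n = attempts_law(0.4, 200)

# Cross-check the closed-form moments against the truncated series.
series_mean = sum(n * p for n, p in enumerate(pmf_n))
series_var = sum(n * n * p for n, p in enumerate(pmf_n)) - series_mean ** 2
print(mean_n, series_mean, var_n, series_var)
```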

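    Finally, let us consider the variable:

    $$T^{*}(x_0):=\sum_{j=1}^{N(x_0)}T^{(j)}(x_0), \tag{3.4}$$

    representing the total number of generations elapsed until the implementation of the species in the habitat occurs. The next result establishes the probability distribution of $T^{*}(x_0)$ and its main moments.

    Theorem 3.2

    (a) $P(T^{*}(x_0)=n)=q^{*}(x_0)^{-1}\left(\delta_{n,0}+\sum_{j=1}^{\infty}q(x_0)^{j}P_n(x_0)^{(j)}\right)$, $n\in\mathbb{N}$.

    (b) $E[T^{*}(x_0)]=q^{*}(x_0)\sum_{n=1}^{\infty}\left(q(x_0)-u_n(0)\right)$.

    (c) $Var[T^{*}(x_0)]=q^{*}(x_0)\left[\left(1+q^{*}(x_0)\right)\sum_{n=1}^{\infty}n^2\Delta u_n(0)-q(x_0)^{-1}\left(\sum_{n=1}^{\infty}\left(q(x_0)-u_n(0)\right)\right)^2\right]$,

    where

    $$q^{*}(x_0):=\left(1-q(x_0)\right)^{-1},\quad \delta_{n,0}:=1\ \text{if}\ n=0\ \text{or}\ 0\ \text{if}\ n\neq 0,$$
    $$P_n(x_0):=P\left(T^{(1)}(x_0)=n\mid T^{(1)}(x_0)<\infty\right),\quad P_n(x_0)^{(j)}:=\sum_{i_1+\cdots+i_j=n}P_{i_1}(x_0)\cdots P_{i_j}(x_0).$$

    Proof.

    (a) Taking into account (3.2) and (3.4),

    $$\begin{aligned}P(T^{*}(x_0)=n)&=\sum_{j=0}^{\infty}P(T^{*}(x_0)=n\mid N(x_0)=j)\,P(N(x_0)=j)\\&=q^{*}(x_0)^{-1}\left(\delta_{n,0}+\sum_{j=1}^{\infty}q(x_0)^{j}P\left(\sum_{l=1}^{j}T^{(l)}(x_0)=n\right)\right)\\&=q^{*}(x_0)^{-1}\left(\delta_{n,0}+\sum_{j=1}^{\infty}q(x_0)^{j}P_n(x_0)^{(j)}\right),\quad n\in\mathbb{N}.\end{aligned}$$

    (b) From (3.3), (3.4), and Theorem 3.1(b),

    $$E[T^{*}(x_0)]=E[N(x_0)]\,E[T^{(1)}(x_0)\mid T^{(1)}(x_0)<\infty]=q^{*}(x_0)\sum_{n=1}^{\infty}\left(q(x_0)-u_n(0)\right).$$

    (c) Using (3.3), (3.4), and Theorem 3.1(b), (c), we can obtain that:

    $$Var[T^{*}(x_0)]=E[N(x_0)]\,Var[T^{(1)}(x_0)\mid T^{(1)}(x_0)<\infty]+Var[N(x_0)]\,E[(T^{(1)}(x_0))^2\mid T^{(1)}(x_0)<\infty].$$

    Remark 3.1 For simplicity in the underlying probabilistic development, it has been assumed that in the proposed iterative procedure, for each $j\in\mathbb{N}_+$, $X^{(j)}_0=x_0=(F_0,M_0)$, that is, each iteration begins with $F_0$ females and $M_0$ males in the habitat. In practice, this assumption is not a problem. It implies that the extinction probability $q(x_0)$ is the same for all iterations. Therefore, it is deduced that $N(x_0)$ is distributed according to a geometric law with parameter $1-q(x_0)$.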

    With the aim of checking possible changes in the demographic dynamics of the semelparous species, it is important to determine accurate approximations for the main statistical parameters (offspring means, variances, and covariances) specified in (2.4) and (2.5), respectively. The application of estimation methodologies based on population viability analysis requires having information on various variables related to the biological species under consideration. In practice, the information about such variables is difficult to obtain. In this scenario, estimation based on Bayesian methodology provides a reasonable solution. In fact, by considering a parametric statistical context about the reproductive strategies $P^h$, $h\in N_r$, estimates for such statistical parameters have been determined in [19]. More recently, by using approximate Bayesian computation techniques, estimates have also been proposed in [21]. The application of such techniques requires a large number of simulations from the mathematical model. Consequently, it involves a significant computational effort. In this section, we shall consider the more general nonparametric framework about the reproductive strategies $P^h$, $h\in N_r$, that is, no functional form is assumed for the probabilities $p^h_{k,s}$, $(k,s)\in S_h$. By using maximum likelihood and Bayesian estimation methodologies, we will determine approximations for such probabilities and also for $\mu^h_i$, $\sigma^h_{ij}$, $i,j=1,2$.

    Let $\{X_n\}_{n=0}^{\infty}$ be the two-sex branching process described in the previous section. Assume that, for a given $n$, the following information has been observed over a set of generations, denoted by $G_n$, up to the $n$th generation:

    $$\mathcal{S}_n:=\{X_0=(F_0,M_0),\ Z_{i,(k,s)},\ (k,s)\in S_{\varphi_r(X_i)},\ i\in G_n\}, \tag{4.1}$$

    where

    $$Z_{i,(k,s)}:=\sum_{j=1}^{L_{\varphi_m(X_i)}(X_i)}I_{\left\{\left(F^{\varphi_r(X_i)}_{i,j},M^{\varphi_r(X_i)}_{i,j}\right)=(k,s)\right\}}, \tag{4.2}$$

    with $I_A$ denoting the indicator function of $A$. The variable $Z_{i,(k,s)}$ represents the total number of couples at generation $i$ which produce exactly $k$ females and $s$ males, $(k,s)\in\bigcup_{h\in N_r}S_h$.

    For $h\in N_r$, let $G^h_n:=\{i\in G_n:\ \varphi_r(X_i)=h\}$, i.e., the set of observed generations of $G_n$ where $P^h$ has been the reproductive strategy. Clearly,

    $$\bigcup_{h\in N_r}G^h_n=G_n,\quad G^h_n\cap G^{h'}_n=\emptyset,\quad h,h'\in N_r,\ h\neq h'.$$

    In what follows, it will be assumed that $G^h_n\neq\emptyset$, $h\in N_r$. From (4.2), let us introduce the variables:

    $$V^h_{n,(k,s)}:=\sum_{i\in G^h_n}Z_{i,(k,s)},\quad h\in N_r.$$

    Note that $V^h_{n,(k,s)}$ represents the total number of couples in the observed generations that have originated exactly $k$ females and $s$ males, with $P^h$ being the underlying reproductive strategy.

    According to this estimation methodology, taking into account the sample information given in (4.1) and (4.2), we have to determine, for $h\in N_r$, the values of $p^h_{k,s}$, $(k,s)\in S_h$, that maximize the corresponding likelihood function.

    Theorem 4.1 The maximum likelihood estimates for $p^h_{k,s}$, $\mu^h_i$, and $\sigma^h_{ij}$, $i,j=1,2$, $h\in N_r$, are given by:

    (a) $\hat{p}^h_{k,s}=(V^h_n)^{-1}V^h_{n,(k,s)}$, $(k,s)\in S_h$.

    (b) $\hat{\mu}^h_i=(V^h_n)^{-1}\sum_{(k_1,k_2)\in S_h}k_i\,V^h_{n,(k_1,k_2)}$.

    (c) $\hat{\sigma}^h_{ij}=\left(V^h_n(1+V^h_n)\right)^{-1}\left[V^h_n\sum_{(k_1,k_2)\in S_h}k_ik_j\,V^h_{n,(k_1,k_2)}-\sum_{(k_1,k_2),(l_1,l_2)\in S_h}k_il_j\,V^h_{n,(k_1,k_2)}V^h_{n,(l_1,l_2)}\right]$,

    where $V^h_n:=\sum_{(k,s)\in S_h}V^h_{n,(k,s)}$ is assumed to be positive.

    Proof. Taking into account (4.1), the corresponding likelihood function is given by:

    $$\mathcal{L}(P^1,\ldots,P^{n_r}\mid\mathcal{S}_n)=\prod_{h=1}^{n_r}\prod_{(k,s)\in S_h}\left(p^h_{k,s}\right)^{V^h_{n,(k,s)}}. \tag{4.3}$$

    From (4.3), the log-likelihood function is then deduced:

    $$\ell(P^1,\ldots,P^{n_r}\mid\mathcal{S}_n):=\log\left(\mathcal{L}(P^1,\ldots,P^{n_r}\mid\mathcal{S}_n)\right)=\sum_{h=1}^{n_r}\sum_{(k,s)\in S_h}V^h_{n,(k,s)}\log\left(p^h_{k,s}\right). \tag{4.4}$$

    In order to determine the maximum likelihood estimates for $p^h_{k,s}$, $(k,s)\in S_h$, it is necessary to obtain the values that maximize the likelihood function (4.3), or equivalently, the log-likelihood function (4.4), subject to the constraints:

    $$\sum_{(k,s)\in S_h}p^h_{k,s}=1,\quad p^h_{k,s}\geq 0,\quad h=1,\ldots,n_r.$$

    Consequently, using the Lagrange multipliers technique, it is required to determine the values of $p^h_{k,s}$ that maximize the function:

    $$\Psi(P^1,\ldots,P^{n_r}):=\ell(P^1,\ldots,P^{n_r}\mid\mathcal{S}_n)+\lambda_h\left(1-\sum_{(k,s)\in S_h}p^h_{k,s}\right),$$

    where $\lambda_h$, $h=1,\ldots,n_r$, denote the corresponding Lagrange multipliers. Now, it is obtained:

    $$\frac{\partial}{\partial p^h_{k,s}}\left[\ell(P^1,\ldots,P^{n_r}\mid\mathcal{S}_n)+\lambda_h\left(1-\sum_{(k,s)\in S_h}p^h_{k,s}\right)\right]=\left(p^h_{k,s}\right)^{-1}V^h_{n,(k,s)}-\lambda_h=0. \tag{4.5}$$

    From (4.5), it is deduced that $\lambda_h=V^h_n$. Hence, the solutions of the equations given in (4.5) are the expressions given in Theorem 4.1(a). It can be checked that such solutions maximize function (4.4). Intuitively, notice that $\hat{p}^h_{k,s}$ is the proportion of progenitor couples that generate exactly $k$ females and $s$ males when $P^h$ is the underlying reproductive strategy.

    Estimates given in Theorem 4.1(b), (c), are then determined using (2.4), (2.5), and Theorem 4.1(a).
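    The estimates in Theorem 4.1(a) and (b) reduce to relative frequencies of the observed litter types. A minimal Python sketch follows, assuming the counts for one reproductive strategy are stored in a dictionary keyed by (k, s); the function name `ml_estimates` and the counts are hypothetical and used only for illustration.

```python
def ml_estimates(counts):
    """Maximum likelihood estimates of Theorem 4.1(a), (b) for one strategy h.

    counts: {(k, s): V}, number of observed couples producing
            exactly k females and s males under strategy h
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("at least one observed couple is required")
    # (a) relative frequency of each litter type
    p_hat = {ks: v / total for ks, v in counts.items()}
    # (b) plug-in mean number of females (i = 0) and males (i = 1)
    mu_hat = [sum(ks[i] * v for ks, v in counts.items()) / total
              for i in range(2)]
    return p_hat, mu_hat


# Hypothetical counts, not taken from the paper's simulation.
demo_counts = {(0, 0): 70, (3, 3): 20, (4, 3): 10}
p_hat, mu_hat = ml_estimates(demo_counts)
print(p_hat, mu_hat)
```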

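    First, in order to apply the Bayesian estimation methodology, it is necessary to choose a suitable class of prior densities $\pi(P^1,\ldots,P^{n_r})$ for $(P^1,\ldots,P^{n_r})$. From the mathematical expression given in (4.3), it is derived that an appropriate class of prior densities for $(P^1,\ldots,P^{n_r})$ is the product of Dirichlet densities:

    $$\pi(P^1,\ldots,P^{n_r})=\prod_{h=1}^{n_r}D^h\prod_{(k,s)\in S_h}\left(p^h_{k,s}\right)^{\tau^h_{k,s}-1}, \tag{4.6}$$

    where for each $h\in N_r$,

    $\tau^h=(\tau^h_{k,s},\ (k,s)\in S_h)$ is a vector of positive constants,

    $$D^h=\Gamma\left(\sum_{(k,s)\in S_h}\tau^h_{k,s}\right)\prod_{(k,s)\in S_h}\left(\Gamma(\tau^h_{k,s})\right)^{-1},\quad \Gamma(u):=\int_0^{\infty}e^{-x}x^{u-1}\,dx,\ u>0.$$

    Theorem 4.2 By considering the squared error loss function, the Bayesian estimates for $p^h_{k,s}$, $\mu^h_i$, and $\sigma^h_{ij}$, $i,j=1,2$, $h\in N_r$, are given by:

    (a) $\tilde{p}^h_{k,s}=(W^h)^{-1}W^h_{k,s}$, $(k,s)\in S_h$.

    (b) $\tilde{\mu}^h_i=(W^h)^{-1}\sum_{(k_1,k_2)\in S_h}k_i\,W^h_{k_1,k_2}$.

    (c) $\tilde{\sigma}^h_{ij}=\left(W^h(1+W^h)\right)^{-1}\left(W^h\sum_{(k_1,k_2)\in S_h}k_ik_j\,W^h_{k_1,k_2}-\sum_{(k_1,k_2),(l_1,l_2)\in S_h}k_il_j\,W^h_{k_1,k_2}W^h_{l_1,l_2}\right)$,

    where

    $$W^h_{k,s}:=\tau^h_{k,s}+V^h_{n,(k,s)},\quad W^h:=\sum_{(k,s)\in S_h}W^h_{k,s}.$$

    Proof. Taking into account (4.6), the posterior density of $(P^1,\ldots,P^{n_r})$, incorporating the information provided by (4.1), is given by the product of Dirichlet densities:

    $$\pi(P^1,\ldots,P^{n_r}\mid\mathcal{S}_n)=\prod_{h=1}^{n_r}D^h\prod_{(k,s)\in S_h}\left(p^h_{k,s}\right)^{W^h_{k,s}-1}, \tag{4.7}$$

    where, for each $h\in N_r$, the parameter vector is $(W^h_{k,s},\ (k,s)\in S_h)$ and the normalizing constant is now $D^h=\Gamma(W^h)\prod_{(k,s)\in S_h}\left(\Gamma(W^h_{k,s})\right)^{-1}$.

    Given $(k,s)\in S_h$, from (4.7) we deduce, as marginal posterior density for $p^h_{k,s}$, the Beta density:

    $$\pi(p^h_{k,s}\mid\mathcal{S}_n)=\frac{\Gamma(W^h)}{\Gamma(W^h_{k,s})\,\Gamma(W^h-W^h_{k,s})}\left(p^h_{k,s}\right)^{W^h_{k,s}-1}\left(1-p^h_{k,s}\right)^{W^h-W^h_{k,s}-1},\quad p^h_{k,s}\in(0,1). \tag{4.8}$$

    Finally, from (4.8), using the squared error loss function, the Bayesian estimate for $p^h_{k,s}$ is obtained:

    $$\tilde{p}^h_{k,s}:=\int_0^1p^h_{k,s}\,\pi(p^h_{k,s}\mid\mathcal{S}_n)\,dp^h_{k,s}=(W^h)^{-1}W^h_{k,s}. \tag{4.9}$$

    From (2.4), (2.5), and (4.9), the expressions given in Theorem 4.2(b), (c), are deduced as Bayesian estimates for $\mu^h_i$ and $\sigma^h_{ij}$, $i,j=1,2$, $h\in N_r$, respectively.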
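    The estimates of Theorem 4.2 can be sketched in Python by adding the prior weights to the observed counts. The function name `bayes_estimates` and the observed counts are hypothetical. The prior weights are those quoted in the simulated example, under the assumption made here that their order follows the order in which the support is listed there.

```python
def bayes_estimates(counts, prior):
    """Bayesian estimates of Theorem 4.2 for one reproductive strategy h.

    counts: {(k, s): V}, observed numbers of couples per litter type
    prior:  {(k, s): tau}, positive Dirichlet weights on the support S_h
    """
    w = {ks: prior[ks] + counts.get(ks, 0) for ks in prior}
    w_total = sum(w.values())
    p = {ks: v / w_total for ks, v in w.items()}                  # (a)
    first = [sum(ks[i] * v for ks, v in w.items()) for i in range(2)]
    mu = [first[i] / w_total for i in range(2)]                   # (b)
    # (c): the double sum over pairs factorizes as first[i] * first[j]
    sigma = [[(w_total * sum(ks[i] * ks[j] * v for ks, v in w.items())
               - first[i] * first[j]) / (w_total * (1 + w_total))
              for j in range(2)]
             for i in range(2)]
    return p, mu, sigma


support = [(0, 0), (3, 3), (4, 3), (3, 4), (4, 4), (5, 3), (3, 5)]
# Prior weights from the simulated example; the pairing with the support
# points is an assumption of this sketch.
tau = dict(zip(support, [29, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5]))
demo_obs = {(0, 0): 50}  # hypothetical observations
p_b, mu_b, sigma_b = bayes_estimates(demo_obs, tau)
print(p_b[(0, 0)], mu_b, sigma_b)
```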

    Remark 4.1 From (4.8), we can determine the highest posterior density (HPD) credibility sets for $p^h_{k,s}$, namely, sets of the form:

    $$J(Q):=\{p^h_{k,s}:\ \pi(p^h_{k,s}\mid\mathcal{S}_n)\geq Q\},\quad Q>0,$$

    where given a certain credibility coefficient $1-\alpha$, the constant $Q$ is calculated taking into account that:

    $$\int_{J(Q)}\pi(p^h_{k,s}\mid\mathcal{S}_n)\,dp^h_{k,s}=1-\alpha.$$

    To determine HPD credibility sets for $\mu^h_i$ and for $\sigma^h_{ij}$, $i,j=1,2$, $h\in N_r$, it is necessary to approximate the corresponding posterior densities for such parameters. An appropriate procedure is based on the simulation, from $\pi(p^h_{k,s}\mid\mathcal{S}_n)$, of a sufficiently large number of values of $p^h_{k,s}$. Then, from (2.4) and (2.5), the corresponding values for $\mu^h_i$ and for $\sigma^h_{ij}$, $i,j=1,2$, $h\in N_r$, are determined. From these values, applying a suitable Gaussian kernel method, see e.g., [23], approximated posterior densities for such parameters can be obtained. Using such approximated densities, we can then determine the corresponding empirical HPD credibility sets.
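    The simulation-based procedure of Remark 4.1 can be sketched as follows. Note that this sketch deliberately swaps the Gaussian kernel step for a simpler device: it reports the shortest interval containing a fraction $1-\alpha$ of the simulated values, which approximates an HPD interval when the posterior is unimodal. Joint draws from the Dirichlet posterior (4.7) are obtained by normalizing independent gamma variates. All function names and the posterior weights are hypothetical.

```python
import random


def sample_dirichlet(weights, rng):
    """One draw from a Dirichlet law via normalized gamma variates."""
    draws = {ks: rng.gammavariate(w, 1.0) for ks, w in weights.items()}
    total = sum(draws.values())
    return {ks: g / total for ks, g in draws.items()}


def shortest_interval(values, alpha):
    """Shortest interval containing a fraction 1 - alpha of the values."""
    xs = sorted(values)
    inside = max(1, int(round((1 - alpha) * len(xs))))
    start = min(range(len(xs) - inside + 1),
                key=lambda i: xs[i + inside - 1] - xs[i])
    return xs[start], xs[start + inside - 1]


def hpd_for_mean(weights, i, alpha=0.05, n_draws=5000, seed=0):
    """Empirical interval for mu_i (i = 0 females, i = 1 males), using (2.4)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        p = sample_dirichlet(weights, rng)
        sims.append(sum(ks[i] * pr for ks, pr in p.items()))
    return shortest_interval(sims, alpha)


# Hypothetical posterior weights W_{k,s}, for illustration only.
demo_w = {(0, 0): 79, (3, 3): 3.5, (4, 3): 3.5, (3, 4): 3.5,
          (4, 4): 3.5, (5, 3): 3.5, (3, 5): 3.5}
low, high = hpd_for_mean(demo_w, 0)
print(low, high)
```

    A kernel density estimate, as suggested in the remark, would be the natural refinement when the simulated posterior is skewed or multimodal.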

    Remark 4.2 In order to simulate data from model (2.1), to calculate the proposed estimates for the parameters, and to determine the corresponding HPD credibility sets, we have implemented some specific programs using the statistical software R, see [24].

    Labord's chameleon (Furcifer labordi) is a native reptile of southwestern Madagascar, where it usually lives in dry deciduous forests. It is considered the shortest-lived tetrapod animal. It spends most of its life in the developing embryo phase (8 to 9 months) and, after that, it experiences a very rapid growth. It has a short lifespan (4 to 5 months), reaching its sexual maturity at an early age (2 months). There are no rigorous studies in the specialized scientific literature about their social organization or on their mating and reproduction strategies. The few studies that have been carried out are based on monitoring experiments through radio telemetry. From the information recorded, it has been detected that females exhibit high habitat fidelity, moving small cumulative and linear distances with low dispersion rates. Males move greater distances, in a less predictable manner, with higher dispersal rates than females. This species of chameleon constitutes a particular example of semelparous life (progenitors die shortly after reproducing). Their mating and reproductive strategies, highly conditioned by the number of females and males in the habitat, must adapt to these temporal limitations existing in intense competition and fighting between males. It has been suggested that they possess a wide range of different mating systems, generally polygamous matings. Males can mate with more than one female and females can mate with different males during the same ovarian cycle (the dynamics of male color change could also affect the choice of partner). The female lays a clutch of eggs and the progenitor male and female die. Some studies reported that females can lay between 6 and 8 eggs. Due to various random factors, mainly predators and environmental factors, a high percentage of eggs will not hatch. See [25,26,27,28] for more information about this species of chameleon.

    Unfortunately, there is no real data available on the demographic dynamics of this reptile species. Next, taking into account the special characteristics of this species of chameleon, a simulated example is presented where mating and reproduction strategies close to reality are assumed. Let us consider a population of Labord's chameleons with the following population dynamics:

    1) Females and males form couples according to the $n_m=2$ mating strategies:

    $$L_1(F,M)=\min\{F,M\},\quad L_2(F,M)=F\min\{1,M\},\quad F,M\in\mathbb{N}.$$

    According to $L_1$, the females and males practice fidelity, and they are allowed to have at most one mate. According to $L_2$, in each generation, a dominant male mates with each female. The other males do not participate in the mating process.

    2) Since the number of eggs laid by a female is between 6 and 8, and it is known that a high percentage of eggs do not hatch, our simulation assumes the $n_r=2$ reproductive strategies $P^h=\{p^h_{k,s}\}_{(k,s)\in S_h}$, $h=1,2$, where

    $$S_1=S_2=\{(0,0),(3,3),(4,3),(3,4),(4,4),(5,3),(3,5)\},$$
    $$p^1_{0,0}=0.71,\ p^1_{3,3}=0.08,\ p^1_{4,3}=0.06,\ p^1_{3,4}=0.05,\ p^1_{4,4}=0.03,\ p^1_{5,3}=0.03,\ p^1_{3,5}=0.04,$$
    $$p^2_{0,0}=0.71,\ p^2_{3,3}=0.08,\ p^2_{4,3}=0.05,\ p^2_{3,4}=0.06,\ p^2_{4,4}=0.03,\ p^2_{5,3}=0.04,\ p^2_{3,5}=0.03.$$

    From $P^1$ and $P^2$,

    $$\mu^1_1=1.02,\ \mu^1_2=1.03,\ \sigma^1_{11}=2.679,\ \sigma^1_{22}=2.749,\ \sigma^1_{12}=2.519,$$
    $$\mu^2_1=1.03,\ \mu^2_2=1.02,\ \sigma^2_{11}=2.749,\ \sigma^2_{22}=2.679,\ \sigma^2_{12}=2.519.$$

    $P^1$ slightly favors the birth of males, with a ratio of males/females of the means equal to 1.01. This ratio has a value of 0.99 for $P^2$, which slightly favors the birth of females. From such reproductive strategies, it is deduced that:

    $$\max\{P(F^h_{0,1}=0),\ P(M^h_{0,1}=0)\}=0.71,\quad h=1,2.$$

    Hence, condition (2.2) holds.

    3) In each generation, it is assumed that the functions $\varphi_m$ and $\varphi_r$ are given by:

    $$\varphi_m(F,M):=1\cdot I_{\{M(F+M)^{-1}>K\}}+2\cdot I_{\{M(F+M)^{-1}\leq K\}},\quad \varphi_r(F,M):=1\cdot I_{\{F\leq M\}}+2\cdot I_{\{F>M\}},\quad F,M\in\mathbb{N}.$$

    $K<1$ is a suitable threshold for the proportion of males in the population.
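    The mating and strategy-selection rules of this example translate directly into code. The Python sketch below uses the threshold K = 0.8 adopted in the paper's simulation; the function names and the convention chosen for the empty population (where the proportion of males is undefined) are assumptions of this sketch.

```python
K = 0.8  # threshold for the proportion of males used in the simulation


def L1(f, m):
    """Fidelity: each individual has at most one mate."""
    return min(f, m)


def L2(f, m):
    """A dominant male mates with every female (no couples if no males)."""
    return f * min(1, m)


def phi_m(f, m, k=K):
    """Index of the mating strategy in force for (F, M) = (f, m)."""
    if f + m == 0:
        # Assumed convention: the proportion is undefined for an empty
        # population; either strategy yields zero couples.
        return 2
    return 1 if m / (f + m) > k else 2


def phi_r(f, m):
    """Index of the reproductive strategy in force for (F, M) = (f, m)."""
    return 1 if f <= m else 2


def couples(f, m, k=K):
    """Number of couples formed under the mating strategy selected by phi_m."""
    return {1: L1, 2: L2}[phi_m(f, m, k)](f, m)


# With 10 females and 10 males the proportion of males is 0.5 <= K,
# so the dominant-male strategy L2 applies and 10 couples are formed.
print(phi_m(10, 10), couples(10, 10), phi_r(10, 10))
```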

    Let {Xn}n=0 be the two-sex branching process described in (2.1) with the mating and reproduction strategies mentioned previously. By way of illustration, taking X0=(10,10) and K=0.8, we have developed a simulation for the first n=100 generations of such a chameleon Labord population, i.e., in this case, G100={1,2,,100}, see Figure 1.

    Figure 1.  The evolution of the number of females (red color) and males (black color) in the successive generations belonging to G100.
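
    A simulation of this kind can be sketched as below. Since (2.1) is not reproduced in this section, the recursion is an assumption: the couples are formed by the selected mating strategy, each couple independently produces offspring from the selected reproductive strategy, and the next generation is the sum over couples. The function names and the seed are illustrative, and this is not the authors' program.

```python
import random

# Possible offspring pairs (females, males) and their probabilities under P1 and P2.
OFFSPRING = [(0, 0), (3, 3), (4, 3), (3, 4), (4, 4), (5, 3), (3, 5)]
PROBS = {1: [0.71, 0.08, 0.06, 0.05, 0.03, 0.03, 0.04],
         2: [0.71, 0.08, 0.05, 0.06, 0.03, 0.04, 0.03]}


def number_of_couples(f, m, threshold):
    """Apply L1 when the proportion of males exceeds the threshold, else L2."""
    if m / (f + m) > threshold:
        return min(f, m)
    return f * min(1, m)


def simulate(f0, m0, threshold, generations, seed=None):
    """Return the path of (females, males) and the reproductive strategy used per step."""
    rng = random.Random(seed)
    f, m = f0, m0
    path, strategies = [(f, m)], []
    for _ in range(generations):
        if f == 0 or m == 0:  # no couples can be formed: the population dies out
            break
        h = 1 if f <= m else 2  # reproductive strategy chosen by phi_r
        couples = number_of_couples(f, m, threshold)
        litters = rng.choices(OFFSPRING, weights=PROBS[h], k=couples)
        f = sum(females for females, _ in litters)
        m = sum(males for _, males in litters)
        path.append((f, m))
        strategies.append(h)
    return path, strategies


path, strategies = simulate(10, 10, threshold=0.8, generations=100, seed=2023)
print(len(strategies), strategies.count(1), strategies.count(2), path[-1])
```

    Individual runs depend on the seed and may end in extinction before 100 generations, so they will not reproduce Figure 1 exactly.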

    According to the obtained simulation, $P_2$ was the reproductive strategy in 19 generations and $P_1$ was the reproductive strategy in 81 generations. In fact,

    $G^2_{100}=\{2,3,5,6,8,9,10,12,17,18,20,24,26,33,34,36,52,54,56\}, \quad G^1_{100}=G_{100}\setminus G^2_{100}.$

    Taking into account the special reproductive characteristics of this biological species, with a high probability of unsuccessful hatching, we have considered the prior density given in (4.7), where

    $\tau^1=\tau^2=(29,\ 3.5,\ 3.5,\ 3.5,\ 3.5,\ 3.5,\ 3.5).$

    Using the information provided in the simulation, from Theorems 4.1 and 4.2, we have determined the corresponding maximum likelihood and Bayesian estimates for $p^h_{k,l}$, $\mu^h_i$, and $\sigma^h_{ij}$, $i,j,h=1,2$; see Tables 1 and 2.
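
    Theorems 4.1 and 4.2 and the prior (4.7) are not reproduced in this section, so the following is only a sketch under two stated assumptions: that the maximum likelihood estimate of each probability is the relative frequency of the corresponding offspring pair among the observed couples, and that the prior is a Dirichlet distribution with parameter vector $\tau$, with the posterior mean taken as the Bayesian estimate. The counts used are hypothetical and are not data from the paper.

```python
OUTCOMES = [(0, 0), (3, 3), (4, 3), (3, 4), (4, 4), (5, 3), (3, 5)]
TAU = [29, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5]  # prior parameters stated in the text


def mle(counts):
    """Relative frequencies of each offspring pair (assumed form of the MLE)."""
    n = sum(counts)
    return [c / n for c in counts]


def dirichlet_posterior_mean(counts, tau):
    """Posterior mean under an assumed Dirichlet(tau) prior."""
    n, t = sum(counts), sum(tau)
    return [(c + a) / (n + t) for c, a in zip(counts, tau)]


def plug_in_means(probs):
    """Mean numbers of females and males implied by estimated probabilities."""
    mean_f = sum(p * k for (k, _), p in zip(OUTCOMES, probs))
    mean_m = sum(p * s for (_, s), p in zip(OUTCOMES, probs))
    return mean_f, mean_m


# Hypothetical numbers of couples observed with each offspring pair (illustration only).
counts = [140, 18, 12, 10, 7, 6, 7]

p_hat = mle(counts)
p_tilde = dirichlet_posterior_mean(counts, TAU)
print([round(p, 3) for p in p_hat], plug_in_means(p_hat))
print([round(p, 3) for p in p_tilde], plug_in_means(p_tilde))
```

    Under these assumptions the two estimates approach each other as the number of observed couples grows, because the weight of the prior parameters (which sum to 50) becomes small relative to the sample size.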

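    Table 1.  The true values of $p^1_{k,l}$, $(k,l)\in S_1$, $\mu^1_i$, $\sigma^1_{ij}$, $i,j=1,2$, and their maximum likelihood and Bayesian estimates.

    | | $p^1_{0,0}$ | $p^1_{3,3}$ | $p^1_{4,3}$ | $p^1_{3,4}$ | $p^1_{4,4}$ | $p^1_{5,3}$ | $p^1_{3,5}$ |
    |---|---|---|---|---|---|---|---|
    | True values | 0.71 | 0.08 | 0.06 | 0.05 | 0.03 | 0.03 | 0.04 |
    | Maximum likelihood estimates | 0.712 | 0.079 | 0.061 | 0.049 | 0.028 | 0.029 | 0.041 |
    | Bayesian estimates | 0.712 | 0.079 | 0.061 | 0.049 | 0.028 | 0.029 | 0.041 |

    | | $\mu^1_1$ | $\mu^1_2$ | $\sigma^1_{11}$ | $\sigma^1_{22}$ | $\sigma^1_{12}$ |
    |---|---|---|---|---|---|
    | True values | 1.02 | 1.03 | 2.679 | 2.749 | 2.519 |
    | Maximum likelihood estimates | 1.012 | 1.023 | 2.664 | 2.738 | 2.505 |
    | Bayesian estimates | 1.013 | 1.024 | 2.666 | 2.740 | 2.507 |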

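    Table 2.  The true values of $p^2_{k,l}$, $(k,l)\in S_2$, $\mu^2_i$, $\sigma^2_{ij}$, $i,j=1,2$, and their maximum likelihood and Bayesian estimates.

    | | $p^2_{0,0}$ | $p^2_{3,3}$ | $p^2_{4,3}$ | $p^2_{3,4}$ | $p^2_{4,4}$ | $p^2_{5,3}$ | $p^2_{3,5}$ |
    |---|---|---|---|---|---|---|---|
    | True values | 0.71 | 0.08 | 0.05 | 0.06 | 0.03 | 0.04 | 0.03 |
    | Maximum likelihood estimates | 0.706 | 0.072 | 0.051 | 0.068 | 0.028 | 0.049 | 0.026 |
    | Bayesian estimates | 0.700 | 0.072 | 0.052 | 0.068 | 0.030 | 0.049 | 0.028 |

    | | $\mu^2_1$ | $\mu^2_2$ | $\sigma^2_{11}$ | $\sigma^2_{22}$ | $\sigma^2_{12}$ |
    |---|---|---|---|---|---|
    | True values | 1.03 | 1.02 | 2.749 | 2.679 | 2.519 |
    | Maximum likelihood estimates | 1.038 | 1.030 | 2.857 | 2.670 | 2.556 |
    | Bayesian estimates | 1.081 | 1.054 | 2.898 | 2.722 | 2.595 |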


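    From Tables 1 and 2,

    $\max_{h=1,2}\{\max_{(k,l)\in S_h}|\hat{p}^h_{k,l}-p^h_{k,l}|\}=0.009, \quad \max_{h=1,2}\{\max_{i=1,2}|\hat{\mu}^h_i-\mu^h_i|\}=0.009, \quad \max_{h=1,2}\{\max_{i,j=1,2}|\hat{\sigma}^h_{ij}-\sigma^h_{ij}|\}=0.108,$
    $\max_{h=1,2}\{\max_{(k,l)\in S_h}|\tilde{p}^h_{k,l}-p^h_{k,l}|\}=0.008, \quad \max_{h=1,2}\{\max_{i=1,2}|\tilde{\mu}^h_i-\mu^h_i|\}=0.051, \quad \max_{h=1,2}\{\max_{i,j=1,2}|\tilde{\sigma}^h_{ij}-\sigma^h_{ij}|\}=0.149,$

    which shows good accuracy of the estimators.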

    As an illustration, Figures 2 and 3 show the evolution of the estimates for $p^h_{0,0}$ and $p^h_{3,5}$ in the successive generations belonging to $G^h_{100}$, $h=1,2$, respectively. Similarly, Figures 4 and 5 show the evolution of the estimates for $\mu^1_i$ and $\mu^2_i$, $i=1,2$, in the successive generations belonging to $G^1_{100}$ and $G^2_{100}$, respectively. The 95% HPD credibility sets are also included in all figures (the true values for the parameters are represented by horizontal lines). For a better visualization of the graphs, the generations belonging to $G^1_{100}$ and $G^2_{100}$ have been renumbered, from 1 to 81 and from 1 to 19, respectively.

    Figure 2.  On the left is the evolution of $\hat{p}^1_{0,0}$ (black color) and $\tilde{p}^1_{0,0}$ (red color) and on the right is the evolution of $\hat{p}^1_{3,5}$ (black color) and $\tilde{p}^1_{3,5}$ (red color) in the successive generations belonging to $G^1_{100}$.
    Figure 3.  On the left is the evolution of $\hat{p}^2_{0,0}$ (black color) and $\tilde{p}^2_{0,0}$ (red color) and on the right is the evolution of $\hat{p}^2_{3,5}$ (black color) and $\tilde{p}^2_{3,5}$ (red color) in the successive generations belonging to $G^2_{100}$.
    Figure 4.  On the left is the evolution of $\hat{\mu}^1_1$ (black color) and $\tilde{\mu}^1_1$ (red color) and on the right is the evolution of $\hat{\mu}^1_2$ (black color) and $\tilde{\mu}^1_2$ (red color) in the successive generations belonging to $G^1_{100}$.
    Figure 5.  On the left is the evolution of $\hat{\mu}^2_1$ (black color) and $\tilde{\mu}^2_1$ (red color) and on the right is the evolution of $\hat{\mu}^2_2$ (black color) and $\tilde{\mu}^2_2$ (red color) in the successive generations belonging to $G^2_{100}$.

    In this research, we have continued the line of work started in previous papers on the mathematical modeling of the demographic dynamics of semelparous biological species through two-sex branching processes with multiple mating and reproduction strategies. It has been assumed that, in each generation, both phases (mating and reproduction) are influenced by the current number of females and males in the population (habitat). Several probabilistic and statistical contributions have been established. In particular, the probability distribution of the number of generations elapsed before the possible extinction of the population has been derived. Also, these two-sex branching processes have been used to describe mathematically the phenomenon of populating or repopulating habitats with a semelparous species that has become extinct or is in danger of extinction. Under the most general non-parametric statistical setting for the reproductive strategies of the biological species, various inferential questions about the main parameters governing the reproduction phase have been addressed. For this purpose, information on the reproduction of the couples has been included in the observed data sample. Using this information, maximum likelihood and Bayesian estimates of the reproductive parameters, together with the corresponding 95% HPD credibility sets, have been determined. The computer programs needed to simulate data from the mathematical model and to calculate the estimates and the HPD credibility sets have been developed. By way of illustration, the proposed estimation methodologies have been applied to a simulated example with Labord's chameleon populations. The simulation has shown the accuracy of the proposed estimates.

    Some possible directions for future research are, for example, to extend this class of two-sex branching processes by including in the probability model the immigration of females, males, or couples from external populations; to consider mating and/or reproduction in random environments; or to assume multi-type populations. It is also necessary to explore other mathematical methodologies that allow modeling the evolution of semelparous biological species with sexual reproduction. In this regard, the survey in [29] provides interesting information to be considered.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors wish to express their gratitude to the referees. Their technical questions, suggestions, and corrections have improved the presentation and understanding of this paper. This research has been supported by the Ministerio de Ciencia e Innovación of Spain (grant PID2019-108211GB-I00/AEI/10.13039/501100011033).

    The authors declare there is no conflict of interest.



    [1] V. Charoensawan, D. Wilson, S. A. Teichmann, Genomic repertoires of DNA-binding transcription factors across the tree of life, Nucleic Acids Res., 38 (2010), 7364–7377. https://doi.org/10.1093/nar/gkq617 doi: 10.1093/nar/gkq617
    [2] J. Si, R. Zhao, R. Wu, An overview of the prediction of protein DNA-binding sites, Int. J. Mol. Sci., 16 (2015), 5194–5215. https://doi.org/10.3390/ijms16035194 doi: 10.3390/ijms16035194
    [3] K. A. Aeling, N. R. Steffen, M. Johnson, G. W. Hatfield, R. H. Lathrop, D. F. Senear, DNA deformation energy as an indirect recognition mechanism in protein-DNA interactions, IEEE/ACM Trans. Comput. Biol. Bioinf., 4 (2007), 117–125. https://doi.org/10.1109/TCBB.2007.1000 doi: 10.1109/TCBB.2007.1000
    [4] M. Ljungman, Activation of DNA damage signaling, Mutat. Res. Fundam. Mol. Mech. Mutagen., 577 (2005), 203–216. https://doi.org/10.1016/j.mrfmmm.2005.02.014 doi: 10.1016/j.mrfmmm.2005.02.014
    [5] G. Zhu, S. Cansiz, M. You, L. Qiu, D. Han, L. Zhang, et al., Nuclease-resistant synthetic drug-DNA adducts: Programmable drug-DNA conjugation for targeted anticancer drug delivery, NPG Asia Mater., 7 (2015). https://doi.org/10.1038/am.2015.19 doi: 10.1038/am.2015.19
    [6] S. Peled, O. Leiderman, R. Charar, G. Efroni, Y. Shav-Tal, Y. Ofran, De-novo protein function prediction using DNA binding and RNA binding proteins as a test case, Nat. Commun., 7 (2016), 13424. https://doi.org/10.1038/ncomms13424 doi: 10.1038/ncomms13424
    [7] C. J. Jeffery, Current successes and remaining challenges in protein function prediction, Front. Bioinf., 3 (2023). https://doi.org/10.3389/fbinf.2023.1222182 doi: 10.3389/fbinf.2023.1222182
    [8] C. P. Ponting, J. Schultz, F. Milpetz, P. Bork, SMART: Identification and annotation of domains from signalling and extracellular protein sequences, Nucleic Acids Res., 27 (1999), 229–232. https://doi.org/10.1093/nar/27.1.229 doi: 10.1093/nar/27.1.229
    [9] N. M. Luscombe, R. A. Laskowski, J. M. Thornton, Amino acid–base interactions: A three-dimensional analysis of protein–DNA interactions at an atomic level, Nucleic Acids Res., 29 (2001), 2860–2874. https://doi.org/10.1093/nar/29.13.2860 doi: 10.1093/nar/29.13.2860
    [10] Y. Mandel-Gutfreund, H. Margalit, Quantitative parameters for amino acid-base interaction: Implications for prediction of protein-DNA binding sites, Nucleic Acids Res., 26 (1998), 2306–2312. https://doi.org/10.1093/nar/26.10.2306 doi: 10.1093/nar/26.10.2306
    [11] Y. H. Zhu, J. Hu, X. N. Song, D. Yu, DNAPred: Accurate identification of DNA-binding sites from protein sequence by ensembled hyperplane-distance-based support vector machines, J. Chem. Inf. Model., 59 (2019), 3057–3071. https://doi.org/10.1021/acs.jcim.8b00749 doi: 10.1021/acs.jcim.8b00749
    [12] X. Ma, J. Guo, H. D. Liu, J. Xie, X. Sun, Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information, IEEE/ACM Trans. Comput. Biol. Bioinf., 9 (2012), 1766–1775. https://doi.org/10.1109/TCBB.2012.106 doi: 10.1109/TCBB.2012.106
    [13] J. Yan, L. Kurgan, DRNApred, fast sequence-based method that accurately predicts and discriminates DNA-and RNA-binding residues, Nucleic Acids Res., 45 (2017). https://doi.org/10.1093/nar/gkx059 doi: 10.1093/nar/gkx059
    [14] L. Wang, M. Q. Yang, J. Y. Yang, Prediction of DNA-binding residues from protein sequence information using random forests, BMC Genomics, 10 (2009). https://doi.org/10.1186/1471-2164-10-S1-S1 doi: 10.1186/1471-2164-10-S1-S1
    [15] H. A. Maghawry, M. G. M. Mostafa, T. F. Gharib, A new protein structure representation for efficient protein function prediction, J. Comput. Biol., 21 (2014), 936–946. https://doi.org/10.1089/cmb.2014.0137 doi: 10.1089/cmb.2014.0137
    [16] Y. Xia, C. Q. Xia, X. Pan, H. Shen, GraphBind: Protein structural context embedded rules learned by hierarchical graph neural networks for recognizing nucleic-acid-binding residues, Nucleic Acids Res., 49 (2021). https://doi.org/10.1093/nar/gkab044 doi: 10.1093/nar/gkab044
    [17] H. Zhou, D. Ren, H. Xia, M. Fan, X. Yang, H. Huang, Ast-gnn: An attention-based spatio-temporal graph neural network for interaction-aware pedestrian trajectory prediction, Neurocomputing, 445 (2021), 298–308. https://doi.org/10.1016/j.neucom.2021.03.024 doi: 10.1016/j.neucom.2021.03.024
    [18] R. Liu, J. Hu, DNABind: A hybrid algorithm for structure‐based prediction of DNA‐binding residues by combining machine learning‐and template‐based approaches, Proteins Struct. Funct. Bioinf., 81 (2013), 1885–1899. https://doi.org/10.1002/prot.24330 doi: 10.1002/prot.24330
    [19] S. Jones, H. P. Shanahan, H. M. Berman, J. M. Thornton, Using electrostatic potentials to predict DNA‐binding sites on DNA‐binding proteins, Nucleic Acids Res., 31 (2003), 7189–7198. https://doi.org/10.1093/nar/gkg922 doi: 10.1093/nar/gkg922
    [20] Y. Tsuchiya, K. Kinoshita, H. Nakamura, Structure‐based prediction of DNA‐binding sites on proteins using the empirical preference of electrostatic potential and the shape of molecular surfaces, Proteins Struct. Funct. Bioinf., 55 (2004), 885–894. https://doi.org/10.1002/prot.20111 doi: 10.1002/prot.20111
    [21] T. Wang, J. Sun, Q. Zhao, Investigating cardiotoxicity related with hERG channel blockers using molecular fingerprints and graph attention mechanism, Comput. Biol. Med., 153 (2023), 106464. https://doi.org/10.1016/j.compbiomed.2022.106464 doi: 10.1016/j.compbiomed.2022.106464
    [22] Z. Chen, L. Zhang, J. Sun, R. Meng, S. Yin, Q. Zhao, DCAMCP: A deep learning model based on capsule network and attention mechanism for molecular carcinogenicity prediction, J. Cell. Mol. Med., 27 (2023), 3117–3126. https://doi.org/10.1111/jcmm.17889 doi: 10.1111/jcmm.17889
    [23] R. Meng, S. Yin, J. Sun, H. Hu, Q. Zhao, scAAGA: Single cell data analysis framework using asymmetric autoencoder with gene attention, Comput. Biol. Med., 165 (2023), 107414. https://doi.org/10.1016/j.compbiomed.2023.107414 doi: 10.1016/j.compbiomed.2023.107414
    [24] J. Hu, Y. Li, M. Zhang, X. Yang, H. Shen, D. Yu, Predicting protein-DNA binding residues by weightedly combining sequence-based features and boosting multiple SVMs, IEEE/ACM Trans. Comput. Biol. Bioinf., 14 (2017), 1389–1398. https://doi.org/10.1109/TCBB.2016.2616469 doi: 10.1109/TCBB.2016.2616469
    [25] W. Li, A. Godzik, Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22 (2006), 1658–1659. https://doi.org/10.1093/bioinformatics/btl158 doi: 10.1093/bioinformatics/btl158
    [26] J. C. Jeong, X. Lin, X. W. Chen, On position-specific scoring matrix for protein function prediction, IEEE/ACM Trans. Comput. Biol. Bioinf., 8 (2010), 308–315. https://doi.org/10.1109/TCBB.2010.93 doi: 10.1109/TCBB.2010.93
    [27] J. Zahiri, O. Yaghoubi, M. Mohammad-Noori, R. Ebrahimpour, A. Masoudi-Nejad, PPIevo: Protein–protein interaction prediction from PSSM based evolutionary information, Genomics, 102 (2013), 237–242. https://doi.org/10.1016/j.ygeno.2013.05.006 doi: 10.1016/j.ygeno.2013.05.006
    [28] S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, et al., Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res., 25 (1997), 3389–3402. https://doi.org/10.1093/nar/25.17.3389 doi: 10.1093/nar/25.17.3389
    [29] The UniProt Consortium, UniProt: A worldwide hub of protein knowledge, Nucleic Acids Res., 47 (2019), D506–D515. https://doi.org/10.1093/nar/gky1049 doi: 10.1093/nar/gky1049
    [30] L. J. McGuffin, K. Bryson, D. T. Jones, The PSIPRED protein structure prediction server, Bioinformatics, 16 (2000), 404–405. https://doi.org/10.1093/bioinformatics/16.4.404 doi: 10.1093/bioinformatics/16.4.404
    [31] L. Wang, S. J. Brown, BindN: A web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences, Nucleic Acids Res., 34 (2006), W243–W248. https://doi.org/10.1093/nar/gkl298 doi: 10.1093/nar/gkl298
    [32] W. Y. Chu, Y. F. Huang, C. C. Huang, Y. Cheng, C. Huang, Y. Oyang, ProteDNA: A sequence-based predictor of sequence-specific DNA-binding residues in transcription factors, Nucleic Acids Res., 37 (2009), W396–W401. https://doi.org/10.1093/nar/gkp449 doi: 10.1093/nar/gkp449
    [33] S. Hwang, Z. Gou, I. B. Kuznetsov, DP-Bind: A web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins, Bioinformatics, 23 (2007), 634–636. https://doi.org/10.1093/bioinformatics/btl672 doi: 10.1093/bioinformatics/btl672
    [34] L. Wang, C. Huang, M. Q. Yang, J. Y. Yang, BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features, BMC Syst. Biol., 4 (2010). https://doi.org/10.1186/1752-0509-4-S1-S3 doi: 10.1186/1752-0509-4-S1-S3
    [35] J. Si, Z. Zhang, B. Lin, M. Schroeder, B. Huang, MetaDBSite: A meta approach to improve protein DNA-binding sites prediction, BMC Syst. Biol., 5 (2011). https://doi.org/10.1186/1752-0509-5-S1-S7 doi: 10.1186/1752-0509-5-S1-S7
    [36] J. Li, H. Tian, J. Yang, Z. Gong, Long noncoding RNAs regulate cell growth, proliferation, and apoptosis, DNA Cell Biol., 35 (2016), 459–470. https://doi.org/10.1089/dna.2015.3187 doi: 10.1089/dna.2015.3187
    [37] M. D. Paraskevopoulou, A. G. Hatzigeorgiou, Analyzing miRNA–lncRNA interactions, in Long Non-coding RNAs: Methods and Protocols, Humana press, (2016), 271–286. https://doi.org/10.1007/978-1-4939-3378-5_21
    [38] J. C. R. Fernandes, S. M. Acuña, J. I. Aoki, L. M. Floeter-Winter, S. M. Muxel, Long non-coding RNAs in the regulation of gene expression: Physiology and disease, Non-coding RNA, 5 (2019), 17. https://doi.org/10.3390/ncrna5010017 doi: 10.3390/ncrna5010017
    [39] X. Li, C. Q. Zhong, R. Wu, X. Xu, Z. Yang, S. Cai, et al., RIP1-dependent linear and nonlinear recruitments of caspase-8 and RIP3 respectively to necrosome specify distinct cell death outcomes, Protein Cell, 12 (2021), 858–876. https://doi.org/10.1007/s13238-020-00810-x doi: 10.1007/s13238-020-00810-x
    [40] W. Wang, L. Zhang, J. Sun, Q. Zhao, J. Shuai, Predicting the potential human lncRNA–miRNA interactions based on graph convolution network with conditional random field, Briefings Bioinf., 23 (2022), bbac463. https://doi.org/10.1093/bib/bbac463 doi: 10.1093/bib/bbac463
    [41] L. Zhang, P. Yang, H. Feng, Q. Zhao, H. Liu, Using network distance analysis to predict lncRNA–miRNA interactions, Interdiscip. Sci.: Comput. Life Sci., 13 (2021), 535–545. https://doi.org/10.1007/s12539-021-00458-z doi: 10.1007/s12539-021-00458-z
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)