Processing math: 71%
Research article

Optimal modeling of anti-breast cancer candidate drugs screening based on multi-model ensemble learning with imbalanced data

  • The imbalanced data makes the machine learning model seriously biased, which leads to false positive in screening of therapeutic drugs for breast cancer. In order to deal with this problem, a multi-model ensemble framework based on tree-model, linear model and deep-learning model is proposed. Based on the methodology constructed in this study, we screened the 20 most critical molecular descriptors from 729 molecular descriptors of 1974 anti-breast cancer drug candidates and, in order to measure the pharmacokinetic properties and safety of the drug candidates, the screened molecular descriptors were used in this study for subsequent bioactivity, absorption, distribution metabolism, excretion, toxicity, and other prediction tasks. The results show that the method constructed in this study is superior and more stable than the individual models used in the ensemble approach.

    Citation: Juan Zhou, Xiong Li, Yuanting Ma, Zejiu Wu, Ziruo Xie, Yuqi Zhang, Yiming Wei. Optimal modeling of anti-breast cancer candidate drugs screening based on multi-model ensemble learning with imbalanced data[J]. Mathematical Biosciences and Engineering, 2023, 20(3): 5117-5134. doi: 10.3934/mbe.2023237

    Related Papers:

    [1] Junjing Xiong, Xiong Li, Hao Wang . The survival analysis of a stochastic Lotka-Volterra competition model with a coexistence equilibrium. Mathematical Biosciences and Engineering, 2019, 16(4): 2717-2737. doi: 10.3934/mbe.2019135
    [2] Mengqing Zhang, Qimin Zhang, Jing Tian, Xining Li . The asymptotic stability of numerical analysis for stochastic age-dependent cooperative Lotka-Volterra system. Mathematical Biosciences and Engineering, 2021, 18(2): 1425-1449. doi: 10.3934/mbe.2021074
    [3] Yanyan Du, Ting Kang, Qimin Zhang . Asymptotic behavior of a stochastic delayed avian influenza model with saturated incidence rate. Mathematical Biosciences and Engineering, 2020, 17(5): 5341-5368. doi: 10.3934/mbe.2020289
    [4] Qingsheng Zhu, Changwen Xie, Jia-Bao Liu . The impact of population agglomeration on ecological resilience: Evidence from China. Mathematical Biosciences and Engineering, 2023, 20(9): 15898-15917. doi: 10.3934/mbe.2023708
    [5] Sha He, Sanyi Tang, Libin Rong . A discrete stochastic model of the COVID-19 outbreak: Forecast and control. Mathematical Biosciences and Engineering, 2020, 17(4): 2792-2804. doi: 10.3934/mbe.2020153
    [6] Tingting Ding, Tongqian Zhang . Asymptotic behavior of the solutions for a stochastic SIRS model with information intervention. Mathematical Biosciences and Engineering, 2022, 19(7): 6940-6961. doi: 10.3934/mbe.2022327
    [7] Lanjun Liu, Han Wu, Junwu Wang, Tingyou Yang . Research on the evaluation of the resilience of subway station projects to waterlogging disasters based on the projection pursuit model. Mathematical Biosciences and Engineering, 2020, 17(6): 7302-7331. doi: 10.3934/mbe.2020374
    [8] Abhishek Savaliya, Rutvij H. Jhaveri, Qin Xin, Saad Alqithami, Sagar Ramani, Tariq Ahamed Ahanger . Securing industrial communication with software-defined networking. Mathematical Biosciences and Engineering, 2021, 18(6): 8298-8313. doi: 10.3934/mbe.2021411
    [9] Jiongxuan Zheng, Joseph D. Skufca, Erik M. Bollt . Heart rate variability as determinism with jump stochastic parameters. Mathematical Biosciences and Engineering, 2013, 10(4): 1253-1264. doi: 10.3934/mbe.2013.10.1253
    [10] Helong Liu, Xinyu Song . Stationary distribution and extinction of a stochastic HIV/AIDS model with nonlinear incidence rate. Mathematical Biosciences and Engineering, 2024, 21(1): 1650-1671. doi: 10.3934/mbe.2024072
  • The imbalanced data makes the machine learning model seriously biased, which leads to false positive in screening of therapeutic drugs for breast cancer. In order to deal with this problem, a multi-model ensemble framework based on tree-model, linear model and deep-learning model is proposed. Based on the methodology constructed in this study, we screened the 20 most critical molecular descriptors from 729 molecular descriptors of 1974 anti-breast cancer drug candidates and, in order to measure the pharmacokinetic properties and safety of the drug candidates, the screened molecular descriptors were used in this study for subsequent bioactivity, absorption, distribution metabolism, excretion, toxicity, and other prediction tasks. The results show that the method constructed in this study is superior and more stable than the individual models used in the ensemble approach.



    Microbiome dynamics are inherently stochastic. They are not just the assemblage of living microorganisms present in a defined environment, but also encompass the entire spectrum of molecules, structural elements, metabolites, and mobile genetic elements, produced by these microorganisms, known as their theater of activity [1]. Consequently, interpreting the causal link between microbiomes and human health, a critical aspect of microbiome research [2,3], remains challenging.

    Environmental fluctuations can induce variability in interactions within a microbiome and between microbes and their environment or host. For instance, some microbial species may exhibit varying levels of sensitivity to temperature, impacting their interactions with other microbes or with the environment/host depending on the ambient temperatures. This variability introduces errors in the estimation of microbial interactions, which, in turn, affect our ability to understand the drivers of dynamics. It becomes imperative to translate microbial interactions, which govern the microbiome's population dynamics, into a high-level property of a microbiome, such as its resilience.

    Theories from dynamical systems, adapted from macroecology, offer insight into understanding the dynamics of the microbiome in different environments [4]. The resilience of a microbiome, i.e., its ability to maintain and recover function in the face of perturbations, such as those caused by antibiotics or opportunistic pathogens, can be studied by employing the concept of stability in dynamic systems. Microbiomes that more readily return to stationarity (i.e., when the dynamics are captured by white noise about an equilibrium) when faced with perturbations are considered to be more resilient. Figure 1 illustrates the concept of the resilience of the microbiome. The impact of a perturbation, classified as a "pulse" or a "press," on a microbial community is determined by the duration and magnitude of the perturbation [5]. Following a perturbation, a microbial community in a healthy state (eubiosis) may either transition to an unstable and transient state or recover to its initial state. In some instances, the perturbation might drive the community towards the formation of an alternative healthy state or an unhealthy stable state (dysbiosis) associated with disease. Recent empirical evidence supports these theoretical concepts. Studies of gut microbiota dynamics have revealed intrinsic stability patterns and catastrophic shifts following antibiotic perturbations [6,7]. Moreover, mechanistic research has elucidated how commensals regulate host immunity, such as the suppression of retinoic acid synthesis to control IL-2 activity and prevent dysbiosis [8]. Beyond the gut, investigations across different body sites demonstrate community-specific resilience patterns, from vaginal microbiome's stability [9] to oral ecosystems [10]. Building on these findings, recent comprehensive reviews have established resilience as the microbiota's universal capacity to resist and recover from perturbations [11,12], with longitudinal studies revealing personalized resilience features, such as the individual-specific stability observed during the Mars500 mission [13]. Therefore, understanding how a microbiome responds to perturbations is crucial to preventing disease states and promoting health.

    Figure 1.  Conceptual illustration of the resilience of a microbiome.

    The concept of resilience has been intensively studied in macroecological systems. In fact, the ecological term "resilience", was originally coined by Holling 1973 [14]. In this seminal article, the author defined resilience as a "measure of the persistence of ecosystems and their ability to absorb change and disturbance and still maintain the same relationships between populations or state variables." This concept was subjected to a comprehensive examination by Pimm and Lawton 1977 [15], DeAngelis 1980 [16], and Cottingham and Carpenter 1994 [17]. The index most frequently used to calculate resilience in ecosystems is based on the eigenvalues of an ordinary differential equation system near its equilibrium. Resilience is computed as the negative reciprocal of the maximum eigenvalue, providing a measure of the time it takes the ecosystem to recover from a perturbation. Subsequently, Neubert and Caswell 1997 [18] introduced four quantities (asymptotic resilience, reactivity, and two quantities of the amplification envelope) to measure the long- and short-term responses of an ecosystem to pulse perturbations. More recently, Arnoldi et al., [19,20] attempted to unify all previous resilience measures for a community, making them more practical by extending the analysis of ecosystem responses to perturbations on all time scales.

    All the previously mentioned definitions of resilience are based on the theory of ordinary differential equations, in which an equilibrium state is assumed to be a fixed point of a deterministic dynamic system. However, due to continuously fluctuating environments, equilibrium states may not be fixed points but fluctuating stationary distributions. To address this issue, Ives 1995 [21] proposed a definition of stochastic resilience based on a discrete-time linear stochastic model. This article used the process error, modeled as a sequence of normal random variables with a mean of 0 and a constant variance, to account for the variability in interactions between species. Subsequently, Ives et al., 2003 [22] extended this definition to three additional stability measures using a first-order discrete-time multivariate autoregressive model (MAR). Buckwar and Kelly 2014 [23] redefined the notions of asymptotic resilience, reactivity, and the amplification envelope, proposed by Neubert and Caswell 1997 [18], in terms of the second moment of a linear Ito stochastic differential equation system, with the goal of investigating the effect of persistent stochastic perturbations on a deterministic system.

    Over the past decade, the conceptualization of resilience within microbiome research has evolved significantly through the integration of stochastic modeling approaches. Zaoli and Grilli 2021 [26] analyzed longitudinal human gut microbiome data, revealing that while most operational taxonomic units (OTUs) exhibit stable dynamics with fluctuations around a constant average, a minority display nonstationary behavior with abrupt transitions in abundance, potentially indicating shifts between alternative stable states. This study underscored the importance of considering stochastic fluctuations and the potential for multiple stable states in microbial communities.

    Building on this foundation, Wolff et al., 2023 [27] investigated the daily dynamics of intraspecific genetic variation in the human gut microbiome. They discovered that approximately 80% of the strains analyzed exhibited abundance fluctuations that could be predicted using a stochastic logistic model, suggesting that the stability and macroecological properties of the human gut microbiome emerge at the strain level. Hayashi et al., 2024 [24] conducted time-series analyses of experimental microbiomes, demonstrating that both deterministic and stochastic ecological processes drive the divergence of alternative states. This work highlighted the role of stochasticity in generating multiple stable states within microbial communities. Concurrently, Ponciano et al., 2024 [25] emphasized the necessity of integrating stochastic models to predict a population's persistence probabilities within the vaginal microbiome, advocating for a shift from static snapshots to dynamic modeling approaches. Collectively, these studies illustrate a progressive shift towards embracing stochastic modeling to more accurately capture the resilience and dynamic behavior of microbiome communities.

    Although well studied, existing resilience measures are inadequate for microbiome studies because stochasticity, known to affect the stability of a dynamical system, does not influence these measures. Therefore, there is a pressing need to incorporate stochasticity, such as environmental noise and uncertainty, into the mechanistic models used to identify causal mechanisms using time-series microbiome data. In nature, the environment often fluctuates randomly and without memory, making white noise an excellent candidate to account for environmentally driven stochastic effects. Given the apparent need for such a model, we propose a stochastic generalized Lotka–Volterra (SgLV) model to describe the temporal dynamics of a community of microbial species. Stochasticity is induced in the model by adding white noise to the model parameters. Then, based on the linearization of our proposed SgLV model, we develop resilience measures to quantify resilience that can be applied to time-series microbiome data.

    In our modeling framework, we represent environmental stochasticity using white noise. Specifically, we modify a constant parameter, such as \(\alpha\), to include stochastic fluctuations by redefining it as \(\bar{\alpha} = \alpha + \tau \frac{dW}{dt}\), where \(W(t)\) is standard Brownian motion, and \(\tau\) represents the intensity of noise. Consequently, \(\bar{\alpha}\) transforms from a fixed parameter into a stochastic process. Within a sufficiently small time interval \([0, t]\), we approximate the term \(\frac{dW}{dt}\) by \(\frac{W(t)}{t}\), which follows a normal distribution \(N(0, \frac{1}{t})\). Thus, each time we simulate the model, the value of ˉα will be chosen from the normal distribution N(α,α+τ2/t). This way of incorporating noise is more realistic than that in Ives et al., 2003 [22], since the variance of the noise is not constant but changing over time.

    The foundational work by Ives et al., 2003 [22] established critical methodologies for analyzing ecological community stability using discrete-time multivariate autoregressive (MAR) models, providing robust methods for estimating stability and ecological interactions from empirical time-series data. Their pioneering approach revealed important relationships between species interactions and environmental variability, significantly advancing the quantitative study of ecological dynamics. In our manuscript, we extend this foundational framework from discrete time to continuous time by developing stochastic generalized Lotka–Volterra models, which are particularly suited to capturing microbiome dynamics characterized by continuous growth and interaction processes. Moreover, we propose novel resilience metrics specifically tailored to microbiome community dynamics, including measures capable of quantifying recovery following perturbations and assessing how stochasticity influences the pace of community restoration. Thus, while conceptually rooted in the seminal work of Ives et al., 2003 [22], our model provides significant advancements optimized for addressing the unique complexities inherent in microbiome research.

    The remainder of the paper is organized into three sections. First, we develop a general theoretical framework on how to calculate four resilience measures for the proposed SgLV model. Second, we demonstrate the practicality of our framework by applying it to a stochastic two-species competitive model to illustrate the effects of environmental noise on the resilience of the stochastic model. Lastly, we have a brief discussion on the possibilities of applying our framework to real microbiome data.

    In this section, we develop a general theoretical framework to calculate four resilience measures that quantify the recovery of a microbial community from pulse perturbations (single shocks or perturbations of relatively short duration). In contrast to previous works that used ordinary differential equation (ODE) systems [18,19,20] and/or stochastic discrete-time systems [21,22], we use a stochastic differential equation (SDE) system to describe the temporal dynamics of species within a microbiome community and then study its transient dynamics under different pulse perturbations. We utilize white noise to represent random environmental fluctuations without memory to capture the fluctuations observed in microbiomic time-series data. We consider them to be the effects of persistent environmental perturbations. Mathematically, we add white noise, described as the "derivative of Brownian Motion," to a deterministic gLV model. We then obtain a non-linear stochastic gLV model. Next, we linearize this nonlinear SDE system at its positive stationary distribution to get a linear SDE system. On the basis of two works by Arnoldi et al., [19,20], we indirectly establish four resilience measures, which are the instantaneous return rate (Rinst), average return rate (Ravet), asymptotic resilience (R), and the convergence rate (Rc), of our stochastic gLV model, through the second moment of solutions of the corresponding linearized SDE system. These four resilience quantities, in which the first two are functions of time and the last two are just numbers, measure how strong the recovery dynamics of our original stochastic system is after a pulse perturbation is applied. Figure 2 shows the scheme of the theoretical method used to calculate the resilience of a stochastic generalized Lotka–Volterra system of a microbial community of N species. The details of this method will be presented in the following four subsections.

    Figure 2.  Four-step scheme of the method of calculating the resilience of a SgLV.

    The widely used mechanistic model for microbiome studies is the deterministic generalized Lotka–Volterra model, which is given by

    dXidt=Xi[αi+Nj=1βijXj],i=1,,N, (2.1)

    in which Xi is the abundance of species i in a microbiome community. The parameter αi represents the intrinsic growth rate of species i. The interaction between species i and species j (ji) is captured by the parameter βij. When i=j, the parameter βij takes the form βii=αici<0 where ci is the coefficient of negative intraspecific interaction representing the inverse of the carrying capacity of the species i in isolation.

    To characterize the microenvironmental fluctuations and noises in pairwise interactions between species within a microbiome community, we propose a general stochastic generalized Lotka–Volterra (SgLV) model that takes the following form

    dXi=Xi[αi+Nj=1βijXj]dt+XiNj=1τijdWj,i=1,,N, (2.2)

    Notice that we perturbed each interaction coefficient βij in the system (2.1) by replacing each term βijXj with the term βijXj+τijdWjdt (i,j=1,,N) where (W1,,WN)T is a standard Brownian motion of N dimensions.

    For convenience, we introduce our notation that will be used throughout this paper. First, we need to specify an appropriate completed filtered probability space. Let Ω={ωC(R,RN),ω(0)=0}, let F be the Borel σ-algebra on Ω, and let P be the measure induced by {W=W(t)}tR, a two-sided N-dimensional Brownian motion. Then the elements of Ω will be identified with paths of the N-dimensional Brownian process ω(t)=W(t,ω). Now we consider the P-completion of F, also denoted by F, that is, F contains all P-null sets. The filtration Ft is given by the canonical filtration generated by the Brownian motion {W(t)}t0 completed by all P-null sets of F. Denote the probability measure given by the extension of P to the completed F again by P. Therefore, a completed filtered probability space (Ω,F,{Ft}t0,P) is obtained. Next, consider the SgLV (2.2) whose solutions take values in [0,)n and describe the abundance dynamics of N interacting species X(t):=(X1(t),,XN(t))T, t0, in a microbial community. We write

    RN:={(x1,,xN)T:x1R,,xNR},

    RN+:={(x1,,xN)TRN:x10,,xN0},

    RN,+:={(x1,,xN)TRN:x1>0,,xN>0},

    fi=fi(X(t)):=αi+Nj=1βijXj (the per-capita growth rate of the species i),

    Γ:=(τij)N×N, and Σ:=ΓΓT=(σij)N×N where σij=Nk=1τikτjk.

    From now on, the stochastic process given by the solution of the system (2.2) will be denoted by X or (X(t))t0. We use Px to denote the probability law on Ω when the solution path starts at x=(x1,,xn)T and Ex is the expectation corresponding to Px.

    We use the norm x:=Ni=1|xi| in RN where x=(x1,,xN)T. For (u1,,uN)TRN, we let Ni=1ui:=min{u1,,uN} and Ni=1ui:=max{u1,,uN}.

    Based on the excellent work of Hening and Nguyen 2018 [32,33,34], we briefly present, without proofs, sharp sufficient conditions for both persistence and extinction for SgLV (2.2) and its corresponding deterministic counterpart.

    First, we consider the corresponding deterministic gLV system of the stochastic system (2.2) given by

    dXdt=h(X), (2.3)

    where h(X)=(h1(X),,hN(X))T, and hi(X):=Xi[αi+Nj=1βijXj], i=1,,N. We define persistence and extinction for the deterministic system (2.3) as follows.

    Definition 2.1. The system (2.3) is persistent if, for any initial value X(0) in RN,+, each component of the solution X(t) satisfies lim inftXi(t)>0 for all i=1,,N.

    Definition 2.2. For any initial value X(0)RN,+, the species Xi is said to go extinct if lim suptXi(t)=0.

    In a deterministic setting, in order to investigate the persistence and extinction of the system (2.3) (i.e., we wish to know which species persist or which species go extinct in this system), we look at the set N, which is the set of all the equilibria of the deterministic system (2.3) on the boundary RN+:=RN+RN,+, as well as the unique positive equilibrium E in RN,+. Clearly, N since 0=(0,,0)TN. Then, for each EN{E}, we compute the eigenvalues λ1(E),,λN(E) of the variational matrix dhdX|X=E. Hence, we have the following theorem about the persistence of the system (2.3).

    Theorem 2.1. The microbial community described by the system (2.3) becomes persistent, i.e., the abundance of each species is positive, if all the equilibria on the boundary RN,+ are unstable, which means that

    maxi=1,,NReλi(E)>0foreachEN, (2.4)

    This is equivalent to the fact that the unique positive equilibrium E is locally stable (i.e., all the eigenvalues of the linearized system at the positive equilibrium have non-positive real parts). If the deterministic system (2.3) is not persistent, we would like to track which species go extinct and which species survive. Indeed, for 0E=(x1,E,,xN,E)TN, 1n1<<nkN exist such that xni,E>0 for all i=1,,k and xj,E=0 for all j{1,,N}{n1,,nk}. Let

    RE+:={(x1,,xN)RN+:xi=0foriIcE},

    where IE:={n1,,nk} and IcE={1,,N}{n1,,nk}. If E=0, then R0+={0}. Let

    RE,+:={(x1,,xN)RN+:xi=0foriIcEandxi>0foriIE}

    and RE+=RE+RE,+,. Now we make the following assumption.

    Assumption 1. If a EN exists such that

    maxiIcEReλi(E)<0,

    If RE+{0}, then suppose further that for any ENE, we have

    maxiIEReλi(E)>0,

    where NE:={EN:RE+RE+}.

    Define

    N1:={EN:E satisfies Assumption 1}andN2:=NN1.

    To determine exactly which species go extinct, we need an additional assumption ensuring that equilibria that are not in N1 are unstable.

    Assumption 2. Either N2= or, for any EN2, maxi=1,,NReλi(E)>0.

    Theorem 2.2. If Assumptions 1 and 2 are satisfied and N1, then for any EN1, the species Xi(t) converges to 0 with the convergence rate λi(E) for any iIcE.

    Next, we consider the stochastic system (2.2). We make a standing assumption that will be used throughout this section.

    Assumption 3. The noise intensity matrix Γ of the system (2.2) satisfies the condition that the matrix Σ:=ΓΓT=(σij)N×N is a positive-definite matrix. Furthermore, c=(c1,,cN)TRN,+ and δ>0 exist such that for sufficiently large X, we have

    Ni=1ciXifi(X)1+cTXNi,j=1σijcicjXiXj2(1+cTX)2+δ(1+N+Ni=1|fi(X)|)<0. (2.5)

    Since the drift functions Xifi(X(t)) are locally Lipschitz functions on RN for any i=1,,N and the diffusion functions in the system (2.2) are linear, for each initial value in RN, the system (2.2) has a unique almost surely (a.s.) continuous solution X=X(t) up to the explosion time

    τe=inf{t>0:Ni=1Xi(t)=orNi=1Xi(t)=}.

    Using the Ito's formula for logXi(t), the system (2.2) can be written as

    Xi(t)=Xi(0)exp{t0fi(s)ds(12Nj=1τ2ij)t+Nj=1τijWj(t)}a.s.,

    which means that if X(0)RN+, then X(t)RN+ a.s. for all t[0,τe). Due to (2.5), we can obtain the tightness of the family of transition probabilities of the solution X to the system (2.2). This means that τe= a.s. Thus, for any initial value X(0) in RN+, the system (2.2) has a unique a.s. continuous strong solution X(t) in RN+ for all t[0,).

    The positive definiteness of the matrix Σ in Assumption 3 ensures that the solution to the system (2.2) is nondegenerate. This is an important assumption because it makes sure that we have enough noise that can locally push the dynamics of the solution in all directions, and hence any positive solution state can move close to any other positive solution state in a finite time. In other words, this condition guarantees that the family of transition probabilities of the solution has a smooth density, which will be used for the proof of the persistence and extinction of the system (2.2).

    Now, we define what we mean by persistence and extinction in our stochastic model.

    Definition 2.3. The process X is strongly stochastically persistent if it possesses a unique invariant probability measure π in RN,+ and

    limtP(t,x,)π()TV=0,xRN,+,

    where TV is the total variation norm and P(t,x,) is the transition probability of X.

    Definition 2.4. If X(0)=xRN,+, we say that the species Xi goes extinct with probability px>0 if

    Px{limtXi(t)=0}=px.

    We say that the species Xi goes extinct if, for all xRN,+,

    Px{limtXi(t)=0}=1.

    In a stochastic setting, to look into the persistence and extinction, we consider the set M as the collection of all ergodic invariant probability measures of the solution X(t) on the boundary RN+. Clearly, M, since δM is the Dirac measure concentrated at 0=(0,,0)TRN+. For ˜MM, we use Conv(˜M) to denote the convex hull of ˜M, which is the collection of probability measures of the form

    π()=μ˜Mpμμ()

    with pμ>0 and μ˜Mpμ=1. For δμM, since the stochastic system (2.2) is nondegenerate by Assumption 3, 1n1<<nkN exist such that the support of the ergodic invariant probability measure μ, denoted by supp(μ), is exactly Rμ+, in which

    Rμ+:={(x1,,xN)RN+:xi=0foriIcμ}

    for Iμ:={n1,,nk} and Icμ:={1,,N}Iμ. Let

    Rμ,+:={(x1,,xN)RN+:xi=0foriIcμandxi>0foriIμ}

    and Rμ+:=Rμ+Rμ,+. Similarly to a deterministic setting, persistence in a stochastic setting means that every ergodic invariant probability measure on the boundary is a repeller. Therefore, ergodic invariant probability measures play a similar role as equilibria in a deterministic setting. To determine an ergodic invariant probability measure of the stochastic system (2.3) to be a repeller, we compute its Lyapunov exponents by looking at the equations for lnXi(t), i=1,,N. Applying Ito's formula for lnXi(t) gives

    lnXi(t)t=lnXi(0)t+1tt0fi(X(s))ds12Nj=1τ2ij+Nj=1τijWj(t)t.

    When the solution X(t) to the system (2.3) is close to the support of an ergodic invariant probability measure μ, supp(μ), for a long time, then the Lyapunov exponent of the solution component Xi with respect to μ (which is also called the invasion rate of species Xi with respect to μ)

    λi(μ):=limtlnXi(t)t=limt1tt0fi(X(s))ds12Nj=1τ2ij,

    can be approximated by the average with respect to μ (using the strong law of large numbers)

    RN+fi(x)μ(dx)12Nj=1τ2ij,

    while the term

    lnXi(0)t+Nj=1τijWj(t)t,

    is negligible. This implies that the Lyapunov exponents λi(μ), i=1,,N, give the long-term growth rate of species Xi if X(t) is close to supp(μ). Therefore, an ergodic invariant probability measure μ is a repeller if maxi=1,,Nλi(μ)>0. Now we impose the following assumption that ensures the strong stochastic persistence.

    Assumption 4. For any μ in Conv(M),

    maxi=1,,Nλi(μ)>0,

    where

    λi(μ)=RN+fi(x)μ(dx)12Nj=1τ2ij,

    Theorem 2.3. Suppose that Assumptions 3 and 4 are true. Then the solution X(t) to the system (2.2) is strongly stochastically persistent, and its transition probabilities converge weakly to its unique invariant probability measure π in RN,+ exponentially fast in the total variation norm.

    Next, we make an assumption that will imply the extinction of the stochastic system.

    Assumption 5. There is a μM such that

    maxiIcμλi(μ)<0, (2.6)

    If Rμ+{0}, suppose further that for any νConv(Mμ), we get

    maxiIμλi(ν)>0, (2.7)

    where Mμ:={νM:supp(ν)Rμ+}.

    Theorem 2.4. Under Assumptions 3 and 5, for any δ>0 sufficiently small and for any initial value xRN,+, we obtain

    limtEx(Ni=1Xi(t))δ=0.

    We have several remarks for Assumption 5. First, if μM and iIμ, then λi(μ)=0. Intuitively, this is because if the solution is inside the support of an ergodic invariant probability measure μ, then it is at an "equilibrium" and it does not tend to grow or decay. If μM and maxiIcμλi(μ)<0, then μ is regarded as an "attractor", which attracts solutions starting nearby. So we require the condition (2.6) to make sure that the solution component Xi(t), iIcμ, will get close to 0 if it starts near Rμ,+. Second, the condition (2.7) is needed to guarantee that μ is really a "sink" in Rμ,+ but not the other invariant probability measures on Rμ+, that is, if X is close to Rμ,+, then it is not pulled away to the boundary Rμ+.

    Finally, let

    M1:={μM:μ satisfies Assumption 5}

    and M2:=MM1. If there is no persistence, we would like to know exactly which species go extinct and which species survive. On the basis of the repulsion of the invariant probability measures in M2, we can derive the "attracting" region of invariant probability measures in M1 for the solution X. We make an additional assumption that ensures all the invariant probability measures outside Conv(M1) are repellers.

    Assumption 6. Suppose that one of the following holds true:

    M2=;

    For any νConv(M2), maxi=1,,Nλi(ν)>0.

    For any initial value X(0)=xRN,+, let U=U(ω) denote the weak-limit set of the family of random occupation measures {˜Πt(),t0} where

    ˜Πt(A)=1tt01{X(s)A}ds,t>0,AB(RN,+),

    which is the proportion of time the solution spends in a set A up to time t.

    Theorem 2.5. Suppose that Assumptions 1, 5, and 6 hold true and M1. Then for any xRN,+, we obtain

    μM1Pμx=1

    in which

    Pμx:=Px{U(ω)={μ}andlimtlnXi(t)t=λi(μ)<0,iIcμ},μM1.

    Let us get started with the linearization of the deterministic system (2.3). We assume that all the equilibria of this system on the boundary RN,+ are unstable. Therefore, by Theorem 2.1, the unique positive equilibrium E=(X1,,XN)T is locally stable. To understand the behavior of the deterministic system near E, we approximate the system (2.3) by the following linear system, as the solution is close to E:

    dXidt=ϕi0+Nj=1θij(XjXj),i=1,,N, (2.8)

    where the values of ϕi0 and the θij are constants to be determined. In fact, we would like to replace each right-hand side of the system (2.3), hi(X), by its linearized form ϕi0+Nj=1θij(XjXj) for i=1,,N when X1X1,,XNXN are close to 0. The constant coefficients ϕi0 and θij are determined so that all the differences

    |hi(X)ϕi0Nj=1θij(XjXj)|,

    for i=1,,N approximate 0 as X is close to E. If we use the norm x2=x21++x2N in RN to identify how close two vectors are, then the linearization of the system (2.3) at the unique positive equilibrium E is equivalent to finding the vector ϕ0=(ϕ10,,ϕN0)T and the matrix Θ=(θij)N×N so that the difference

    h(X)ϕ0Θ(XE)2,

    approximates 0 as XE20. For each i=1,,N, applying the Taylor expansion for the function hi(X) at E gives

    hi(X)=hi(E)+hi(E)X(XE)+12(XE)T2hi(E)X2(XE)+o(XE22),

    in which

    hi(E)X=(hi(E)X1,,hi(E)XN)T

    and

    2hi(E)X2=(2hi(E)XjXk)N×N.

    This implies that as XE20, we get

    |hi(X)hi(E)hi(E)X(XE)|0,

    Notice that hi(E)=0 for each i=1,,N. Therefore, when we choose ϕ0=0 and Θ=h(E)X:=(hi(E)Xj)N×N, then

    hi(X)Nj=1θij(XjXj),

    as X is close to E. Then the linearization of the deterministic system (2.3) at E takes the form

    dXidt=Nj=1hi(E)Xj(XjXj),i=1,,N, (2.9)

    Next, we continue with the linearization of the stochastic system (2.2). We assume that the sufficient conditions (i.e., Assumption 3 and Assumption 4) for the strong persistence of the stochastic system are satisfied. By Theorem 2.3, there is a unique exponentially ergodic invariant probability measure π in RN,+. So, for any xRN,+, all transition probabilities P(t,x,)=Px{X(t)} of the solution X(t) starting at x become the same and equal to π as the time t becomes large enough. We call π the unique positive stationary solution of the stochastic system (2.2). We say that the solution X(t) is close to π if P(t,x,)π()TV is close to 0. Note that Exπ approaches E for any xRN,+ when all entries in the noise intensity matrix Γ approach 0. Since, as time passes, all solutions X(t) will eventually end up in the unique positive stationary distribution no matter where it starts, we will drop the subscript x in the notation of Ex and Px in this subsection.

    The linearization of the stochastic system (2.2) at the stationary solution π is to approximate this system by the following linear SDE system when the solution is close to π:

    dXi=[Φi+Nj=1γij(Xjmj)]dt+XiNj=1τijdWj,i=1,,N, (2.10)

    where (m1,,mN)=Eπ. In fact, our statistical linearization method is the replacement of each drift term hi(X)=Xi[αi+Nj=1βijXj] in (2.2) by its linearized form Φi+Nj=1γij(Xjmj) for i=1,,N, in which the coefficients Φi's and γij's are determined by using the Taylor expansion of each hi(X) at m=(m1,,mN). In other words, we would like to find these coefficients to minimize

    |hi(X)ΦiNj=1γij(Xjmj)|, (2.11)

    as the solution X of the stochastic system (2.2) is close to π. As in the deterministic setting, we obtain

    Φi=hi(m)=mi[αi+Nj=1βijmj]i=1,,N, (2.12)

    and for i,j=1,,N

    γij=hixj(m)={αi+Nk=1βikmkifi=j,βijmiifij, (2.13)

    in which each mi can be computed using the strong law of large numbers

    mi=RN+Xiπ(dX)=limt1tt0Xi(s)ds, (2.14)

    Therefore the system (2.2) can be approximated by the linear SDE system at the positive stationary solution π

    d[˜X1˜XN]=([Φ1ΦN]+[γ11γN1γ1NγNN,][˜X1m1˜XNmN])dt+Ni=1[τ1i00τNi,][˜X1˜XN]dWi. (2.15)

    Finally, we comment on our linearization method. We know that the solution of the nonlinear stochastic system (2.2) is a Markov process X=X(t). In fact, our linearization is to find another Markov process, say ˜X=˜X(t), which represents the solution of the linear SDE system (2.15) such that both X and ˜X have the same mean, and the variance EX˜X22 is to be minimized as the time becomes large enough. In other words, when time goes by and becomes sufficiently large, the long-term behaviors of the two Markov processes X and ˜X are approximately the same. Because of that, a δ>0 exists such that the linear SDE system (2.15) is considered as a "δ-perturbation" of the non-linear SDE system (2.2), and thus strong persistence of the system (2.2) implies that of the system (2.15) (see the definition of "δ-perturbation" and Proposition 3 in [28] for more details).

    On the basis of two excellent works by Arnoldi [19,20], we indirectly establish four resilience measures, which are the instantaneous return rate, average return rate, asymptotic resilience, and convergence rate of our stochastic gLV model, through the second moment of the solutions of the corresponding linearized SDE system. These four resilience quantities, in which the first two are functions of time and the last two are just numbers, measure how strong the recovery dynamics of our original stochastic system is after a pulse perturbation is applied.

    Now we start building up these four stability concepts by letting

    A=[γ11γN1γ1NγNN],Bi=[τ1i00τNi],i=1,,N,Θ=(Φ1,,ΦN)T,m=(m1,,mN)T,andY=˜Xm.

    Then (2.15) can be written in matrix form:

    dY=(AY+Θ)dt+Ni=1(BiY+Bim)dWi. (2.16)

    We assume that the linearized system (2.16) is already at its equilibrium state. A pulse perturbation at time t=0 applied to this linear SDE system is characterized by a vector u=Y(0), which describes the state of the system right after the perturbation. Then the recovery trajectory Y(t) after this pulse perturbation is given by

    Y(t)=Φ(t)u+Φ(t)t0Φ1(s)[ΘNi=1B2im]ds+Ni=1Φ(t)t0Φ1(s)BimdWi(s), (2.17)

    in which Φ(t) is the solution matrix of the homogeneous system

    dY=AYdt+Ni=1BiYdWi. (2.18)

    with the initial condition Φ(0)=I (the identity matrix of size N). In general, we cannot explicitly compute the solution Y(t) in terms of A, B1, ..., BN. Notice that the solution Y(t) is nowhere differentiable. Without the smoothness of the solution, it is difficult to build up stability concepts of a dynamic system that are able to be tractable. To avoid this difficulty, we can smooth out the solutions of the linearized system (2.16) by looking at the dynamics of their second moments. To proceed, let P(t)=E(Y(t)Y(t)T) be the expectation of the matrix-valued process Y(t)Y(t)T, which is given by the unique solution of the matrix ODE (see [23])

    dPdt=AP+PAT+Ni=1BiPBi+ΘMT+MΘT+Ni=1Bi(MmT+mMT+mmT)Bi, (2.19)

    with the initial value P(0)=Y(0)Y(0)T=uuT, and M:=M(t)=EY(t) is the solution to the linear ODE system, with the initial value M(0)=u,

    ˙M=AM+Θ. (2.20)

    We can treat the matrix P(t) as a N2-dimensional vector

    Z(t):=vec(P(t))=(Z1(t),,ZN2(t))T=(E(Y21(t)),E(Y2(t)Y1(t)),,E(YN(t)Y1(t)),E(Y1(t)Y2(t)),E(Y2(t)2),,E(YN(t)Y2(t)),,E(YN(t)2))T.

    Applying the vectorization operation on both sides of the system (2.19) and then using the properties of Kronecker products and matrix vectorization in linear algebra [23], we obtain the equivalent N2-dimensional ODE system

    dZdt=SZ+T, (2.21)

    in which

    S=IA+AI+Ni=1BiBi,

    and

    T=vec(ΘMT+MΘT)+Ni=1(BiBi)vec(MmT+mMT+mmT),

    The matrix S is called the mean-square stability matrix of the linear SDE system (2.16). The stationary solution to (2.16) is globally mean-square asymptotically stable if α(S)=maxi=1,,N2Re(λi)<0, where Re(λi) is the real part of the eigenvalue λi of the matrix S. Now let

    W(t):=Ni=1EY2i(t)=E(Ni=1Y2i(t))=E(YT(t)Y(t)),

    then W(t) measures the average squared distance of the solution X(t) of the system (2.2) to the expectation of its equilibrium state π. We define four resilience quantities based on W(t) as follows:

    Rinst=ddtlnW(t)1/2 is the instantaneous return rate at time t>0,

    Ravet=lnW(t)1/2lnW(0)1/2t is the average return rate over the time interval [0,t],

    Rc=12α(S) is the rate of convergence of the ODE system (2.21), and

    R=limtRavet is the asymptotic resilience of the ODE system (2.21).

    In these definitions, Rinst and Ravet are deterministic functions of time t, which capture the short-term behavior of the recovery trajectory Y(t), while Rc and R are only numbers, which represent the long-term dynamics of the solution's recovery. These two functions and two numbers give us more tractable computations that measure the recovery dynamics of the stochastic system (2.16) since we are able to explicitly compute the quantity lnW(t)1/2 and its derivative in terms of the matrices S and T. In fact, two quantities Rinst and Ravet can be computed on the basis of the derivative of lnW(t)

    ddtlnW=1WdWdt=1W{E(YT(AT+A)Y)+ΘTM+MTΘ+Ni=1[E(YTB2iY)+mTB2iM+MTB2im+mTB2im]}=1W{[vec(AT+A)+Ni=1vec(B2i)]TZ+ΘTM+MTΘ+Ni=1(mTB2iM+MTB2im+mTB2im)},

    where W, M, and Z can be computed from two ODE systems (2.20) and (2.21). Lastly, when all noise intensities are equal to 0 in the system (2.2) (that is, Bi=0 for all i=1,,N), all our definitions of the four resilience measures above correspond to those defined in Arnoldi [20] in which the rate of convergence and the asymptotic resilience coincide.

    To demonstrate the method we proposed in previous section, let us look at the stochastic gLV model of two species

    dX1=X1(α1β11X1+β12X2)dt+X1(τ11dW1+τ12dW2),dX2=X2(α2+β21X1β22X2)dt+X2(τ21dW1+τ22dW2), (3.1)

    where β11>0 and β22>0. There are three cases:

    ● If β12<0 and β21<0, then the system (3.1) is a competitive model.

    ● If β12>0 and β21>0, then the system (3.1) is a mutualistic model.

    ● If (β12<0,β21>0) or (β12>0,β21<0), then the system (3.1) is a predator–prey model (mixed competition and mutualism).

    Next, we analyze the dynamics of the system (3.1) on the boundary of R2+, which is R2+:=R1+R2+{(0,0)}, where R1+:={(x1,0):x1>0}andR2+:={(0,x2):x2>0}.

    Case 1. If X1(0)=0 and X2(0)=0, then, by (3.1), we get, for a.s.

    X1(t)=X1(0)exp{t0[α1β11X1(s)+β12X2(s)τ2112τ2122]dt+τ11W1(t)+τ12W2(t)},X2(t)=X2(0)exp{t0[α2+β21X1(s)β22X2(s)τ2212τ2222]dt+τ21W1(t)+τ22W2(t)}.

    This implies that X1(t)=X2(t)=0 for all t>0 a.s. So, the Dirac delta measure at (0,0), μ0, is an ergodic invariant probability measure for the system (3.1) on the boundary R2+.

    Case 2. If X1(0)=0 and X2(0)>0, then, as in Case 1, X1(t)0 a.s. and X2(t)>0 for all t>0 a.s. Plugging in 0 into X1 in the second equation of (3.1) gives

    dX2=X2(α2β22X2)dt+X2(τ21dW1+τ22dW2). (3.2)

    For a fixed α2>0, consider

    s(X2)=X2α2exp{yα2(2α22β22u)u(τ221+τ222)u2du}dy=C2X2α2y2α2τ221+τ222exp{2β22τ221+τ222y}dy.

    Since the integrand in the last integral can be written as

    y2α2τ221+τ222[1+2β22τ221+τ222y+12!4β222(τ221+τ222)2y2+],

    a kZ+ exists such that 2α2τ221+τ222+k>1, s()=. If 2α2τ221+τ222+1>0, which is equivalent to α2τ2212τ2222<0, then s(0+)>. This means that limtX2(t)=0 a.s. and so the equation (3.2) does not have any invariant probability measure on R+. If 2α2τ221+τ222+1<0, which is equivalent to α2τ2212τ2222>0, then s(0+)=. Therefore, X2(t) oscillates between 0 and . Thus the equation (3.2) has a unique invariant probability measure π2 on R+, which is the Gamma distribution

    π2Gamma(2α2τ221+τ2221,τ221+τ2222β22).

    So μ2:=δ0×π2 is an ergodic invariant probability measure for the system (3.1) on the boundary R2+, in which δ0 is the Dirac Delta measure with the mass at 0.

    Case 3. If X1(0)>0 and X2(0)=0 then X1(t)>0 for all t>0 a.s. and X2(t)0 a.s. But then the first equation of (3.1) becomes

    dX1=X1(α1β11X1)dt+X1(τ11dW1+τ12dW2). (3.3)

    By the same argument as above, the equation (3.3) has a unique invariant probability measure π1 on R+, which is the Gamma distribution

    π1Gamma(2α1τ211+τ2121,τ221+τ2222β11),

    provided that α1τ2112τ2122>0. Otherwise, (3.3) does not have any invariant probability measure on R+. Therefore, μ1=π1×δ0 is an ergodic invariant probability measure for the system (3.1) on the boundary R1+.

    Now we compute the Lyapunov exponents of μ0=δ0×δ0, μ1, and μ2. By Ito's formula, the system (3.1) is equivalent to

    lnX1(t)t=lnX1(0)t+1tt0[α1β11X1(s)+β12X2(s)τ2112τ2122]ds+τ11W1(t)t+τ12W2(t)t,lnX2(t)t=lnX2(0)t+1tt0[α2+β21X1(s)β22X2(s)τ2212τ2222]ds+τ21W1(t)t+τ22W2(t)t.

    By the strong law of large numbers, when the solution X(t)=(X1(t),X2(t))T of (3.1) is close to supp(δ)={(0,0)} for a long time (meaning that the solution X(t) is within some small neighborhood of (0,0) for a long time), the Lyapunov exponent for the first component (the invasion rate of Species 1) with respect to μ0 is

    λ1(μ0)=limtlnX1(t)t=limt1tt0[α1β11X1(s)+β12X2(s)τ2112τ2122]ds=R2+[α1β11X1+β12X2τ2112τ2122]μ0(dX)=α1τ2112τ2122.

    Similarly,

    λ2(μ0)=R2+[α2+β21X1+β22X2τ2212τ2222]μ0(dX)=α2τ2212τ2222.

    By the similar computations, the Lyapunov exponents of μ1 and μ2 can be computed as

    λ1(μ1)=0,λ2(μ1)=α2τ2212τ2222+β21β11α1β212β11(τ211+τ212),λ1(μ2)=α1τ2112τ2122+β12β22α2β122β22(τ221+τ222),λ2(μ2)=0.

    On the basis of the Lyapunov exponents of the set of all ergodic invariant probability measures of the system (3.1) on the boundary R2+, denoted M={μ0,μ1,μ2}, we can give the complete classification of the dynamics of the system (3.1). First of all, we assume that the matrix

    Σ=[σ11σ12σ21σ22],

    where σij=τi1τj1+τi2τj2 (i,j=1,2), is positive definite. This is equivalent to requiring that

    τ211+τ212>0,τ221+τ222>0,andτ11τ22τ12τ210.

    Next, we claim that the family of transition probability functions of the solution to the system (3.1) is tight when the system (3.1) is either competitive or predator–prey. Furthermore, when the system (3.1) is mutualistic, the family of transition probability functions of its solution might not be tight. In this case, we show that the solution blows up in finite time almost surely or that there is no invariant measure on R2,+. Indeed, we first assume β12<0 and β21<0. In order to prove the tightness of the solution of (3.1), it suffices to show that there is a δ>0 such that

    X1f1(X)+X2f2(X)1+X1+X22i,j=1σijXiXj2(1+X1+X2)2+δ(3+|f1(X)|+|f2(X)|)<0 (3.4)

    for any X=|X1|+|X2| that is sufficiently large, where f1(X)=X1(α1β11X1+β12X2) and f2(X)=X2(α2+β21X1β22X2). Since β12 and β21 are both negative, as X, we get

    X1f1(X)+X2f2(X)(1+X1+X2)2=α1X1+α2X2(β11X21+β22X22)+(β12+β21)X1X2(1+X1+X2)2

    which approaches . So there is a ˜β>0 such that for any X that is sufficiently large

    X1f1(X)+X2f2(X)1+X1+X2<˜β(1+X1+X2).

    On the other hand, using the Cauchy–Schwarz's inequality, a ˜σ>0 exists such that

    2i,j=1σijXiXj2(1+X1+X2)2˜σX21+X22(1+X1+X2)2˜σ

    when X is sufficiently large. Then we can choose a δ>0 that is small enough so that we can obtain the inequality (3.4) as X is large enough. Second, we suppose either (β12<0,β21>0) or (β12>0,β21<0). Then c1>0 and c2>0 exist such that c1β12+c2β21=0. This implies that

    limXc1X1f1(X)+c2X2f2(X)(1+c1X1+c2X2)2=.

    By the same reasoning as in the competitive case, we can easily find a δ>0 that is sufficiently small such that, for any X large enough, we get the inequality (3.4). Lastly, for the case β12>0 and β21>0, we show that the solution of (3.1) blows up in finite time almost surely under the assumptions that

    α1τ2112τ2122>0,α2τ2212τ2222>0,andβ11β22β12β210.

    By way of contradiction, assume that X(t) does not blow up in finite time and has an invariant probability measure on R2,+. As a result, X(t) is a recurrent process. But then, by Ito's formula,

    β22lnX1(t)+β12lnX2(t)t=β22lnX1(0)+β12lnX2(0)t+β22(α1τ2112τ2122)+β12(α2τ2212τ2222)+(β12β21β11β22)1tt0X1(s)ds+(τ11β22+τ21β12)W1(t)t+(τ12β22+τ22β12)W2(t)t.

    It follows that, by the above assumptions,

    lim suptβ22lnX1(t)+β12lnX2(t)t>0a.s.

    which is a contradiction. Therefore, the system (3.1), which is competitive or predator–prey, satisfies Assumption 3 in Subsection 2.2.

    Now we focus on the stochastic model (3.1) of the competitive type and predator–prey type. On the basis of Theorems 2.3, 2.4, and 2.5 in Subsection 2.2, we are able to describe the complete dynamic picture of the system (3.1) as follows.

    C1. Suppose λ1(μ0)<0 and λ2(μ0)<0. This means that there are no other ergodic invariant probability measures in R2+ in addition to μ0. Then M={μ0}, which implies that Assumptions 3 and 5 hold for the system (3.1). By Theorem 2.4, the solution X(t)=(X1(t),X2(t))T converges a.s. to (0,0)T for any initial condition x=(x1,x2)R2,+.

    C2. Suppose λ1(μ0)>0 and λ2(μ0)<0. Then M={μ0,μ1}. There are two cases.

    (ⅰ) If λ2(μ1)>0 then Assumption 4 is fulfilled and hence transition probability function of the solution to the system (3.1) converges to an invariant probability measure ˉμ1 in R2,+ in total variation norm.

    (ⅱ) If λ2(μ1)<0 then M1={μ1} and so M2=MM1={μ0}. This means that Assumptions 3 and 6 are satisfied. By Theorem 2.5, X2(t) converges to 0 with the exponential rate λ2(μ1) and the family of random occupation measures of the solution converges to μ1.

    C3. Suppose λ1(μ0)<0 and λ2(μ0)>0. Then M={μ0,μ2}. There are 2 cases:

    (ⅰ) If λ1(μ2)>0, then Assumption 4 is fulfilled, and hence the transition probability function of the solution to the system (3.1) converges to an invariant probability measure ˉμ2 in R2,+ in the total variation norm.

    (ⅱ) If λ1(μ2)<0, then M1={μ2} and so M2=MM1={μ0}. This means that Assumptions 3 and 6 are satisfied. By Theorem 2.5, X1(t) converges to 0 with the exponential rate λ1(μ2), and the family of random occupation measures of the solution converges to μ2.

    C4. Suppose λ1(μ0)>0 and λ2(μ0)>0. Then M={μ0,μ1,μ2}. There are four cases.

    (ⅰ) If λ1(μ2)>0 and λ2(μ1)>0, then any invariant probability measure on R2+ has the form μ=p0μ0+p1μ1+p2μ2, where p0, p1, and p2 are positive numbers such that p0+p1+p2=1. It is straightforward that max{λ1(μ),λ2(μ)}>0. Then Assumptions 3 and 4 are satisfied. By Theorem 2.3, there is a unique invariant probability measure μ on R2,+, and the transition probability function P(t,x,), xR2,+, converges to μ in the total variation norm.

    (ⅱ) If λ1(μ2)<0 and λ2(μ1)<0, then M1={μ1,μ2} and so M2=MM1={μ0}. Hence, Assumptions 3 and 6 hold true. By Theorem 2.5, px1>0, px2>0, and px1+px2=1, where

    px1=Px{U(ω)={μ1}andlimtlnX2(t)t=λ2(μ1)<0},px2=Px{U(ω)={μ2}andlimtlnX1(t)t=λ1(μ2)<0}.

    (ⅲ) If λ1(μ2)>0 and λ2(μ1)<0, then the conclusion is the same as in C2(ⅱ).

    (ⅳ) If λ1(μ2)<0 and λ2(μ1)>0, then the conclusion is the same as in C3(ⅱ).

    Finally, to study the resilience of the two-dimensional stochastic gLV model (3.1), we assume that the system (3.1) is competitive or predator–prey type with the matrix Σ being positive-definite and satisfies the inequality (3.4) for some δ>0. Furthermore, we assume that the system (3.1) satisfies the assumptions as in C2(ⅰ) or C3(ⅰ) or C4(ⅰ).

    Then the stochastic gLV model of two species is stochastically strongly persistent, that is, it has a unique positive stationary distribution π on R2,+ such that the transition probability function of its solution converges to π in the total variation norm. The linearized system of (3.1) at π is given by

    d˜X1=[Φ1+γ11(˜X1m1)+γ12(˜X2m2)]dt+˜X1(τ11dW1+τ12dW2),d˜X2=[Φ2+γ21(˜X1m1)+γ22(˜X2m2)]dt+˜X2(τ21dW1+τ22dW2), (3.5)

    where the values of θi and the γij are determined by

    Φ1=α1m1β11m21+β12m1m2,Φ2=α2m2+β21m2m1β22m22,γ11=α1+2β11m1+β12m2,γ12=β12m1,γ21=β21m2,γ22=α2+β21m1+2β22m2,

    with

    mi=R2,+Xiπ(dX1dX2)=limt1tt0Xi(s)ds(i=1,2).

    Let A=[γ11γ12γ21γ22], Θ=(Φ1,Φ2)T, B1=[τ1100τ21], B2=[τ1200τ22], σij=τi1τj1+τi2τj2 (i,j=1,2), and Yi=˜Ximi (i=1,2). Then the system (3.5) becomes

    dY=(AY+Θ)dt+(B1Y+B1m)dW1+(B2Y+B2m)dW2 (3.6)

    where Y=(Y1,Y2)T and m=(m1,m2)T. Next, by computation, we have

    I2A=[γ11γ1200γ21γ220000γ11γ1200γ21γ22],AI2=[γ110γ1200γ110γ12γ210γ2200γ210γ22],B1B1=[τ2110000τ11τ210000τ21τ110000τ221],B2B2=[τ2120000τ12τ220000τ22τ120000τ222].

    Thus S=I2A+AI2+B1B1+B2B2, which is equal to

    S=[2γ11+σ11γ12γ120γ21γ11+γ22+σ120γ12γ210γ11+γ22+σ21γ120γ21γ212γ22+σ22],

    and T=vec(ΘMT+MΘT)+[B1B1+B2B2]vec(MmT+mMT+mmT), which is equal to

    T=[2m1θ1+σ11(2M1m1+m21)m1θ2+m2θ1+σ12(M2m1+m2M1+m2m1)m1θ2+m2θ1+σ21(M1m2+m1M2+m1m2)2m2θ2+σ22(2M2m2+m22)],

    in which M=(M1,M2)T=(EY1,EY2)T is the solution to the ODE system

    ˙M=AM+Θ, (3.7)

    with the initial value M(0)=u. If we let Z=(EY21,EY2Y1,EY1Y2,EY22)T, then we can smooth out the solutions of (3.6) by looking at the solutions of the following ODE system

    dZdt=SZ+T, (3.8)

    By using two ODE systems (3.7) and (3.8), we will develop graphical representations of the four resilience measures, which are the instantaneous return rate, the average return rate, the asymptotic resilience, and the convergence rate, for the stochastic system (3.1) of the competitive type over the relevant parameter regimes in the next subsection.

    As a simple illustration of how environmental noise and microbial interactions combine to drive the rate of return of a microbial community to the equilibrium state after a pulse perturbation, we look at a SgLV model of two competitive species (3.1). Here, we assume that two species are in direct competition with each other, so that their interaction strengths β12 and β21 are negative.

    In each panel of Figure 3, two trajectories of the system (3.1) are produced, in which the trajectories before time zero represent the equilibrium states of the two species and the trajectories after time zero are the recovery dynamics of the two species following a pulse perturbation at time zero. The difference between Figure 3A and Figure 3C is due to the difference in the interaction strength β12, which measures the strength of interspecific competition of Species 2 on Species 1. In Figure 3A, β12 is twice as large as in Figure 3C, while the noise intensities are the same in both plots. In contrast, the difference between Figure 3A and Figure 3B is caused by the difference in the noise intensities of two white noises: In Figure 3B, the noise intensities of dW1dt and dW2dt are five times those in Figure 3A, while the interaction strengths between species are the same, leading to greater fluctuations in the species' abundances. Figure 3D differs from Figure 3A in that the noise intensities increase by a factor of 5 and the interaction strength β12 decreases by a factor of 2.

    Figure 3.  Dynamics of the stochastic system given by (3.1) with four parameter sets. Panel A corresponds to the parameter Set 1 (low noise, strong interaction): (α1,α2)=(6,4), (β11,β12,β21,β22)=(0.05,0.03,0.01,0.027), and (τ11,τ12,τ21,τ22)=(0.01,0.015,0.01,0.02). Panel B corresponds to the parameter Set 2 (high noise, strong interaction): (α1,α2)=(6,4), (β11,β12,β21,β22)=(0.05,0.03,0.01,0.027), and (τ11,τ12,τ21,τ22)=(0.05,0.075,0.05,0.1). Panel C corresponds to the parameter Set 3 (low noise, weak interaction): (α1,α2)=(6,4), (\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}) = (0.05, -0.015, -0.01, 0.027) , and (\tau_{11}, \tau_{12}, \tau_{21}, \tau_{22}) = (0.01, 0.015, 0.01, 0.02) . Panel D corresponds to the parameter Set 4 (high noise, weak interaction): (\alpha_1, \alpha_2) = (6, 4) , (\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}) = (0.05, -0.03, -0.01, 0.027) , and (\tau_{11}, \tau_{12}, \tau_{21}, \tau_{22}) = (0.05, 0.075, 0.05, 0.1) .

    Resilience in a pulse-perturbed stochastic system can be illustrated by this example. When comparing Figure 3A and Figure 3C, we observe that decreasing the interaction strength \beta_{12} reduces the return time of the system to the equilibrium state after a pulse perturbation. That is, lower competition between two species intensifies the resilience of the system under the same environmental regime. When comparing Figure 3A and Figure 3B, increasing the noise intensities decreases the time required for recovery trajectories to return to an equilibrium state. In this example, it appears that increasing environmental noise actually increases stability, which contrasts with the intuition that increasing noise destabilizes the stability of the system [21,22,23]. Figure 3D shows the combined effect of environmental noise and interactions between species on the resilience of a stochastic system. In the next subsection, we will use our theoretical framework to test our observations in Figure 3.

    To demonstrate the usefulness of our theoretical framework developed above, we apply the procedure to calculate the resilience of a SgLV (described in Figure 2) to the simulated data in Figure 3. If we observe Figure 3 in Section 2, the recovery trajectories of two species return to an equilibrium state at time 4.2 after a pulse perturbation at time zero in Panel A. Reducing the interaction strength \beta_{12} by 2-fold relative to Panel A, Panel C shows that the return time to the equilibrium state is reduced to approximately 2.4 after the same pulse perturbation as in Panel A at time zero. This means that the microbial community in Panel C is more resilient than that in Panel A. Increasing the noise intensities five times compared with Panel A, the return time to the equilibrium state in Panel B decreases to some time around 3.6 with the same perturbation as in Panel A. In other words, the microbial community in Panel C is more resilient than that in Panel A. This shows that environmental noise influences the resilience of the microbial community. Panel D shows the fastest return time, which is approximately 1.8 , to the equilibrium state among communities when we decrease the interaction strength \beta_{12} two times and increase the noise intensities five times. Below, we calculate four resilience measures for each panel of Figure 3 to confirm our observation that the resilience of the microbial community is driven by environmental noise and to disentangle the sources of the effects of environmental fluctuations on the resilience of a microbiome system. To calculate the resilience measures, the first step is to linearize the stochastic system (3.1) at its stationary distribution \pi^* , and then we can obtain the linearized stochastic system (3.5), where all coefficients \Phi_i , \gamma_{ij} , and m_i ( i, j = 1, 2 ) are computed and given in Table 1. Note that (m_1, m_2)^T is the expectation of the stationary distribution \pi^* . Figures 4A, 4B, 5A, and 5B show that the stochastic dynamics of both the original stochastic system (3.1) and the linearized stochastic system (3.5) converge to the equilibrium state characterized by the stationary distribution \pi^* following the same pulse perturbation at time 0 in four scenarios corresponding to the four parameter sets in Table 1. In Figure 4A, we see that the trajectories of the linearized system approach and merge into those of the original system between time 4 and 4.5 , while the time in Figure 4B is between 3 and 3.5 . In Figure 5A, the merging time of the trajectories of both systems is between 2.5 and 3. The fastest merging time between 1.5 and 2 is shown in Figure 5B. Thus, the patterns of the return time to the equilibrium state in both the original system and the linearized system when applied to the same pulse perturbation are the same, even though the dynamics of the trajectories of both systems are quite different before the merging time. Therefore, we are able to utilize a much simpler linearized system (3.5) to investigate the resilience of a microbial community whose temporal dynamics are captured by the original system (3.1).

    Table 1.  Coefficients of the linearized stochastic system (3.5) are computed in four scenarios that correspond to the four parameter sets in Figure 3.
    Parameter set 1 Parameter set 2 Parameter set 3 Parameter set 4
    m_1 38.7470 38.8413 84.4804 84.4508
    m_2 135.3518 134.9853 118.2069 117.8869
    \Phi_{1} 0.0813 0.3254 0.2431 0.7731
    \Phi_{2} 0.4263 1.6171 0.3562 1.3958
    \gamma_{11} -1.9353 -1.9337 -4.2211 -4.2134
    \gamma_{12} -1.1624 -1.1652 -1.2672 -1.2668
    \gamma_{21} -1.3535 -1.3499 -1.1821 -1.1789
    \gamma_{22} -3.6062 -3.5876 -3.1492 -3.1318

     | Show Table
    DownLoad: CSV
    Figure 4.  Recovery dynamics (Panels A and B) of both the linearized system (3.1) and the original system (3.5) following the same pulse perturbation (10, 15) , and the dynamics of four resilience measures (average return rate, rate of convergence (Panels C and D) and instantaneous return rate, asymptotic resilience (Panels E and F)) in two scenarios of parameter sets 1 and 2.
    Figure 5.  Recovery dynamics (Panels A and B) of both the linearized system (3.1) and the original system (3.5) following the same pulse perturbation (10, 15) , and the dynamics of four resilience measures (average return rate, rate of convergence (Panels C and D) and instantaneous return rate, asymptotic resilience (Panels E and F)) in two scenarios of parameter sets 3 and 4.

    Next, we transform the linearized system (3.5) into the two linear ODE systems (3.7) and (3.8) by, respectively, finding the first and second moment of solutions and then using vectorization in linear algebra. Then four resilience measures \mathcal{R}_t^{\text{ins}} , \mathcal{R}_t^{\text{ave}} , \mathcal{R}_c , and \mathcal{R}_{\infty} are calculated on the basis of W(t) , the mean square difference between the solution of the linearized system (3.5), the expectation of the stationary distribution \pi^* , and the mean square stability matrix S . Note that the mean square stability matrix S and W(t) can be calculated from the linearized coefficients shown in Table 1. The last four panels of Figure 4 and Figure 5 show the dynamics of four resilience measures over time in the four scenarios above.

    Two resilience measures {\mathcal{R}}_t^{\text{ins}} and {\mathcal{R}}_t^{\text{ave}} , which are deterministic functions of time, can help us track the short-term behavior of \ln W(t) . When {\mathcal{R}}_t^{\text{ins}} > 0 , \frac{d}{dt}\ln W(t)^{1/2} < 0 , which implies that \ln W(t) is decreasing to its equilibrium point at time t . In other words, when {\mathcal{R}}_t^{\text{ins}} > 0 , the solution \widetilde X(t) of the linearized system (3.5) moves closer to the equilibrium state \pi^* at time t . By the same argument, when {\mathcal{R}}_t^{\text{ins}} < 0 , \ln W(t) is increasing, and hence the solution \widetilde X(t) moves away from the equilibrium state \pi^* at time t . In terms of {\mathcal{R}}_t^{\text{ave}} , if {\mathcal{R}}_t^{\text{ave}} > 0 , then \ln W(t) < \ln W(0) , which means that the solution \widetilde X(t) comes closer to the equilibrium state at time t , since the end of the perturbation at time 0. If {\mathcal{R}}_t^{\text{ave}} < 0 , then the solution \widetilde X(t) moves far away from the equilibrium state since the end of the perturbation. The deterministic function {\mathcal{R}}_t^{\text{ins}} measures the rate of change in \ln W(t) . So we can use it to track when \ln W(t) comes back to its equilibrium point after the perturbation is applied. It is straightforward to see that the time when \ln W(t) reaches its equilibrium point (which is a constant) is the time when {\mathcal{R}}_t^{\text{ins}} crosses 0. Therefore, we can estimate the return time of the solution \widetilde X(t) to the equilibrium state \pi^* when the first time {\mathcal{R}}_t^{\text{ins}} crosses 0. In the last four panels of Figure 4, the return time to the equilibrium state decreases from 4.45 to 3.08 when the noise intensities are increased by five times. The same decreasing pattern in the return time occurs in the last four panels of Figure 5, which is that the return times go down from 2.75 to 1.9 . This pattern can be explained by the physical meaning of W(t) . By our definition in Section 2.4, W(t) is the sum of all the variances of the solution components of the linearized system (3.5), which measures how far the solution of the linearized system (3.5) is from the equilibrium state \pi^* . Then high noise makes the equilibrium state \pi^* have higher variance so that there is less recovery required following perturbation to return to \pi^* . So, the return to the equilibrium state \pi^* is actually faster than in the system with high noise than with low noise.

    Two resilience measures {\mathcal{R}}_c and {\mathcal{R}}_{\infty} , which are constants, can be used to predict the resilience of the stochastic system (3.1) by looking at the gap {\mathcal{R}}_c - {\mathcal{R}}_{\infty} and how fast the average return rate {\mathcal{R}}_t^{\text{ave}} enters this gap. In Table 2, when we increase the noise five times, the gap {\mathcal{R}}_c-{\mathcal{R}}_{\infty} decreases from 1.2588 (parameter Set 1: low noise, strong interaction) to 1.2508 (parameter Set 2: high noise, strong interaction). From Panels C and D in Figure 4, we also observe that the time when the average return rate {\mathcal{R}}_t^{\text{ave}} moves into the gap between {\mathcal{R}}_c and {\mathcal{R}}_{\infty} is shorter when the noise increases. This pattern is the same as we move from parameter Set 3 (low noise, weak interaction) to parameter Set 4 (high noise, weak interaction). The pattern of a decreased gap implies shorter for moving into the time gap, caused by increasing noise, which can be elucidated by looking at the mathematical meanings of the two resilience measures {\mathcal{R}}_c and {\mathcal{R}}_{\infty} . First, the quantity {\mathcal{R}}_c measures how fast the covariant matrix P(t) , which is the solution of the system (2.19), converges to its equilibrium state. Basically, {\mathcal{R}}_c equals the negative half of the real part of the largest eigenvalue of the mean square stability matrix S , which means that {\mathcal{R}}_c tells us about the smallest convergence rate among the convergence rates of the elements of the covariant matrix P(t) . While {\mathcal{R}}_{\infty} can be considered as an equilibrium state of the average return rate {\mathcal{R}}_t^{\text{ave}} , and so {\mathcal{R}}_{\infty} approximately equals the smallest convergence rate among the convergence rates of the diagonal elements of the covariance matrix P(t) . Therefore, the gap {\mathcal{R}}_c-{\mathcal{R}}_{\infty} is always positive when noise appears in the original system (3.1). Without noise, this gap will be zero, and hence these two resilience measures coincide. Our simulations (Figures 4 and 5) and calculations (Table 2) demonstrate that increasing noise will decrease the gap {\mathcal{R}}_c-{\mathcal{R}}_{\infty} , which, in turn, decreases the time when {\mathcal{R}}_t^{\text{ave}} enters the gap. This implies that increasing the level of noise increases the resilience of the system.

    Table 2.  The difference {\mathcal{R}}_c-{\mathcal{R}}_{\infty} computed from the linearized coefficients in Table 1 in four scenarios corresponding to the four parameter sets in Figure 3.
    Low noise, strong interaction High noise, strong interaction
    {\mathcal{R}}_c^1=1.2635 {\mathcal{R}}_c^2=1.2539
    {\mathcal{R}}_{\infty}^1=0.0047 {\mathcal{R}}_{\infty}^2=0.0031
    {\mathcal{R}}_c^1 - {\mathcal{R}}_{\infty}^1=1.2588 {\mathcal{R}}_c^2 - {\mathcal{R}}_{\infty}^2=1.2508
    Low noise, weak interaction High noise, weak interaction
    {\mathcal{R}}_c^3=2.3488 {\mathcal{R}}_c^4=2.3307
    {\mathcal{R}}_{\infty}^3=0.0048 {\mathcal{R}}_{\infty}^4=0.0032
    {\mathcal{R}}_c^3 - {\mathcal{R}}_{\infty}^3=2.3440 {\mathcal{R}}_c^4 - {\mathcal{R}}_{\infty}^4=2.3275

     | Show Table
    DownLoad: CSV

    The concept of resilience has been used for a long time in community ecology, although it is much less developed for microbial systems, even though resilience is at the center of many pressing questions in microbiome research. Microbiome communities differ from macroecological communities in at least two important ways that impact the modeling of ecological stability. First, while stability metrics in macroecological systems have been developed for first-order multivariate autoregressive models [21,22], the nonlinear generalized Lotka–Volterra model is more commonly used in microbiome studies. This kind of model has been applied to study time-series inference and stability in intestinal microbiota [7,38], to model primary succession of murine gut communities [35], and to capture dynamics of complex microbial metapopulations in aquatic environments [36,37]. Second, and perhaps more importantly, microbial time-series data are incredibly noisy. Some of this noise is measurement error due to the sampling and sequencing processes involved in data collection. These challenges have been documented in various microbial systems, from cheese communities [40] to human microbiome studies with dense temporal sampling [41,42]. Furthermore, noise arises in experimental studies of gut microbiota reconstitution and synthetic community design [43,44], as well as in investigations of microbiome dynamics at different timescales [45,46]. But there is also intrinsic noise in the system, with sources including fluctuations in temperature, different local concentrations of species, or slight variations among individuals.

    Currently, existing definitions of resilience are based on the theory of ordinary differential equations or difference equations, so all existing stability metrics to measure resilience are not functions of noise. In this paper, we have built a general framework to allow stochasticity to be deeply involved in the dynamics of microbial communities. Specifically, we have proposed an SgLV model to describe the temporal dynamics of a microbial community with white noise added to each interaction. We then developed a general theoretical framework for how to calculate four resilience measures based on the SgLV model. Each measure captures a different facet of stability and, together, they give a picture of resilience.

    All four resilience measures are inherently functions of noise, since they are defined in terms of the ODE system established by the variance matrix of the linearized SDE system. Therefore, all four measures can be manipulated by the magnitude of the noise. One surprising result to emerge from our framework is that the return to the original stationary state is actually faster in the system with high noise than with low noise. This is primarily driven by the fact that the stationary distribution has higher variance in the system with high noise, so that there is less recovery required following a perturbation to return to the stationary distribution.

    The next step of this work is the possibility of applying this general framework to time-series data from real-world microbiomes. This application involves a fundamental question regarding the identifiability and estimability of the parameters within our proposed stochastic generalized Lotka–Volterra (SgLV) model. Parameter identifiability remains a significant challenge, particularly for complex hierarchical models involving microbiome dynamics. While traditional approaches to identifiability, such as those described in Browning 2020 [47] and in Remien 2021 [48], focus primarily on deterministic systems, fewer methodologies explicitly address stochastic models. To address this critical gap, methodologies such as data cloning, as described by Lele et al., 2010 [49] may be particularly valuable. Although originally designed for generalized linear mixed models, data cloning could be adapted to our SgLV framework to provide a practical graphical test for assessing parameter estimability. This approach involves generating multiple replicated datasets (clones) from microbiome time-series data and using Markov Chain Monte Carlo (MCMC) methods to sample from the resulting posterior distributions. By assessing whether the parameters have degenerate posterior distributions as the number of clones increases, we can determine whether each parameter—or specific functions of parameters, such as the resilience metrics proposed in our study—is estimable. Such a rigorous assessment of estimability would significantly enhance the reliability of the scientific conclusions drawn from applying our framework to real-world microbiome data.

    An important direction of future research is the application of our proposed stochastic generalized Lotka–Volterra (SgLV) framework and resilience measures to practical scenarios in microbiome studies. For example, predicting the recovery dynamics of microbial communities after perturbations such as single-dose antibiotic treatments represents a significant area of potential application. Recent studies, including those by Hayashi et al., 2024 [24] and Ponciano et al., 2024 [25], have demonstrated that microbial community compositions can shift between alternative stable states under environmental perturbations, influenced by both deterministic selection and stochastic drift. Using our developed resilience metrics, researchers could quantify the microbiome's rate and likelihood of returning to a "healthy" or pre-perturbation state, providing a valuable predictive tool for microbiome resilience. This approach could significantly aid experimental design, inform intervention strategies, and guide the interpretation of the microbiome's recovery patterns following ecological disturbances.

    Nevertheless, applying our SgLV framework and resilience metrics to real-world microbiome data requires careful consideration of several microbial-data-specific challenges. Microbiome datasets frequently suffer from the undersampling of rare taxa, which can lead to inaccurate abundance estimates and consequently affect the reliability of the calculated resilience measures. Additionally, relative abundance data, as commonly obtained through sequencing technologies, complicate the accurate estimation of interaction parameters because these data lack information on the absolute abundances. As emphasized by Ponciano et al., 2024 [25], addressing these challenges might involve combining sequencing-based relative abundances with absolute abundance quantification methods, such as quantitative polymerase chain reaction (PCR), to improve the estimation accuracy of interaction coefficients. Additionally, sparse temporal sampling can obscure rapid microbial dynamics, potentially reducing the reliability of resilience predictions. Future studies might employ robust parameter estimation methods such as data cloning (Lele et al., 2010 [49]) to assess parameter estimability rigorously and thus enhance confidence in resilience measures calculated from real-world microbiome time-series data.

    Recently, Flamant et al., 2020 [50] introduced an innovative approach using neural network solution bundles to approximate the solutions of ordinary differential equations, which could be extended effectively to stochastic generalized Lotka–Volterra (SgLV) models for microbiome studies. In this approach, neural networks are trained to approximate solution bundles of differential equations over a broad parameter space, enabling efficient exploration of model dynamics and rapid fitting to empirical data. Specifically, once trained, these neural network solution bundles could quickly approximate the trajectories predicted by the SgLV model, significantly reducing the computational overhead compared with traditional numerical solvers. When applied to real-world microbiome time-series data, this methodology could effectively address common data challenges. For instance, by efficiently handling sparse temporal sampling, neural network approximations could infer microbiome dynamics even when the data resolution is limited. Furthermore, the flexibility of neural networks might better accommodate biases arising from relative abundance data, potentially integrating with methods used to infer absolute abundances. Lastly, the approach could also help mitigate the challenges arising from undersampling rare taxa by efficiently exploring parameter spaces to identify robust solutions despite incomplete information on abundance. Thus, adopting neural network solution bundles presents a promising strategy for reliably applying the SgLV framework to complex microbiome datasets.

    All authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.

    Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number P20GM104420.

    All authors declare that there are no conflicts of interest regarding the publication of this manuscript.



    [1] R. L. Siegel, K. D. Miller, A. Jemal, Cancer statistics, Ca-Cancer J. Clin., 69 (2019), 7–34. https://doi.org/10.3322/caac.21551 doi: 10.3322/caac.21551
    [2] C. DeSantis, J. Ma, L. Bryan, A. Jemal, Breast cancer statistics, Ca-Cancer J. Clin., 64 (2014), 52–62. https://doi.org/10.3322/caac.21203 doi: 10.3322/caac.21203
    [3] G. Giamas, A. Filipović, J. Jacob, W. Messier, H. Zhang, D. Yang, et al., Kinome screening for regulators of the estrogen receptor identifies LMTK3 as a new therapeutic target in breast cancer, Nat. Med., 17 (2011), 715–719. https://doi.org/10.1038/nm.2351 doi: 10.1038/nm.2351
    [4] Q. Feng, Z. Zhang, M. J. Shea, C. J. Creighton, C. Coarfa, S. G. Hilsenbeck, et al., An epigenomic approach to therapy for tamoxifen-resistant breast cancer, Cell Res., 24 (2014), 809–819. https://doi.org/10.1038/cr.2014.71 doi: 10.1038/cr.2014.71
    [5] B. Shaker, K. M. Tran, C. Jung, D. Na, Introduction of advanced methods for structure-based drug discovery, Curr. Bioinf., 16 (2021), 351–363. https://doi.org/10.2174/1574893615999200703113200 doi: 10.2174/1574893615999200703113200
    [6] L. Cai, C. Lu, J. Xu, Y. Meng, P. Wang, X. Fu, et al., Drug repositioning based on the heterogeneous information fusion graph convolutional network, Briefings Bioinf., 22 (2021), bbab319. https://doi.org/10.1093/bib/bbab319 doi: 10.1093/bib/bbab319
    [7] A. Ben Brahim, L. Mohamed, Ensemble feature selection for high dimensional data: a new method and a comparative study, Adv. Data Anal. Classif., 12 (2018), 937–952. https://doi.org/10.1007/s11634-017-0285-y doi: 10.1007/s11634-017-0285-y
    [8] L. Meng, N. Masuda, Epidemic dynamics on metapopulation networks with node2vec mobility, J. Theor. Biol., 534 (2022), 110960. https://doi.org/10.1016/j.jtbi.2021.110960 doi: 10.1016/j.jtbi.2021.110960
    [9] D. H. Le, D. Nguyen Ngoc, Drug repositioning by integrating known disease-gene and drug-target associations in a semi-supervised learning model, Acta Biotheor., 66 (2018), 315–331. https://doi.org/10.1007/s10441-018-9325-z doi: 10.1007/s10441-018-9325-z
    [10] R. Su, J. Hu, Q. Zou, B. Manavalan, L. Wei, Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools, Briefings Bioinf., 21 (2020), 408–420. https://doi.org/10.1093/bib/bby124 doi: 10.1093/bib/bby124
    [11] Y. Yang, L. Chen, Identification of drug-disease associations by using multiple drug and disease networks, Curr. Bioinf., 17 (2022), 48–59. https://doi.org/10.2174/1574893616666210825115406 doi: 10.2174/1574893616666210825115406
    [12] Y. Saeys, A. Thomas, Y. Van de Peer, Robust feature selection using ensemble feature selection techniques, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, (2008), 313–325. https://doi.org/10.1007/978-3-540-87481-2_21
    [13] B. Seijo-Pardo, I. Porto-Díaz, V. Bolón-Canedo, A. Alonso-Betanzos, Ensemble feature selection: Homogeneous and heterogeneous approaches, Knowledge-Based Syst., 118 (2017), 124–139. https://doi.org/10.1016/j.knosys.2016.11.017 doi: 10.1016/j.knosys.2016.11.017
    [14] S. Zhang, Y. Chen, W. Zhang, R. Feng, A novel ensemble deep learning model with dynamic error correction and multi-objective ensemble pruning for time series forecasting, Inf. Sci., 544 (2021), 427–445. https://doi.org/10.1016/j.ins.2020.08.053 doi: 10.1016/j.ins.2020.08.053
    [15] H. Liu, Z. Duan, F. Han, Y. Li, Big multi-step wind speed forecasting model based on secondary decomposition, ensemble method and error correction algorithm, Energy Convers. Manage., 156 (2018), 525–541. https://doi.org/10.1016/j.enconman.2017.11.049
    [16] Z. Zhang, B. Krawczyk, S. Garcìa, A. Rosales-Pérez, F. Herrera, Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data, Knowledge-Based Syst., 106 (2016), 251–263. https://doi.org/10.1016/j.knosys.2016.05.048 doi: 10.1016/j.knosys.2016.05.048
    [17] H. Guo, Y. Li, Y. Li, X. Liu, J. Li, BPSO-Adaboost-KNN ensemble learning algorithm for multi-class imbalanced data classification, Eng. Appl. Artif. Intell., 49 (2016), 176–193. https://doi.org/10.1016/j.engappai.2015.09.011 doi: 10.1016/j.engappai.2015.09.011
    [18] A. K. Sharma, R. Srivastava, Protein secondary structure prediction using character bi-gram embedding and bi-LSTM, Curr. Bioinf., 16 (2021), 333–338. https://doi.org/10.2174/1574893615999200601122840 doi: 10.2174/1574893615999200601122840
    [19] F. Weng, H. Zhang, C. Yang, Volatility forecasting of crude oil futures based on a genetic algorithm regularization online extreme learning machine with a forgetting factor: The role of news during the COVID-19 pandemic, Resour. Policy, 73 (2021), 102148. https://doi.org/10.1016/j.resourpol.2021.102148 doi: 10.1016/j.resourpol.2021.102148
    [20] Y. Xu, Y. Ma, Z. Zhu, J. Li, T. Lu, Construct comprehensive indicators through a signal extraction approach for predicting housing price crises, PloS One, 17 (2022), e0272213. https://doi.org/10.1371/journal.pone.0272213 doi: 10.1371/journal.pone.0272213
    [21] F. Weng, J. Zhu, C. Yang, W. Gao, H. Zhang, Analysis of financial pressure impacts on the health care industry with an explainable machine learning method: China versus the USA, Expert Syst. Appl., 210 (2022), 118482. https://doi.org/10.1016/j.eswa.2022.118482 doi: 10.1016/j.eswa.2022.118482
    [22] R. Polikar, Ensemble learning, in Ensemble Machine Learning, Springer, Boston, MA, (2012), 1–34. https://doi.org/10.1007/978-1-4419-9326-7_1
    [23] T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2016), 785–794. https://doi.org/10.1145/2939672.2939785
    [24] L. Breiman, Random forests, Mach. Learn., 45 (2001), 5–32. https://doi.org/10.1023/A:1010933404324 doi: 10.1023/A:1010933404324
    [25] P. Bühlmann, S. Van De Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer Science & Business Media, 2011. https://doi.org/10.1007/978-3-642-20192-9
    [26] L. Huang, S. Chen, Z. Ling, Y. Cui, Q. Wang, Non-invasive load identification based on LSTM-BP neural network, Energy Rep., 7 (2021), 485–492. https://doi.org/10.1016/j.egyr.2021.01.040 doi: 10.1016/j.egyr.2021.01.040
    [27] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), 2278–2324. https://doi.org/10.1109/5.726791 doi: 10.1109/5.726791
    [28] H. Altun, A. Bilgil, B. C. Fidan, Treatment of multi-dimensional data to enhance neural network estimators in regression problems, Expert Syst. Appl., 32 (2007), 599–605. https://doi.org/10.1016/j.eswa.2006.01.054 doi: 10.1016/j.eswa.2006.01.054
    [29] D. E. Rumelhart, E. H. Geoffrey, R. J. Williams, Learning representations by back-propagating errors, Nature, 323 (1986), 533–536. https://doi.org/10.1038/323533a0 doi: 10.1038/323533a0
    [30] Y. Nakamura, O. Hasegawa, Nonparametric density estimation based on self-organizing incremental neural network for large noisy data, IEEE Trans. Neural Networks Learn. Syst., 28 (2016), 8–17. https://doi.org/10.1109/TNNLS.2015.2489225 doi: 10.1109/TNNLS.2015.2489225
    [31] W. Sun, Q. Gao, Exploration of energy saving potential in China power industry based on Adaboost back propagation neural network, J. Cleaner Prod., 217 (2019), 257–266. https://doi.org/10.1016/j.jclepro.2019.01.205 doi: 10.1016/j.jclepro.2019.01.205
    [32] C. Yan, T. Zhang, Y. Sun, H. Tang, H. Li, A hybrid variable selection method based on wavelet transform and mean impact value for calorific value determination of coal using laser-induced breakdown spectroscopy and kernel extreme learning machine, Spectrochim. Acta, Part B, 154 (2019), 75–81. https://doi.org/10.1016/j.sab.2019.02.007 doi: 10.1016/j.sab.2019.02.007
    [33] N. M. Nasrabadi, Pattern recognition and machine learning, J. Electron. Imaging, 16 (2007), 049901. https://doi.org/10.1117/1.2819119 doi: 10.1117/1.2819119
    [34] P. Tang, X. Yan, Y. Nan, S. Xiang, S. Krammer, T. Lasser, FusionM4Net: A multi-stage multi-modal learning algorithm for multi-label skin lesion classification, Med. Image Anal., 76 (2022), 102307. https://doi.org/10.1016/j.media.2021.102307 doi: 10.1016/j.media.2021.102307
    [35] F. Weng, Y. Chen, Z. Wang, M. Hou, J. Luo, Z. Tian, Gold price forecasting research based on an improved online extreme learning machine algorithm, J. Ambient Intell. Hum. Comput., 11 (2020), 4101–4111. https://doi.org/10.1007/s12652-020-01682-z doi: 10.1007/s12652-020-01682-z
    [36] K. Zhang, S. Zhang, Y. Song, L. Cai, B. Hu, Double decoupled network for imbalanced obstetric intelligent diagnosis, Math. Biosci. Eng., 19 (2022), 10006–10021. https://doi.org/10.3934/mbe.2022467 doi: 10.3934/mbe.2022467
    [37] J. Wang, Prediction of postoperative recovery in patients with acoustic neuroma using machine learning and SMOTE-ENN techniques, Math. Biosci. Eng., 19 (2022), 10407–10423. https://doi.org/10.3934/mbe.2022487 doi: 10.3934/mbe.2022487
    [38] C. Wei, K. Sohn, C. Mellina, A. Yuille, F. Yang, Crest: A class-rebalancing self-training framework for imbalanced semi-supervised learning, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 10857–10866.
    [39] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, preprint, arXiv: 1412.6980. https://doi.org/10.48550/arXiv.1412.6980
    [40] P. Wang, K. Li, B. Xiao, K. Li, Multi-objective optimization for joint task offloading, power assignment, and resource allocation in mobile edge computing, IEEE Internet Things J., 9 (2021), 11737–11748. https://doi.org/10.1109/JIOT.2021.3132080
    [41] R. Zheng, M. Li, Z. Liang, F. Wu, Y. Pan, J. Wang, SinNLRR: a robust subspace clustering method for cell type detection by non-negative and low-rank representation, Bioinformatics, 35 (2019), 3642–3650. https://doi.org/10.1093/bioinformatics/btz139 doi: 10.1093/bioinformatics/btz139
    [42] P. Wang, W. Zhu, B. Liao, L. Cai, L. Peng, J. Yang, Predicting influenza antigenicity by matrix completion with antigen and antiserum similarity, Front. Microbiol., 9 (2018), 2500. https://doi.org/10.3389/fmicb.2018.02500 doi: 10.3389/fmicb.2018.02500
    [43] Z. Dimitris, Healthcare access as an important element for the EU's socioeconomic development: Greece's residents' opinions during the COVID-19 pandemic, Natl. Account. Rev., 4 (2022), 362–377. https://doi.org/10.3934/NAR.2022020 doi: 10.3934/NAR.2022020
    [44] F. Corradin, M. Billio, R. Casarin, Forecasting economic indicators with robust factor models, Natl. Account. Rev., 4 (2022), 167–190. https://doi.org/10.3934/NAR.2022010 doi: 10.3934/NAR.2022010
    [45] D. Panarello, G. Tassinari, The consequences of COVID-19 on older adults: evidence from the SHARE Corona Survey, Natl. Account. Rev., 4 (2022), 56–73. https://doi.org/10.3934/NAR.2022004 doi: 10.3934/NAR.2022004
    [46] Z. Li, H. Chen, B. Mo, Can digital finance promote urban innovation? Evidence from China, Borsa Istanbul Rev., 2022 (2022). https://doi.org/10.1016/j.bir.2022.10.006 doi: 10.1016/j.bir.2022.10.006
    [47] Y. Liu, P. Failler, Y. Ding, Enterprise financialization and technological innovation: Mechanism and heterogeneity, PLoS One, 17 (2022), e0275461. https://doi.org/10.1371/journal.pone.0275461 doi: 10.1371/journal.pone.0275461
  • Reader Comments
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(2406) PDF downloads(138) Cited by(4)

Figures and Tables

Figures(5)  /  Tables(6)

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog