Review Special Issues

Advances in computational methods for identifying cancer driver genes


  • Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study employed computational methods for identifying CDGs, categorizing them into four groups. The major frameworks for each of these four categories were summarized. Additionally, we systematically gathered data from public databases and biological networks, and we elaborated on computational methods for identifying CDGs using the aforementioned databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, the performances of nine typical identification methods for eight types of cancer were compared to analyze the applicability areas of these methods. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study revealed that the network-based algorithms and machine learning-based methods demonstrated superior performance.

    Citation: Ying Wang, Bohao Zhou, Jidong Ru, Xianglian Meng, Yundong Wang, Wenjie Liu. Advances in computational methods for identifying cancer driver genes[J]. Mathematical Biosciences and Engineering, 2023, 20(12): 21643-21669. doi: 10.3934/mbe.2023958




    In the absence of any known treatment protocol or cure, one of the earliest responses to the outbreak of SARS-CoV-2, or the COVID-19 disease [1], was to establish a boundary around the epicenter of the outbreak (Hubei province in China), a cordon sanitaire, on January 23, 2020 to prevent the infection from spreading. The infection nevertheless spread, triggering similar responses from various other countries with varying duration and scale of restrictions (see e.g., [2]). Many such restrictions remain in place to date, while some were lifted for either a shorter or a longer time.

    Indeed, the cordon sanitaire is an old technique of infectious disease containment. The use of the phrase goes back to 1821, when 30000 French troops were deployed by Duke de Richelieu, apparently to prevent yellow fever from spreading from Barcelona to France [3], but its first documented use dates even further back to 1523 in Malta [4]. Although it met with varying degrees of success in the past, the scale of its implementation has never been larger than the current one, which affects almost the entire population of the planet. While it is still early to discuss the full impact of such restrictions on different spheres of society, it is possible to assess some of their impacts on the spreading of the disease, on the early economic fallout and on the burdens placed on health infrastructures.

    It is possible to pose the question of imposing and lifting the cordon sanitaire as an optimization problem. The parameters to be considered are the gains made in containing the spreading of infection, the costs paid in terms of higher infections within the contained community and the economic fallout due to the halting of businesses, and finally the constraint that the corresponding health infrastructure must be able to bear the burden of the growing infections. We outline here the above mentioned factors from the points of view of (a) early analysis of the data for COVID-19 and past data of other epidemics, (b) study of compartmentalized models that capture the qualitative picture in terms of a few parameters, and (c) artificial intelligence (AI) and machine learning (ML) approaches.

    In the early stages of the spreading of COVID-19, data driven approaches were able to trace the correlations between travel patterns and infection spreading in China (see e.g., [5]). Such correlations are better documented for earlier epidemics (SARS [6], Ebola [7]), although on a much smaller scale. Nevertheless, it is crucial to study these data driven approaches, given that the effects of the imposed restrictions seen in the real data were later used as inputs for various other approaches, such as compartmentalized models, and also as training sets for ML approaches. Therefore, in Section 2 we outline such studies that essentially correlate the infection spreading with human traffic. The clear positive correlation in the early stages and a subsequent anti-correlation [8] outline the mechanisms of the primary and secondary stages of the infections, which are very useful insights for subsequent models.

    The mathematical modeling of epidemic spreading also has a long history [9,10]. It was the pioneering physicist Daniel Bernoulli who first introduced a mathematical model of epidemic spreading [11] in 1766. Since then, the most used model has been the Susceptible-Infected-Removed (SIR) model [12] and its variants, generally called compartmentalized models [13,14], in which the total population is divided into groups and the dynamics of the model proceeds through movements of the population between these compartments i.e., a susceptible individual can get infected and then subsequently recover and so on. Extensions of this model include the introduction of other plausible compartments e.g., exposed, representing individuals who came in contact with the infected population but are not yet showing symptoms. Even further divisions depending on the severity of the infections can estimate the load of patients needing extensive medical attention. The key parameters in these models are the rates at which the populations are relabeled from one compartment to the other i.e., infection rate, recovery rate and so on. These parameters are sometimes estimated from the data driven approaches mentioned above, and in other cases they are estimated using a combination of clinical evidence and available data (see e.g., [15]) to maintain the connection between the models and the real world dynamics of the epidemic spreading. Also, the effects of imposed restrictions are assumed to be mirrored in the variations of these parameters. The models with such estimated parameters and their variations are then used to estimate the spreading scale of the epidemic and the possible effects of movement restrictions. Furthermore, given the correlation between the scale of epidemic spreading and the negative impact on the economy (see e.g., [16]), they also give an insight into the economic cost. Therefore, a dynamical optimization of the imposed restrictions can be attempted. We outline these efforts in Section 3.

    Finally, a multidimensional set of data with many attributes can be used for a systematic statistical trend analysis to gain insights that are not immediately apparent. This brings in the machine learning approaches for the study of the real data for the pandemic. There are specific areas in which the AI-ML approaches can help in advancing our understanding [19]. The early warning of the outbreak, the predictions for total infections and/or end-time of the pandemic, and the implementation of physical distancing are some such areas. The outstanding challenge in these approaches is the lack of sufficient training data sets that are reliable enough for a stable prediction. In the case of COVID-19, data from previous epidemics (SARS, Ebola, Zika virus) were used in some cases with suitable adjustments (see e.g., [20,21]); in other cases synthetic data from optimized parameters of a simplified model [22] were also used. The successes and limitations of these approaches are discussed in Section 4.

    In drawing any conclusion on the effectiveness of a mitigation strategy for an epidemic, it is essential to analyze its effect on the real data. It is often a challenging task to obtain a reliable set of data, not only due to the lack of testing or documentation, but also due to the noise that accumulates, from news outlets to social media, around a highlighted event [25].

    Nevertheless, there have been many attempts to understand different aspects of the COVID-19 pandemic from the analysis of the available data, ranging from estimation of the reproduction rate [26] and forecasting of the end-time [27] to the effectiveness of protective drugs [28].

    In terms of the movement restriction strategies, at the early stage of the spread of COVID-19, it was possible to trace the correlation between the travel pattern from the Hubei province and the detection of infected individuals outside the province. Indeed, until the end of January 2020, 80% of all cases were detected within the province [8], and only after that did cases outside the province start rising. Kraemer et al. [8] studied the human mobility pattern using data from Baidu Inc. and recorded the effects of imposing the cordon sanitaire from 23rd January, 2020. Their findings suggest that the initial bias in the age group and gender of the detected cases was due to the travel history of those individuals to the Hubei province. Following the imposition of the restrictions, those biases eventually disappeared, suggesting that the cases after that time were due to secondary infections. There was a very clear positive correlation between the COVID-19 growth rate in other provinces of China and the human mobility from Wuhan before the travel restriction was imposed (see Figure 1). The correlation started decreasing a week after the restrictions were imposed, and beyond that it became negative. This implies that an early imposition of the travel restriction helps in containing the infection, but such restrictions are less useful, at least in the context of [8], once secondary infections have started spreading outside a localized region. This was a key observation that formed the basis of the input parameters of the mathematical modeling approaches that we discuss in the next section.

    Figure 1.  The figure on the left (from [8] with permission from AAAS) depicts the relationship between the human mobility and the rate of infection in China, before and after the imposition of the cordon sanitaire. There is a clear positive correlation between the two quantities before the imposition of restrictions. The figure on the right shows simulations of the SIR model with optimized mobility of individuals among regions of varying degrees of risk (see [34]). For different durations of travel restrictions (indicated by the start and end dates), the fraction of infected individuals moved correlates strongly with the total infection fraction.

    There are myriad factors that can influence a respiratory infection such as COVID-19. First, the interaction patterns of humans, the carriers of the virus, are complex and highly heterogeneous, and to a large extent there is little accessible data on them. Second, especially during the first months of the virus spreading, the lack of testing facilities contributed to much of the fluctuation in the data. Such fluctuations continue to date, given that a substantial portion of the infected individuals are not symptomatic [29] but can still carry the infection and thereby infect others. Third, the effective virulence of the infection is a dynamic quantity. This is because of the mutation of the virus itself [30] and also because of the various restrictive measures imposed. Both of these factors vary with time as well as space. Therefore, the complexity of the system and the noise in the available data are both very high.

    Nevertheless, attempts to formulate a mathematical model based description of epidemics have been made for more than two centuries [11]. This is partly because models provide us with insights that are otherwise inaccessible by simply studying the data. In complex systems, simplified model approaches have been very useful in gaining critical insight into the system, even though the models in question ignored many realistic features of the system under study. An outstanding example of success comes from the study of magnetism through the Ising model [31,32]. However, in modeling epidemic dynamics, there are certain challenges regarding the choices of parameter values, on which the main conclusions depend. These choices, therefore, need to be based on medical inputs and also on data analysis.

    While the above mentioned criticisms are applicable to epidemic spreading models, there are two points to note before we proceed to the specific modeling approaches. First, the goals of epidemic spreading models and those of (laboratory scale) physical systems can differ. With a model alone, without inputs from real data, no epidemic model attempts to make quantitative predictions. Second, although a critical point does not exist in epidemic spreading models themselves, it has been shown using the spatial pattern of the spreading data for the COVID-19 pandemic that it follows a fractal growth [33]. Indeed, it was also shown recently [22] that if a simple model is to make predictions with the least error relative to the real data, the parameters in the model are to be set in such a way that the resulting spreading pattern has a fractal form. Although not arising out of a criticality in the epidemic model, scale free characteristics exist in such fractal geometry. With this in mind, we discuss the various models and the results of incorporating movement restrictions in those models for the case of COVID-19 spreading.

    Bernoulli first proposed such an attempt in 1766 [11]. This class of models is sometimes termed compartmentalized models, since the basic idea involves dividing the total population into groups, based on their exposure (or lack of it) to the virus. The most used and the simplest version of the model involves dividing (at any instant of time t) the entire population into three groups: Susceptible S(t), Infected I(t) and Removed R(t) (see Figure 2(a)). First proposed in 1927 [12], the model assumes that the total population S(t)+I(t)+R(t)=N is constant throughout the dynamics. At t=0, of course, N=S0+I0, where S0 and I0 represent the initial susceptible and infected populations respectively. A susceptible individual, on coming in contact with an infected individual, can get infected at a certain rate r, and an infected individual is removed (due to recovery or death) at a rate α. There is no scope for re-infection in this model, although other variants exist [9,10] where such scenarios are considered.

    Figure 2.  A schematic representation of compartmentalized models of epidemic spreading. (a) The SIR model (see Eq (3.1)), representing the simplest variant that qualitatively reproduces the dynamics of the pandemic. (b) The SEIRD model (see e.g., [35,49]) depicted here divides the total population, which is assumed to be constant, in different groups and the arrows indicate the directions in which the population can move from one compartment/state to the other and x, y indicate the corresponding rates. The numerical values of x and y are estimated depending on the context of the model application and the mean-field governing equations are given in Eq (3.2).

    A mean-field treatment of the model is straightforward, which involves writing down the differential equations governing each of the three groups:

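    $$
    \begin{aligned}
    \frac{dS(t)}{dt} &= -r I(t) S(t), \\
    \frac{dI(t)}{dt} &= r I(t) S(t) - \alpha I(t), \\
    \frac{dR(t)}{dt} &= \alpha I(t).
    \end{aligned}
    \tag{3.1}
    $$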

    The temporal evolution of the infected number I(t) from this model, in mean-field and on compact lattices, behaves in a way similar to a wave of infection in COVID-19 (and other epidemic) spreading. In this form, the model does not give multiple peaks in infections. Indeed, it is easy to see that the maximum value of the infection will be $I_{max} = I_0 + S_0 - \frac{1}{q}\left(1 + \ln(q S_0)\right)$, where $q = r/\alpha = R_0/N$ and $R_0$ is the reproduction rate. The quantity $I_{max}$ is important because it gives an estimate of the maximum load the healthcare infrastructure needs to support. SIR models were used in studying the effects of optimal migrations (see e.g., [34]). Indeed, in more realistic variants of the model, this is the quantity that was computed for various countries in order to estimate the above mentioned load and also to design optimization strategies for implementing mitigating responses, including travel restrictions.
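    The peak formula can be checked numerically. The following is a minimal sketch, not taken from any of the cited works: it integrates Eq (3.1) with a classical fourth-order Runge-Kutta scheme and compares the largest I(t) encountered with the closed-form expression for the maximum. The function names and all parameter values (N, I0, the rates) are hypothetical and chosen only for illustration; the closed form assumes q S0 > 1, i.e., that an outbreak actually occurs.

```python
import math


def sir_rhs(s, i, r, alpha):
    """Right-hand side of Eq (3.1): returns (dS/dt, dI/dt, dR/dt)."""
    new_infections = r * i * s
    removals = alpha * i
    return -new_infections, new_infections - removals, removals


def simulate_sir(s0, i0, r, alpha, dt=0.01, t_max=300.0):
    """Integrate Eq (3.1) with classical RK4; return (peak of I, final state)."""
    s, i, rem = s0, i0, 0.0
    peak = i
    for _ in range(int(round(t_max / dt))):
        k1 = sir_rhs(s, i, r, alpha)
        k2 = sir_rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1], r, alpha)
        k3 = sir_rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1], r, alpha)
        k4 = sir_rhs(s + dt * k3[0], i + dt * k3[1], r, alpha)
        s += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        i += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        rem += dt * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6.0
        peak = max(peak, i)
    return peak, (s, i, rem)


def analytic_peak(s0, i0, r, alpha):
    """I_max = I0 + S0 - (1/q)(1 + ln(q S0)), valid when q * S0 > 1."""
    q = r / alpha
    return i0 + s0 - (1.0 + math.log(q * s0)) / q


# Hypothetical parameters, chosen only for illustration.
N, I0 = 1000.0, 1.0
S0 = N - I0
ALPHA = 0.1        # removal rate per day
R_RATE = 0.3 / N   # infection rate, so that r * N / alpha = 3

peak, final_state = simulate_sir(S0, I0, R_RATE, ALPHA)
print(f"numerical peak: {peak:.2f}")
print(f"analytic peak:  {analytic_peak(S0, I0, R_RATE, ALPHA):.2f}")
```

    The two printed numbers should agree closely, and the sum of the three compartments should stay at N, which is a quick sanity check on the signs in Eq (3.1).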

    The above mentioned variant is the simplest one that gives the qualitative features of the current pandemic. However, there are multiple other variants of the model that include more realistic features. These still fall under the category of compartmentalized models, since the basic feature of dividing the population into different compartments/states still exists. Among these variants, one is the SEIR model, where the additional state E(t) denotes the part of the population that is exposed to the virus (see Figure 2). The exposed individuals can infect the susceptible population. This is an important extension, particularly when the maximum case load is to be estimated. In Xing et al. [35], the effects of migrations were explicitly included. The dynamics evolved following the equations:

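    $$
    \begin{aligned}
    \frac{dS}{dt} &= -\frac{\beta_1 S E + \beta_2 S I}{N} + (a-b) S, \\
    \frac{dE}{dt} &= \frac{\beta_1 S E + \beta_2 S I}{N} - \delta E + (a-b) E, \\
    \frac{dI}{dt} &= \delta E - m I + (a-b) I, \\
    \frac{dQ}{dt} &= m I - \gamma Q, \\
    \frac{dR}{dt} &= \gamma Q,
    \end{aligned}
    \tag{3.2}
    $$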

    where Q denotes the confirmed cases, δ is the infection rate, γ is the recovery rate, m is the confirmation rate, β1 and β2 denote the transmission incidence rates from exposed individuals and infected individuals to susceptible individuals, respectively, and a and b denote the immigration and emigration rates, respectively. The exposed individuals can infect the susceptible population, but they are pre-symptomatic, as opposed to the infected population. If free mixing is allowed, then β1=β2. They differ only when exposed individuals are quarantined following a potential exposure. The model parameters can then be estimated from actual data and the effects of travel restrictions can be studied. In particular, a restriction on travel would set the parameters a and b to zero, and they can be restored during the dynamics with the resumption of work. In other words, a dynamical variation of these parameter values can reflect the changes in policies regarding travel restrictions and the subsequent changes in the total infection rate.
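    To make the role of a and b concrete, here is a minimal sketch of Eq (3.2) in which both migration rates are switched to zero inside a restriction window. It is not the implementation of [35]: the forward-Euler integrator, the function names, the choice of keeping N fixed at the initial total, and every numerical value below are assumptions made for illustration only.

```python
def seiqr_derivatives(state, params, a, b):
    """Right-hand side of Eq (3.2) for state = (S, E, I, Q, R)."""
    s, e, i, q, r = state
    # Assumption: N in Eq (3.2) is kept fixed at the initial total population.
    force = (params["beta1"] * s * e + params["beta2"] * s * i) / params["N"]
    ds = -force + (a - b) * s
    de = force - params["delta"] * e + (a - b) * e
    di = params["delta"] * e - params["m"] * i + (a - b) * i
    dq = params["m"] * i - params["gamma"] * q
    dr = params["gamma"] * q
    return ds, de, di, dq, dr


def simulate(state, params, a, b, restriction, days, dt=0.05):
    """Forward-Euler integration; a and b are zero inside the restriction window."""
    start, end = restriction
    for k in range(int(round(days / dt))):
        t = k * dt
        restricted = start <= t < end
        a_t, b_t = (0.0, 0.0) if restricted else (a, b)
        derivs = seiqr_derivatives(state, params, a_t, b_t)
        state = tuple(x + dt * dx for x, dx in zip(state, derivs))
    return state


# Hypothetical parameter values (per day), for illustration only.
PARAMS = {"N": 1.0e6, "beta1": 0.3, "beta2": 0.3, "delta": 0.2, "m": 0.25, "gamma": 0.1}
INITIAL = (PARAMS["N"] - 10.0, 10.0, 0.0, 0.0, 0.0)
A_RATE, B_RATE = 0.002, 0.001  # immigration and emigration rates

closed = simulate(INITIAL, PARAMS, A_RATE, B_RATE, restriction=(0.0, 120.0), days=120.0)
opened = simulate(INITIAL, PARAMS, A_RATE, B_RATE, restriction=(0.0, 0.0), days=120.0)
print("total population, travel closed:", round(sum(closed)))
print("total population, travel open:  ", round(sum(opened)))
```

    With travel closed for the whole run the five compartments sum to the initial population, because every flow in Eq (3.2) other than the (a-b) terms only moves individuals between compartments. Changing the restriction window is then a simple way to explore early or late lifting of the restriction.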

    Another variant of the compartmentalized model is the SIRD model, where the final state refers to death due to the disease. Other than these, there are more case specific variants that, for example, consider various severities of health conditions following an infection (see e.g., [36]). Such details of the model require additional input from the real data, which has been provided for some specific countries/regions.

    Apart from adding different states to the original SIR model, another direction of realistic extension has been to incorporate the effects of the model topology. The above mentioned mean-field nature of the dynamics can prevail only under homogeneous mixing of the population, which is certainly not the case, particularly when travel restrictions are imposed. Also, the overly restrictive fixed lattice arrangement, where the infection can only spread through nearest or next nearest neighboring individuals, is unrealistic. Between these two limits (lattice models and mean-field), one way to reach the intermediate realistic scale is to tune the infection rate. Another way to achieve the intermediate state is to modify the topology on which the model is studied. This can involve pruning the fully connected graph to, say, a Euclidean topology [37], or introducing disorder in the lattice models, say, in terms of site dilution [22]. In [23], for example, the above mentioned spatial patterns were explicitly taken into account in a SEIR-like model. In particular, the effects of presymptomatic infectious individuals were studied in a metacommunity model for a network of 107 nodes, representing provinces and metropolitan areas in Italy with closely monitored population movements. The role of presymptomatic infectious individuals in spreading the virus underlines the need for containment measures and restrictions of population movements.

    In all the above mentioned variants, the time taken for the population to move is infinitesimally short. However, a finite speed of spreading of the epidemic is a more realistic version, which was considered in [24]. The hyperbolic partial differential equations considered there reduce to the classical SIR model in appropriate limits (zero relaxation time in each state and infinite propagation speed). Here also the role of imposing travel restrictions on the epidemic spreading is captured, particularly in the scenarios where the reproduction ratio is higher than one.

    As mentioned before, Tuite et al. [36] studied a SEIR type model for estimating the health infrastructure load in Ontario, Canada. The model is structured in 5-year age group layers. The interactions within the age groups [38] and the presence of comorbidities (hypertension, heart diseases, asthma, stroke, diabetes and cancer) were also considered in estimating the severity of the infection (e.g., required ICU care). The dynamics was initiated with uniformly distributed initial infections, and then the effects of control strategies such as extensive testing and physical distancing measures were studied using a fixed duration and also in a dynamically tuned manner depending on the required ICU care. It was found that dynamically introduced restriction measures were more effective than a fixed duration restriction, with a potentially shorter period of physical distancing.
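    The idea of a dynamically tuned restriction can be sketched with a plain SIR model in population fractions. This is not the age-structured model of [36]: the infected fraction stands in for ICU demand, the restriction is modeled as a fixed fractional reduction of the transmission rate, and the on/off thresholds and all rates are hypothetical values chosen for illustration.

```python
def run_sir(beta, gamma, i0, days, dt=0.05,
            on_level=None, off_level=None, reduction=0.0):
    """SIR in population fractions with an optional threshold-triggered restriction.

    The restriction switches on when the infected fraction exceeds on_level and
    off again when it falls below off_level. Returns (peak infected fraction,
    total time spent under restriction in days).
    """
    s, i = 1.0 - i0, i0
    restricted = False
    peak, restricted_time = i, 0.0
    for _ in range(int(round(days / dt))):
        if on_level is not None:
            if not restricted and i > on_level:
                restricted = True
            elif restricted and i < off_level:
                restricted = False
        beta_eff = beta * (1.0 - reduction) if restricted else beta
        new_infections = beta_eff * s * i
        s += dt * (-new_infections)
        i += dt * (new_infections - gamma * i)
        peak = max(peak, i)
        if restricted:
            restricted_time += dt
    return peak, restricted_time


# Hypothetical rates (per day) and thresholds, for illustration only.
BETA, GAMMA, I_INIT = 0.3, 0.1, 1.0e-4

peak_free, _ = run_sir(BETA, GAMMA, I_INIT, days=365.0)
peak_on_off, days_restricted = run_sir(
    BETA, GAMMA, I_INIT, days=365.0,
    on_level=0.02, off_level=0.005, reduction=0.7,
)
print(f"peak without restriction:    {peak_free:.3f}")
print(f"peak with on-off restriction: {peak_on_off:.3f}")
print(f"days under restriction:       {days_restricted:.0f}")
```

    Because the reduced transmission rate in this sketch is below the removal rate, the infected fraction declines whenever the restriction is active, so the peak stays close to the switch-on threshold. Comparing the number of restricted days against a single fixed-length restriction is one simple way to explore the trade-off described above.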

    The effect of heterogeneity, particularly in the age-group structure, plays a rather complex role in the spreading of COVID-19 (see e.g., [39,40]). In [41], the authors pointed out the role of uncertainty in the available data for an age-structured community. Age heterogeneity is also reflected in the social mixing, which in turn can affect herd immunity [42]. An early lifting of restrictions, therefore, can trigger an increase in the total infection (see also [34]).

    In the US also, such a compartmentalized model (SEAIR) approach was taken to find the optimal control of the outbreak [43]. The additional state A(t) represents the estimated 20–40% of cases that are asymptomatic but can still be carriers of the infection. Here also it was concluded that interventions (testing, isolation, physical distancing) are more effective in the early stages of the dynamics than at later stages, even if the measures are more drastic later on. Also, a periodic on-off strategy, similar to [36], was found to be more effective in controlling the spreading and was also conjectured to be more palatable.

    A similar approach was taken by Prem et al. [44] for the spreading of infection in Wuhan, China. As in [43], a SEIR model with different age groups having different rates of infection was studied. Continued restrictions, modeled through the corresponding interaction matrices between different age groups, were seen to lower the total infection rate. Also, an early lifting of such restrictions leads to secondary peaks (see also [34]). With a similar SEIR type model, it was shown in [45] that the effective reproduction index Rt decayed from 2.35 on January 16 (one week before the cordon sanitaire) to 1.05 on January 31 (one week after the cordon sanitaire). This also reinforces the benefit of an early imposition of restrictions.

    As discussed above, there is a general consensus regarding the benefit of an early imposition of the cordon sanitaire in reducing the load on healthcare systems. Subsequent dynamical (on-off) interventions (travel restrictions), rather than a prolonged period of restriction, also seem to work better in reducing the total spreading. However, the optimization needs to consider the relative rates at which the cordoned-off and the remaining populations get affected. Also, while relaxing the restrictions, the optimization function for an individual may not be the same as that of the globally optimized state.

    Espinoza et al. [46] noted that when unrestricted movements are allowed between two low risk communities, the chances of secondary infections increase in those communities, but the overall epidemic size is reduced. On the other hand, imposition of a cordon sanitaire around a high risk community (the original practice of this type) reduces secondary infections, but increases the overall epidemic size, since the infection greatly affects the high risk community. Therefore, it is not straightforward to assess the benefits of such travel restrictions, or the time at which they should be removed. Indeed, the overall process of implementing the mitigation strategies can be viewed from the point of view of control theory [47], with a limit on the maximum active cases as a constraint that represents the load on the health care infrastructure. In the following section, we will discuss whether a machine learning approach can optimize the restriction times, so as to limit the spreading of the epidemic. Before that, however, it is also interesting to note that while the objective of governments would be to optimize the travel restrictions so as to minimize the epidemic size at a reduced economic fallout, the objective of an individual may not match it. In particular, given a chance, an individual would travel to a lower risk community rather than stay in a higher risk community. But given that many other individuals might also try the same, the said low risk community might not remain low risk, due to the spreading of secondary infections. This situation can be viewed from a game theory perspective [34], where the situation is that of a set of coupled minority games, played in parallel. The low-risk regions play the role of a limited resource that the agents compete to obtain. If only two regions were considered, then it would be the classical minority game limit i.e., a less crowded (infected) region is more beneficial. But when the number of regions is higher than two, and the agents can choose between any two of those multiple regions, it is still a minority game problem from the perspective of a single agent, but it is a coupled minority game because the agents present in a particular region at any instant might have different regions as their other (second) choice. It was seen that a restriction on the number of trips made by an individual is more effective than imposing a full stoppage of travel. But similar to what is noted in [44], for example, an early lifting of the restrictions can bring a second wave of infections (Figure 3). However, this is an idealized scenario in several respects, especially the limited number of regions considered, the homogeneity assumed in the population, the limited choices of movement for each agent and so on.
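    The two-region limit mentioned above can be illustrated with a toy calculation. The sketch below is not the coupled model of [34]: it only contrasts agents who all move to whichever region was less crowded in the previous round with agents who choose at random, and counts how many end up in the less crowded region. The function names, the number of agents and the number of rounds are hypothetical.

```python
import random


def minority_region(choices):
    """Return the less crowded of two regions (0 or 1); assumes an odd number of agents."""
    in_region_1 = sum(choices)
    return 1 if in_region_1 < len(choices) - in_region_1 else 0


def average_winners(n_agents, n_rounds, herd, seed=0):
    """Average number of agents per round that end up in the less crowded region."""
    rng = random.Random(seed)
    choices = [rng.randint(0, 1) for _ in range(n_agents)]
    total = 0
    for _ in range(n_rounds):
        last_minority = minority_region(choices)
        if herd:
            # Every agent moves to the region that was less crowded last round.
            choices = [last_minority] * n_agents
        else:
            # Every agent picks a region independently at random.
            choices = [rng.randint(0, 1) for _ in range(n_agents)]
        winner = minority_region(choices)
        total += sum(1 for c in choices if c == winner)
    return total / n_rounds


herd_winners = average_winners(n_agents=101, n_rounds=200, herd=True)
random_winners = average_winners(n_agents=101, n_rounds=200, herd=False)
print("average winners, herding agents:", herd_winners)
print("average winners, random agents: ", random_winners)
```

    When every agent follows the same rule, the previously uncrowded region becomes the crowded one and no agent benefits, which is the individual-versus-collective tension described in the paragraph above.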

    Figure 3.  Simulations of SIR model with optimized movements between communities of different risks (see [34]). It is seen that the infected fraction in the high-risk region Ii(t)/N shows a secondary peak once the travel restrictions are lifted early (time period indicated in the figures). However, the overall (relative) size of the second-wave peak (Itot/N) is seen to be larger (see also [44,46]).

    Here we revisit recent scientific contributions based on Artificial Intelligence (AI) to the fight against the COVID-19 pandemic. Recent applications of AI to different aspects of epidemiology have been instrumental in policy and medical analysis, including measuring the cost of the pandemic in terms of lives and economic damage (see e.g., [17,18]). The recent literature ranges from early warning, tracking and prediction to social control, which often influences the migration of people seeking to avoid the viral disease (see e.g., [19]). In January 2020, China imposed a very strict lockdown to contain the very first COVID-19 outbreak, and the measures remained in place until April 2020. During that period, researchers could only speculate about the impact of these policies on the spread of the virus. AI-based techniques are primarily used to predict the duration of social restrictions in different geographical regions, since such restrictions helped to reduce the number of infections significantly. In this direction, Yang et al. [48] employed a modified susceptible-exposed-infected-removed (SEIR) epidemiological model to predict the epidemic progression by including people's migration data from before and after January 2020 along with the COVID-19 epidemiological data available at that time. The authors used the Long Short-Term Memory (LSTM) model, a recurrent neural network (RNN) suited to time-series problems, to estimate the number of newly infected people. Statistics from the 2003 SARS outbreak were used to train the model, which was then fed with COVID-19 spreading parameters such as the rate of spreading, infection probability and recovery rate. This SEIR-based approach was useful in estimating the peaks and sizes of the COVID-19 epidemic. However, the model was constrained by an inadequate data set, which resulted in a relatively simple network configuration that may suffer from overfitting. In a similar study, Xing et al. [35] studied the impact of the migration of people using Baidu's migration data for Guangdong and Hunan provinces. As mentioned before, the authors developed a three-stage dynamical model. It uses an SEIR structure with time-dependent functions for susceptible S(t), exposed E(t), infected I(t) and removed R(t) individuals (see Figure 2). In the first stage, i.e., the early stage of the epidemic spreading, the model assumes that the confirmed individuals Q(t) do not migrate, and the COVID-19 transmission dynamics are represented by Eq (3.2). The model parameters were estimated using mobility data from Baidu.
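
    To make the compartmental structure concrete, the following is a minimal sketch of a generic textbook SEIR model integrated with forward Euler steps. It is not the modified model of Yang et al. [48], nor the three-stage model of Xing et al. [35]: the function name simulate_seir, the rates beta, sigma and gamma, and all numerical values are illustrative assumptions, and migration and the confirmed compartment Q(t) are deliberately left out.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate a generic SEIR model with forward Euler steps.

    beta  : transmission rate (assumed value, per day)
    sigma : rate at which exposed individuals become infectious (1 / latent period)
    gamma : removal rate (1 / infectious period)
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r                      # total population, conserved by the model
    steps_per_day = int(round(1.0 / dt))
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_removed = gamma * i * dt          # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical parameters chosen only for illustration (not fitted to any data).
history = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        s0=999_990, e0=0, i0=10, r0=0, days=200)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print("peak infectious fraction:", history[peak_day][2] / 1_000_000, "on day", peak_day)
```

    A model with mobility would add exchange terms between regional copies of these compartments; the single-region version above only shows how the four compartments feed into one another.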

    Very similar models were used for the remaining two stages, which characterize the imposition of the social curbs and the resumption of regular life, respectively. The mathematical analysis was carried out only for the first stage, for which the reproduction rate was calculated. The other parameter values were calculated from Baidu's data using methods such as least squares. The results show that the scale of infection is low in the province from which the population emigrated, whereas the opposite holds for the province that received the population. The authors also predicted that the province from which the population emigrated in the first stage may suffer after the easing of the social curbs (see also [34] in this context). However, this work suffers from several shortcomings, such as limited and erroneous data and the neglect of the asymptomatic population and of spatial diffusion characteristics.
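
    As an illustration of how a least-squares method can recover a model parameter from case counts, the sketch below fits an early-phase exponential growth rate by ordinary least squares on the logarithm of the counts. This is a simplified stand-in and not the estimation procedure of the study discussed above: the function fit_growth_rate is hypothetical, and the data are synthetic and noise-free, generated from an assumed rate of 0.25 per day.

```python
import math


def fit_growth_rate(days, cases):
    """Ordinary least-squares fit of log(cases) = log(c0) + r * day.

    Returns the estimated growth rate r and the initial count c0.
    """
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    r = sxy / sxx                       # slope of the regression line
    c0 = math.exp(mean_y - r * mean_t)  # intercept, mapped back from log scale
    return r, c0


# Synthetic counts from an assumed rate of 0.25/day and 20 initial cases.
days = list(range(10))
cases = [20.0 * math.exp(0.25 * t) for t in days]
rate, initial = fit_growth_rate(days, cases)
print("estimated growth rate:", rate, "initial cases:", initial)
```

    With real, noisy data the fit would not be exact, and a full compartmental model would normally be fitted by minimizing the squared error between simulated and reported curves rather than by a closed-form regression.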

    To study the health and economic impact, Khadilkar et al. [49] devised an AI-based system that predicts the best possible lockdown policies to control the spread of COVID-19 while minimizing the economic impact. The reinforcement-learning-based approach learns from different policies, which are represented as functions of disease and population parameters. The disease progression model is primarily based on SEIR, as depicted in Figure 2, where S denotes susceptible, E exposed, IS infected, IA asymptomatic, D dead and R recovered individuals. The numbers indicate the probabilities of the transitions from one state to another. The approach exposes the limitations of imperfect lockdowns and can be used to investigate various policies through tunable parameters. Further, the model may be useful for determining more fine-grained social curbs to prevent the spread of COVID-19.

    In another reinforcement-learning-based approach, Ohi et al. [50] demonstrated how an agent's actions may have different possible outcomes depending on the spread of the disease and the economic conditions. A virtual pandemic similar to COVID-19 is simulated to train the system. After training, the agent chooses the optimal strategy, which reduces epidemic spreading in a financially viable manner. The analysis of the results shows that, to reduce the first surge of infections, the system opted for a longer period of lockdown. To curb the successive waves of infections, the system chose a combination of recurrent and shorter lockdowns. The model is thus able to provide a middle ground between epidemic spreading and economic gains. However, a comparison of the humanitarian loss and economic gains under a total lockdown with those under recurrent lockdowns would have been interesting.
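
    The trade-off that such agents learn can be illustrated with a deliberately small example. The sketch below uses tabular Q-learning on a toy deterministic environment. It is not the system of Khadilkar et al. [49] or of Ohi et al. [50]: the five infection levels, the quadratic health cost, the lockdown cost of 2, the discount factor and the training settings are all assumptions made for illustration.

```python
import random

N_LEVELS = 5          # hypothetical infection levels: 0 (low) .. 4 (high)
ACTIONS = (0, 1)      # 0 = stay open, 1 = lockdown
LOCKDOWN_COST = 2.0   # assumed economic cost of one locked-down step
GAMMA = 0.9           # discount factor


def step(level, action):
    """Toy dynamics: lockdown lowers the level by one, staying open raises it."""
    nxt = max(level - 1, 0) if action == 1 else min(level + 1, N_LEVELS - 1)
    reward = -float(nxt ** 2) - LOCKDOWN_COST * action  # health cost + economic cost
    return nxt, reward


def train(episodes=2000, horizon=20, alpha=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_LEVELS)]
    for _ in range(episodes):
        level = rng.randrange(N_LEVELS)          # random start so all levels are visited
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)     # explore
            else:
                action = 0 if q[level][0] >= q[level][1] else 1  # exploit
            nxt, reward = step(level, action)
            target = reward + GAMMA * max(q[nxt])
            q[level][action] += alpha * (target - q[level][action])
            level = nxt
    return q


def greedy_policy(q):
    """Best action per infection level according to the learned Q-table."""
    return [0 if row[0] >= row[1] else 1 for row in q]


q_table = train()
policy = greedy_policy(q_table)
print("greedy action per infection level (0 = open, 1 = lockdown):", policy)
```

    Under these assumed costs, the optimal policy of the toy problem is to stay open only at the lowest infection level and to lock down otherwise, which produces an alternating open and closed pattern once infections are low. The real systems use far richer disease and economic models, so this only conveys the structure of the learning loop.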

    The first response of the governments in most countries to the outbreak of COVID-19 has been to impose restrictions on human mobility from highly infected regions. Following the spread of the virus to most countries in the world, the subsequent response has been to quarantine the infected population and also to impose local restrictions on human mobility. Those restrictions, while helpful in limiting the maximum number of active cases, did have and will continue to have severe societal and economic impacts. Here we reviewed the multiple facets of such restrictions on mobility in different countries, based on the analysis of data, the study of models and machine learning approaches. The numerical results discussed here (see e.g., [34,44,47]) suggest that while an early imposition of restrictions is useful, for the subsequent period a periodic relaxation of the restrictions is perhaps a more effective and palatable strategy than a prolonged imposition of restrictions.

    The process of optimizing the restriction period is not straightforward and is likely to differ among countries, depending on their socio-economic activities and healthcare infrastructure. A major challenge in finding such an optimized strategy has been to gather noise-free data regarding the spreading dynamics of the virus. Due to the complex nature of human interactions, compartmentalized modeling approaches are also hard to implement. However, we have discussed various efforts that address these issues. For example, an age-based hierarchy in the models seems to help the optimization process, given that the nature of interactions and the required healthcare vary among different age groups. Also, in data-driven machine learning approaches, the use of earlier epidemic data for training can be a useful strategy.

    The authors declare no conflict of interests.



    [1] B. Vogelstein, N. Papadopoulos, V. E. Velculescu, S. Zhou, L. A. Diaz, K. W. Kinzler, Cancer genome landscapes, Science, 339 (2013), 1546–1558. https://doi.org/10.1126/science.1235122 doi: 10.1126/science.1235122
    [2] J. Gao, B. A. Aksoy, U. Dogrusoz, G. Dresdner, B. Gross, S. O. Sumer, et al., Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal, Sci. Signal, 6 (2013), l1. https://doi.org/10.1126/scisignal.2004088 doi: 10.1126/scisignal.2004088
    [3] S. Agajanian, O. Oluyemi, G. M. Verkhivker, Integration of random forest classifiers and deep convolutional neural networks for classification and biomolecular modeling of cancer driver mutations, Front. Mol. Biosci., 6 (2019), 44. https://doi.org/10.3389/fmolb.2019.00044 doi: 10.3389/fmolb.2019.00044
    [4] M. I. Klein, V. L. Cannataro, J. P. Townsend, D. F. Stern, H. Zhao, Identifying combinations of cancer drivers in individual patients, bioRxiv, (2019), 674234. https://doi.org/10.1101/674234 doi: 10.1101/674234
    [5] F. Cheng, J. Zhao, Z. Zhao, Advances in computational approaches for prioritizing driver mutations and significantly mutated genes in cancer genomes, Briefings Bioinf., 17 (2016), 642–656. https://doi.org/10.1093/bib/bbv068 doi: 10.1093/bib/bbv068
    [6] W. F. Guo, S. W. Zhang, T. Zeng, T. Akutsu, L. Chen, Network control principles for identifying personalized driver genes in cancer, Briefings Bioinf., 21 (2020), 1641–1662. https://doi.org/10.1093/bib/bbz089 doi: 10.1093/bib/bbz089
    [7] M. Sinkala, Mutational landscape of cancer-driver genes across human cancers, Sci. Rep., 13 (2023), 12742. https://doi.org/10.1038/s41598-023-39608-2 doi: 10.1038/s41598-023-39608-2
    [8] M. S. Lawrence, P. Stojanov, C. H. Mermel, J. T. Robinson, L. A. Garraway, T. R. Golub, et al., Discovery and saturation analysis of cancer genes across 21 tumour types, Nature, 505 (2014), 495–501. https://doi.org/10.1038/nature12912 doi: 10.1038/nature12912
    [9] N. D. Dees, Q. Zhang, C. Kandoth, M. C. Wendl, W. Schierding, D. C. Koboldt, et al., MuSiC: identifying mutational significance in cancer genomes, Genome Res., 22 (2012), 1589–1598. https://doi.org/10.1101/gr.134635.111 doi: 10.1101/gr.134635.111
    [10] D. Tamborero, A. Gonzalez-Perez, N. Lopez-Bigas, OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes, Bioinformatics, 29 (2013), 2238–2244. https://doi.org/10.1093/bioinformatics/btt395 doi: 10.1093/bioinformatics/btt395
    [11] J. P. Hou, J. Ma, DawnRank: discovering personalized driver genes in cancer, Genome Med., 6 (2014), 56. https://doi.org/10.1186/s13073-014-0056-8 doi: 10.1186/s13073-014-0056-8
    [12] F. Vandin, E. Upfal, B. J. Raphael, De novo discovery of mutated driver pathways in cancer, Genome Res., 22 (2012), 375–385. https://doi.org/10.1101/gr.120477.111 doi: 10.1101/gr.120477.111
    [13] S. Zhao, J. Liu, P. Nanga, Y. Liu, A. E. Cicek, N. Knoblauch, et al., Detailed modeling of positive selection improves detection of cancer driver genes, Nat. Commun., 10 (2019), 3399. https://doi.org/10.1038/s41467-019-11284-9 doi: 10.1038/s41467-019-11284-9
    [14] A. Bashashati, G. Haffari, J. Ding, G. Ha, K. Lui, J. Rosner, et al., DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer, Genome Biol., 13 (2012), R124. https://doi.org/10.1186/gb-2012-13-12-r124 doi: 10.1186/gb-2012-13-12-r124
    [15] E. O. Paull, D. E. Carlin, M. Niepel, P. K. Sorger, D. Haussler, J. M. Stuart, Discovering causal pathways linking genomic events to transcriptional states using Tied Diffusion Through Interacting Events (TieDIE), Bioinformatics, 29 (2013), 2757–2764. https://doi.org/10.1093/bioinformatics/btt471 doi: 10.1093/bioinformatics/btt471
    [16] M. D. Leiserson, F. Vandin, H. T. Wu, J. R. Dobson, J. V. Eldridge, J. L. Thomas, et al., Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes, Nat. Genet., 47 (2015), 106–114. https://doi.org/10.1038/ng.3168 doi: 10.1038/ng.3168
    [17] A. Cho, J. E. Shim, E. Kim, F. Supek, B. Lehner, I. Lee, MUFFINN: cancer gene discovery via network analysis of somatic mutation data, Genome Biol., 17 (2016), 129. https://doi.org/10.1186/s13059-016-0989-x doi: 10.1186/s13059-016-0989-x
    [18] Y. Hou, B. Gao, G. Li, Z. Su, MaxMIF: A new method for identifying cancer driver genes through effective data integration, Adv. Sci., 5 (2018), 1800640. https://doi.org/10.1002/advs.201800640 doi: 10.1002/advs.201800640
    [19] P. Jia, Z. Zhao, VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data, PLoS Comput. Biol., 10 (2014), e1003460. https://doi.org/10.1371/journal.pcbi.1003460 doi: 10.1371/journal.pcbi.1003460
    [20] J. Song, W. Peng, F. Wang, J. Wang, Identifying driver genes involving gene dysregulated expression, tissue-specific expression and gene-gene network, BMC Med. Genomics, 12 (2019), 168. https://doi.org/10.1186/s12920-019-0619-z doi: 10.1186/s12920-019-0619-z
    [21] D. Bertrand, K. R. Chng, F. G. Sherbaf, A. Kiesel, B. K. Chia, Y. Y. Sia, et al., Patient-specific driver gene prediction and risk assessment through integrated network analysis of cancer omics profiles, Nucleic Acids Res., 43 (2015), e44. https://doi.org/10.1093/nar/gku1393 doi: 10.1093/nar/gku1393
    [22] C. A. Miller, S. H. Settle, E. P. Sulman, K. D. Aldape, A. Milosavljevic, Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors, BMC Med. Genomics, 4 (2011), 34. https://doi.org/10.1186/1755-8794-4-34 doi: 10.1186/1755-8794-4-34
    [23] M. D. Leiserson, H. T. Wu, F. Vandin, B. J. Raphael, CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer, Genome Biol., 16 (2015), 160. https://doi.org/10.1186/s13059-015-0700-7 doi: 10.1186/s13059-015-0700-7
    [24] M. D. Leiserson, D. Blokh, R. Sharan, B. J. Raphael, Simultaneous identification of multiple driver pathways in cancer, PLoS Comput. Biol., 9 (2013), e1003054. https://doi.org/10.1371/journal.pcbi.1003054 doi: 10.1371/journal.pcbi.1003054
    [25] S. Cristea, J. Kuipers, N. Beerenwinkel, pathTiMEx: Joint inference of mutually exclusive cancer pathways and their progression dynamics, J. Comput. Biol., 24 (2017), 603–615. https://doi.org/10.1089/cmb.2016.0171 doi: 10.1089/cmb.2016.0171
    [26] S. Constantinescu, E. Szczurek, P. Mohammadi, J. Rahnenfuhrer, N. Beerenwinkel, TiMEx: a waiting time model for mutually exclusive cancer alterations, Bioinformatics, 32 (2016), 968–975. https://doi.org/10.1093/bioinformatics/btv400 doi: 10.1093/bioinformatics/btv400
    [27] G. Ciriello, E. Cerami, C. Sander, N. Schultz, Mutual exclusivity analysis identifies oncogenic network modules, Genome Res., 22 (2012), 398–406. https://doi.org/10.1101/gr.125567.111 doi: 10.1101/gr.125567.111
    [28] B. H. Hristov, M. Singh, Network-based coverage of mutational profiles reveals cancer genes, Cell Syst., 5 (2017), 221–229. https://doi.org/10.1016/j.cels.2017.09.003 doi: 10.1016/j.cels.2017.09.003
    [29] J. Song, W. Peng, F. Wang, An entropy-based method for identifying mutual exclusive driver genes in cancer, IEEE/ACM Trans. Comput. Biol. Bioinf., 17 (2020), 758–768. https://doi.org/10.1109/TCBB.2019.2897931 doi: 10.1109/TCBB.2019.2897931
    [30] C. J. Tokheim, N. Papadopoulos, K. W. Kinzler, B. Vogelstein, R. Karchin, Evaluating the evaluation of cancer driver genes, Proc. Natl. Acad. Sci. U.S.A., 113 (2016), 14330–14335. https://doi.org/10.1073/pnas.1616440113 doi: 10.1073/pnas.1616440113
    [31] Y. Han, J. Yang, X. Qian, W. C. Cheng, S. H. Liu, X. Hua, et al., DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies, Nucleic Acids Res., 47 (2019), e45. https://doi.org/10.1093/nar/gkz096 doi: 10.1093/nar/gkz096
    [32] A. Colaprico, C. Olsen, M. H. Bailey, G. J. Odom, T. Terkelsen, T. C. Silva, et al., Interpreting pathways to discover cancer driver genes with moonlight, Nat. Commun., 11 (2020), 69. https://doi.org/10.1038/s41467-019-13803-0 doi: 10.1038/s41467-019-13803-0
    [33] P. Luo, Y. Ding, X. Lei, F. X. Wu, deepDriver: Predicting cancer driver genes based on somatic mutations using deep convolutional neural networks, Front. Genet., 10 (2019), 13. https://doi.org/10.3389/fgene.2019.00013 doi: 10.3389/fgene.2019.00013
    [34] P. Chandrashekar, N. Ahmadinejad, J. Wang, A. Sekulic, J. B. Egan, Y. W. Asmann, et al., Somatic selection distinguishes oncogenes and tumor suppressor genes, Bioinformatics, 36 (2020), 1712–1717. https://doi.org/10.1093/bioinformatics/btz851 doi: 10.1093/bioinformatics/btz851
    [35] J. Lyu, J. J. Li, J. Su, F. Peng, Y. E. Chen, X. Ge, et al., DORGE: Discovery of oncogenes and tumor suppressor genes using genetic and epigenetic features, Sci. Adv., 6 (2020). https://doi.org/10.1126/sciadv.aba6784 doi: 10.1126/sciadv.aba6784
    [36] M. Sudhakar, R. Rengaswamy, K. Raman, Novel ratio-metric features enable the identification of new driver genes across cancer types, Sci. Rep., 12 (2022), 5. https://doi.org/10.1038/s41598-021-04015-y doi: 10.1038/s41598-021-04015-y
    [37] J. Lever, E. Y. Zhao, J. Grewal, M. R. Jones, S. J. M. Jones, CancerMine: a literature-mined resource for drivers, oncogenes and tumor suppressors in cancer, Nat. Methods, 16 (2019), 505–507. https://doi.org/10.1038/s41592-019-0422-y doi: 10.1038/s41592-019-0422-y
    [38] O. Collier, V. Stoven, J. P. Vert, LOTUS: A single- and multitask machine learning algorithm for the prediction of cancer driver genes, PLoS Comput. Biol., 15 (2019), e1007381. https://doi.org/10.1371/journal.pcbi.1007381 doi: 10.1371/journal.pcbi.1007381
    [39] J. Reimand, G. D. Bader, Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers, Mol. Syst. Biol., 9 (2013), 637. https://doi.org/10.1038/msb.2012.68 doi: 10.1038/msb.2012.68
    [40] L. Qu, Z. Wang, H. Zhang, Z. Wang, C. Liu, W. Qian, et al., The analysis of relevant gene networks based on driver genes in breast cancer, Diagnostics, 12 (2022), 2882. https://doi.org/10.3390/diagnostics12112882 doi: 10.3390/diagnostics12112882
    [41] X. Shi, H. Teng, L. Shi, W. Bi, W. Wei, F. Mao, et al., Comprehensive evaluation of computational methods for predicting cancer driver genes, Briefings Bioinf., 23 (2022), bbab548. https://doi.org/10.1093/bib/bbab548 doi: 10.1093/bib/bbab548
    [42] A. C. Gumpinger, K. Lage, H. Horn, K. Borgwardt, Prediction of cancer driver genes through network-based moment propagation of mutation scores, Bioinformatics, 36 (2020), i508–i515. https://doi.org/10.1093/bioinformatics/btaa452 doi: 10.1093/bioinformatics/btaa452
    [43] S. Ng, E. A. Collisson, A. Sokolov, T. Goldstein, A. Gonzalez-Perez, N. Lopez-Bigas, et al., PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis, Bioinformatics, 28 (2012), i640–i646. https://doi.org/10.1093/bioinformatics/bts402 doi: 10.1093/bioinformatics/bts402
    [44] K. Shi, L. Gao, B. Wang, Discovering potential cancer driver genes by an integrated network-based approach, Mol. Biosyst., 12 (2016), 2921–2931. https://doi.org/10.1039/c6mb00274a doi: 10.1039/c6mb00274a
    [45] C. Suo, O. Hrydziuszko, D. Lee, S. Pramana, D. Saputra, H. Joshi, et al., Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival, Bioinformatics, 31 (2015), 2607–2613. https://doi.org/10.1093/bioinformatics/btv164 doi: 10.1093/bioinformatics/btv164
    [46] E. Hodzic, R. Shrestha, K. Zhu, K. Cheng, C. C. Collins, S. Cenk Sahinalp, Combinatorial detection of conserved alteration patterns for identifying cancer subnetworks, Gigascience, 8 (2019), giz024. https://doi.org/10.1093/gigascience/giz024 doi: 10.1093/gigascience/giz024
    [47] E. Lusito, B. Felice, G. D'Ario, A. Ogier, F. Montani, P. P. Di Fiore, et al., Unraveling the role of low-frequency mutated genes in breast cancer, Bioinformatics, 35 (2018), 36–46. https://doi.org/10.1093/bioinformatics/bty520 doi: 10.1093/bioinformatics/bty520
    [48] F. Li, L. Gao, X. Ma, X. Yang, Detection of driver pathways using mutated gene network in cancer, Mol. Biosyst., 12 (2016), 2135–2141. https://doi.org/10.1039/C6MB00084C doi: 10.1039/C6MB00084C
    [49] B. Gao, G. Li, J. Liu, Y. Li, X. Huang, Identification of driver modules in pan-cancer via coordinating coverage and exclusivity, Oncotarget, 8 (2017), 36115–36126. https://doi.org/10.18632/oncotarget.16433 doi: 10.18632/oncotarget.16433
    [50] D. Silverbush, S. Cristea, G. Yanovich-Arad, T. Geiger, N. Beerenwinkel, R. Sharan, Simultaneous integration of multi-omics data improves the identification of cancer driver modules, Cell Syst., 8 (2019), 456–466 e5. https://doi.org/10.1016/j.cels.2019.04.005 doi: 10.1016/j.cels.2019.04.005
    [51] A. Garavand, C. Salehnasab, A. Behmanesh, N. Aslani, A. H. Zadeh, M. Ghaderzadeh, Efficient model for coronary artery disease diagnosis: a comparative study of several machine learning algorithms, J. Healthcare Eng., 2022 (2022), 5359540. https://doi.org/10.1155/2022/5359540 doi: 10.1155/2022/5359540
    [52] S. J. Malebary, Y. D. Khan, Evaluating machine learning methodologies for identification of cancer driver genes, Sci. Rep., 11 (2021), 12281. https://doi.org/10.1038/s41598-021-91656-8 doi: 10.1038/s41598-021-91656-8
    [53] S. W. Zhang, Z. N. Wang, Y. Li, W. F. Guo, Prioritization of cancer driver gene with prize-collecting steiner tree by introducing an edge weighted strategy in the personalized gene interaction network, BMC Bioinf., 23 (2022), 341. https://doi.org/10.1186/s12859-022-04802-y doi: 10.1186/s12859-022-04802-y
    [54] P. H. Acosta, V. Panwar, V. Jarmale, A. Christie, J. Jasti, V. Margulis, et al., Intratumoral resolution of driver gene mutation heterogeneity in renal cancer using deep learning, Cancer Res., 82 (2022), 2792–2806. https://doi.org/10.1158/0008-5472.CAN-21-2318 doi: 10.1158/0008-5472.CAN-21-2318
    [55] F. Sadoughi, M. Ghaderzadeh, A hybrid particle swarm and neural network approach for detection of prostate cancer from benign hyperplasia of prostate, Stud. Health Technol. Inf., 205 (2014), 481–485.
    [56] A. J. Moshayedi, A. S. Roy, A. Kolahdooz, S. Yang, Deep learning application pros and cons over algorithm, EAI Endorsed Trans. AI Rob., 1 (2022), 1–13.
    [57] M. Gheisari, G. Wang, M. Z. A. Bhuiyan, A survey on deep learning in big data, in 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), (2017), 173–180.
    [58] U. D. Akavia, O. Litvin, J. Kim, F. Sanchez-Garcia, D. Kotliar, H. C. Causton, et al., An integrated approach to uncover drivers of cancer, Cell, 143 (2010), 1005–1017. https://doi.org/10.1016/j.cell.2010.11.013 doi: 10.1016/j.cell.2010.11.013
    [59] Y. Chen, J. Hao, W. Jiang, T. He, X. Zhang, T. Jiang, et al., Identifying potential cancer driver genes by genomic data integration, Sci. Rep., 3 (2013), 3538. https://doi.org/10.1038/srep03538 doi: 10.1038/srep03538
    [60] K. M. Jagodnik, Y. Shvili, A. Bartal, HetIG-PreDiG: A heterogeneous integrated graph model for predicting human disease genes based on gene expression, PLoS One, 18 (2023), e0280839. https://doi.org/10.1371/journal.pone.0280839 doi: 10.1371/journal.pone.0280839
    [61] Y. Chen, X. Wu, R. Jiang, Integrating human omics data to prioritize candidate genes, BMC Med. Genomics, 6 (2013), 57. https://doi.org/10.1186/1755-8794-6-57 doi: 10.1186/1755-8794-6-57
    [62] Z. Tian, M. Guo, C. Wang, L. Xing, L. Wang, Y. Zhang, Constructing an integrated gene similarity network for the identification of disease genes, J. Biomed. Semant., 8 (2017), 32. https://doi.org/10.1186/s13326-017-0141-1 doi: 10.1186/s13326-017-0141-1
    [63] L. Chin, J. N. Andersen, P. A. Futreal, Cancer genomics: from discovery science to personalized medicine, Nat. Med., 17 (2011), 297–303. https://doi.org/10.1038/nm.2323 doi: 10.1038/nm.2323
    [64] R. Edgar, M. Domrachev, A. E. Lash, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res., 30 (2002), 207–210. https://doi.org/10.1093/nar/30.1.207 doi: 10.1093/nar/30.1.207
    [65] J. Zhang, R. Bajari, D. Andric, F. Gerthoffert, A. Lepsa, H. Nahal-Bose, et al., The international cancer genome consortium data portal, Nat. Biotechnol., 37 (2019), 367–369. https://doi.org/10.1038/s41587-019-0055-9 doi: 10.1038/s41587-019-0055-9
    [66] Cancer Cell Line Encyclopedia Consortium, Genomics of Drug Sensitivity in Cancer Consortium, Pharmacogenomic agreement between two cancer cell line data sets, Nature, 528 (2015), 84–87. https://doi.org/10.1038/nature15736
    [67] J. Pinero, J. M. Ramirez-Anguita, J. Sauch-Pitarch, F. Ronzano, E. Centeno, F. Sanz, et al., The DisGeNET knowledge platform for disease genomics: 2019 update, Nucleic Acids Res., 48 (2020), D845–D855. https://doi.org/10.1093/nar/gkz1021 doi: 10.1093/nar/gkz1021
    [68] D. Repana, J. Nulsen, L. Dressler, M. Bortolomeazzi, S. K. Venkata, A. Tourna, et al., The Network of Cancer Genes (NCG), a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens, Genome Biol., 20 (2019), 1. https://doi.org/10.1186/s13059-018-1612-0 doi: 10.1186/s13059-018-1612-0
    [69] M. Sedova, M. Iyer, Z. Li, L. Jaroszewski, K. W. Post, T. Hrabe, et al., Cancer3D 2.0: interactive analysis of 3D patterns of cancer mutations in cancer subsets, Nucleic Acids Res., 47 (2019), D895–D899. https://doi.org/10.1093/nar/gky1098 doi: 10.1093/nar/gky1098
    [70] R. Mosca, J. Tenorio-Laranga, R. Olivella, V. Alcalde, A. Ceol, M. Soler-Lopez, et al., dSysMap: exploring the edgetic role of disease mutations, Nat. Methods, 12 (2015), 167–168. https://doi.org/10.1038/nmeth.3289 doi: 10.1038/nmeth.3289
    [71] E. P. Consortium, An integrated encyclopedia of DNA elements in the human genome, Nature, 489 (2012), 57–74. https://doi.org/10.1038/nature11247 doi: 10.1038/nature11247
    [72] E. C. Roadmap, A. Kundaje, W. Meuleman, J. Ernst, M. Bilenky, A. Yen, et al., Integrative analysis of 111 reference human epigenomes, Nature, 518 (2015), 317–330. https://doi.org/10.1038/nature14248 doi: 10.1038/nature14248
    [73] R. Andersson, C. Gebhard, I. Miguel-Escalada, I. Hoof, J. Bornholdt, M. Boyd, et al., An atlas of active enhancers across human cell types and tissues, Nature, 507 (2014), 455–461. https://doi.org/10.1038/nature12787 doi: 10.1038/nature12787
    [74] G. T. Consortium, The Genotype-Tissue Expression (GTEx) project, Nat. Genet., 45 (2013), 580–585. https://doi.org/10.1038/ng.2653 doi: 10.1038/ng.2653
    [75] S. A. Forbes, D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, et al., COSMIC: exploring the world's knowledge of somatic mutations in human cancer, Nucleic Acids Res., 43 (2015), D805–D811. https://doi.org/10.1093/nar/gku1075 doi: 10.1093/nar/gku1075
    [76] T. S. Keshava Prasad, R. Goel, K. Kandasamy, S. Keerthikumar, S. Kumar, S. Mathivanan, et al., Human protein reference database—2009 update, Nucleic Acids Res., 37 (2009), D767–D772. https://doi.org/10.1093/nar/gkn892 doi: 10.1093/nar/gkn892
    [77] A. Chatr-Aryamontri, B. J. Breitkreutz, S. Heinicke, L. Boucher, A. Winter, C. Stark, et al., The BioGRID interaction database: 2013 update, Nucleic Acids Res., 41 (2013), D816–D823. https://doi.org/10.1093/nar/gks1158 doi: 10.1093/nar/gks1158
    [78] D. Szklarczyk, A. L. Gable, K. C. Nastou, D. Lyon, R. Kirsch, S. Pyysalo, et al., The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets, Nucleic Acids Res., 49 (2021), D605–D612. https://doi.org/10.1093/nar/gkaa1074 doi: 10.1093/nar/gkaa1074
    [79] B. Turner, S. Razick, A. L. Turinsky, J. Vlasblom, E. K. Crowdy, E. Cho, et al., iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence, Database, 2010 (2010), baq023. https://doi.org/10.1093/database/baq023 doi: 10.1093/database/baq023
    [80] L. Licata, L. Briganti, D. Peluso, L. Perfetto, M. Iannuccelli, E. Galeota, et al., MINT, the molecular interaction database: 2012 update, Nucleic Acids Res., 40 (2012), D857–D861. https://doi.org/10.1093/nar/gkr930 doi: 10.1093/nar/gkr930
    [81] S. Kerrien, B. Aranda, L. Breuza, A. Bridge, F. Broackes-Carter, C. Chen, et al., The IntAct molecular interaction database in 2012, Nucleic Acids Res., 40 (2012), D841–D846. https://doi.org/10.1093/nar/gkr1088 doi: 10.1093/nar/gkr1088
    [82] M. J. Cowley, M. Pinese, K. S. Kassahn, N. Waddell, J. V. Pearson, S. M. Grimmond, et al., PINA v2.0: mining interactome modules, Nucleic Acids Res., 40 (2012), D862–D865. https://doi.org/10.1093/nar/gkr967 doi: 10.1093/nar/gkr967
    [83] P. V. Hornbeck, J. M. Kornhauser, S. Tkachev, B. Zhang, E. Skrzypek, B. Murray, et al., PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse, Nucleic Acids Res., 40 (2012), D261–D270. https://doi.org/10.1093/nar/gkr1122 doi: 10.1093/nar/gkr1122
    [84] F. Diella, S. Cameron, C. Gemund, R. Linding, A. Via, B. Kuster, et al., Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins, BMC Bioinf., 5 (2004), 79. https://doi.org/10.1186/1471-2105-5-79 doi: 10.1186/1471-2105-5-79
    [85] P. Minguez, I. Letunic, L. Parca, L. Garcia-Alonso, J. Dopazo, J. Huerta-Cepas, et al., PTMcode v2: a resource for functional associations of post-translational modifications within and between proteins, Nucleic Acids Res., 43 (2015), D494–D502. https://doi.org/10.1093/nar/gku1081 doi: 10.1093/nar/gku1081
    [86] R. Mosca, A. Ceol, P. Aloy, Interactome3D: adding structural details to protein networks, Nat. Methods, 10 (2013), 47–53. https://doi.org/10.1038/nmeth.2289 doi: 10.1038/nmeth.2289
    [87] R. Mosca, A. Ceol, A. Stein, R. Olivella, P. Aloy, 3did: a catalog of domain-based interactions of known three-dimensional structure, Nucleic Acids Res., 42 (2014), D374–D379. https://doi.org/10.1093/nar/gkt887 doi: 10.1093/nar/gkt887
    [88] M. J. Meyer, J. Das, X. Wang, H. Yu, INstruct: a database of high-quality 3D structurally resolved protein interactome networks, Bioinformatics, 29 (2013), 1577–1579. https://doi.org/10.1093/bioinformatics/btt181 doi: 10.1093/bioinformatics/btt181
    [89] M. Kanehisa, S. Goto, Y. Sato, M. Kawashima, M. Furumichi, M. Tanabe, Data, information, knowledge and principle: back to metabolism in KEGG, Nucleic Acids Res., 42 (2014), D199–D205. https://doi.org/10.1093/nar/gkt1076 doi: 10.1093/nar/gkt1076
    [90] T. Kelder, M. P. van Iersel, K. Hanspers, M. Kutmon, B. R. Conklin, C. T. Evelo, et al., WikiPathways: building research communities on biological pathways, Nucleic Acids Res., 40 (2012), D1301–D1307. https://doi.org/10.1093/nar/gkr1074 doi: 10.1093/nar/gkr1074
    [91] D. Croft, A. F. Mundo, R. Haw, M. Milacic, J. Weiser, G. Wu, et al., The Reactome pathway knowledgebase, Nucleic Acids Res., 42 (2014), D472–D477. https://doi.org/10.1093/nar/gkt1102 doi: 10.1093/nar/gkt1102
    [92] C. F. Schaefer, K. Anthony, S. Krupa, J. Buchoff, M. Day, T. Hannay, et al., PID: the pathway interaction database, Nucleic Acids Res., 37 (2009), D674–D679. https://doi.org/10.1093/nar/gkn653 doi: 10.1093/nar/gkn653
    [93] E. G. Cerami, B. E. Gross, E. Demir, I. Rodchenkov, O. Babur, N. Anwar, et al., Pathway Commons, a web resource for biological pathway data, Nucleic Acids Res., 39 (2011), D685–D690. https://doi.org/10.1093/nar/gkq1039 doi: 10.1093/nar/gkq1039
    [94] H. Mi, A. Muruganujan, D. Ebert, X. Huang, P. D. Thomas, PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools, Nucleic Acids Res., 47 (2019), D419–D426. https://doi.org/10.1093/nar/gky1038 doi: 10.1093/nar/gky1038
    [95] A. Franceschini, D. Szklarczyk, S. Frankild, M. Kuhn, M. Simonovic, A. Roth, et al., STRING v9.1: protein-protein interaction networks, with increased coverage and integration, Nucleic Acids Res., 41 (2013), D808–D815. https://doi.org/10.1093/nar/gks1094 doi: 10.1093/nar/gks1094
    [96] M. Imielinski, A. H. Berger, P. S. Hammerman, B. Hernandez, T. J. Pugh, E. Hodis, et al., Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing, Cell, 150 (2012), 1107–1120. https://doi.org/10.1016/j.cell.2012.08.029 doi: 10.1016/j.cell.2012.08.029
    [97] E. Hodis, I. R. Watson, G. V. Kryukov, S. T. Arold, M. Imielinski, J. P. Theurillat, et al., A landscape of driver mutations in melanoma, Cell, 150 (2012), 251–263. https://doi.org/10.1016/j.cell.2012.06.024 doi: 10.1016/j.cell.2012.06.024
    [98] G. Wu, X. Feng, L. Stein, A human functional protein interaction network and its application to cancer data analysis, Genome Biol., 11 (2010), R53. https://doi.org/10.1186/gb-2010-11-5-r53 doi: 10.1186/gb-2010-11-5-r53
    [99] The Cancer Genome Atlas Network, Comprehensive molecular portraits of human breast tumours, Nature, 490 (2012), 61–70. https://doi.org/10.1038/nature11412
    [100] L. B. Alexandrov, S. Nik-Zainal, D. C. Wedge, S. A. Aparicio, S. Behjati, A. V. Biankin, et al., Signatures of mutational processes in human cancer, Nature, 500 (2013), 415–421. https://doi.org/10.1038/nature12477 doi: 10.1038/nature12477
    [101] T. Davoli, A. W. Xu, K. E. Mengwasser, L. M. Sack, J. C. Yoon, P. J. Park, et al., Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome, Cell, 155 (2013), 948–962. https://doi.org/10.1016/j.cell.2013.10.011 doi: 10.1016/j.cell.2013.10.011
    [102] H. Rizvi, F. Sanchez-Vega, K. La, W. Chatila, P. Jonsson, D. Halpenny, et al., Molecular determinants of response to anti-programmed cell death (PD)-1 and anti-programmed death-ligand 1 (PD-L1) blockade in patients with non-small-cell lung cancer profiled with targeted next-generation sequencing, J. Clin. Oncol., 36 (2018), 633–641. https://doi.org/10.1200/jco.2017.75.3384 doi: 10.1200/jco.2017.75.3384
    [103] R. D. Kumar, A. C. Searleman, S. J. Swamidass, O. L. Griffith, R. Bose, Statistically identifying tumor suppressors and oncogenes from pan-cancer genome-sequencing data, Bioinformatics, 31 (2015), 3561–3568. https://doi.org/10.1093/bioinformatics/btv430 doi: 10.1093/bioinformatics/btv430
    [104] Y. Mao, H. Chen, H. Liang, F. Meric-Bernstam, G. B. Mills, K. Chen, CanDrA: cancer-specific driver missense mutation annotation with optimized features, PLoS One, 8 (2013), e77945. https://doi.org/10.1371/journal.pone.0077945 doi: 10.1371/journal.pone.0077945
    [105] L. G. Martelotto, C. K. Ng, M. R. De Filippo, Y. Zhang, S. Piscuoglio, R. S. Lim, et al., Benchmarking mutation effect prediction algorithms using functionally validated cancer-related missense mutations, Genome Biol., 15 (2014), 484. https://doi.org/10.1186/s13059-014-0484-1 doi: 10.1186/s13059-014-0484-1
    [106] M. H. Bailey, C. Tokheim, E. Porta-Pardo, S. Sengupta, D. Bertrand, A. Weerasinghe, et al., Comprehensive characterization of cancer driver genes and mutations, Cell, 173 (2018), 371–385.e18. https://doi.org/10.1016/j.cell.2018.02.060 doi: 10.1016/j.cell.2018.02.060
    [107] I. Martincorena, K. M. Raine, M. Gerstung, K. J. Dawson, K. Haase, P. Van Loo, et al., Universal patterns of selection in cancer and somatic tissues, Cell, 173 (2018), 1823. https://doi.org/10.1016/j.cell.2018.06.001 doi: 10.1016/j.cell.2018.06.001
    [108] R. Andrades, M. Recamonde-Mendoza, Machine learning methods for prediction of cancer driver genes: a survey paper, Briefings Bioinf., 23 (2022). https://doi.org/10.1093/bib/bbac062 doi: 10.1093/bib/bbac062
    [109] S. Parvandeh, L. A. Donehower, K. Panagiotis, T. K. Hsu, J. K. Asmussen, K. Lee, et al., EPIMUTESTR: a nearest neighbor machine learning approach to predict cancer driver genes from the evolutionary action of coding variants, Nucleic Acids Res., 50 (2022), e70. https://doi.org/10.1093/nar/gkac215 doi: 10.1093/nar/gkac215
    [110] K. Wong, T. M. Keane, J. Stalker, D. J. Adams, Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly, Genome Biol., 11 (2010), R128. https://doi.org/10.1186/gb-2010-11-12-r128 doi: 10.1186/gb-2010-11-12-r128
    [111] H. Carter, S. Chen, L. Isik, S. Tyekucheva, V. E. Velculescu, K. W. Kinzler, et al., Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations, Cancer Res., 69 (2009), 6660–6667. https://doi.org/10.1158/0008-5472.CAN-09-1133 doi: 10.1158/0008-5472.CAN-09-1133
    [112] H. A. Shihab, J. Gough, D. N. Cooper, I. N. Day, T. R. Gaunt, Predicting the functional consequences of cancer-associated amino acid substitutions, Bioinformatics, 29 (2013), 1504–1510. https://doi.org/10.1093/bioinformatics/btt182 doi: 10.1093/bioinformatics/btt182
    [113] X. Lu, X. Li, P. Liu, X. Qian, Q. Miao, S. Peng, The integrative method based on the module-network for identifying driver genes in cancer subtypes, Molecules, 23 (2018), 183. https://doi.org/10.3390/molecules23020183 doi: 10.3390/molecules23020183
    [114] F. Yuan, X. Cao, Y. H. Zhang, L. Chen, T. Huang, Z. Li, et al., Identification of novel lung cancer driver genes connecting different omics levels with a heat diffusion algorithm, Front. Cell Dev. Biol., 10 (2022), 825272. https://doi.org/10.3389/fcell.2022.825272 doi: 10.3389/fcell.2022.825272
    [115] M. Tsuchiya, M. Tomita, M. Hashimoto, Robust global regulations of gene expression in biological processes: a major driver of cell fate decision revealed, in 2012 ICME International Conference on Complex Medical Engineering (CME), (2012), 744–749. https://doi.org/10.1109/ICCME.2012.6275649
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)