Research article

Prediction of influential proteins and enzymes of certain diseases using a directed unimodular hypergraph


  • Protein-protein interaction (PPI) analysis based on mathematical modeling is an efficient means of identifying hub proteins, corresponding enzymes and many underlying structures. In this paper, a method for the analysis of PPI is introduced and used to analyze protein interactions of diseases such as Parkinson's, COVID-19 and diabetes mellitus. A directed hypergraph is used to represent PPIs. A novel directed hypergraph depth-first search algorithm is introduced to find the longest paths. The minor hypergraph reduces the dimension of the directed hypergraph that represents the longest paths, and results in a unimodular hypergraph. The unimodular property of the hypergraph is used to cluster related influential proteins and enzymes, thereby providing potential avenues for disease treatment.

    Citation: Sathyanarayanan Gopalakrishnan, Swaminathan Venkatraman. Prediction of influential proteins and enzymes of certain diseases using a directed unimodular hypergraph[J]. Mathematical Biosciences and Engineering, 2024, 21(1): 325-345. doi: 10.3934/mbe.2024015




    The protein-protein interaction (PPI) network represents the biological interactions between proteins [1]; its nodes represent proteins and its edges represent interactions between them [2]. PPI analysis is a vital tool for identifying influential proteins and the cellular and molecular functions of proteins [3,4]. Identifying essential proteins may offer new insight into drug discovery and into the control of signaling pathways, gene expression and other processes. Proteins interact in different ways, including direct (binary) interactions and indirect (n-ary) or pathway interactions [5]. PPI can be analyzed, and influential proteins predicted, from both experimental and computational perspectives. However, PPI networks are intricate, which makes their computational analysis expensive [6].

    Figure 1.  Graphical abstract.

    Traditional experiments are effective for analyzing biological networks, but they are time-consuming and costly. Researchers therefore employ computational models to analyze biological networks [7,8,9,10,11,12].

    Graphs are an effective tool for analyzing biological networks and for identifying patterns and significant components. However, graphs represent binary relations, so analyzing the multi-way interactions of biological networks with them is difficult. Hypergraphs generalize graphs to n-ary relations and can therefore represent multi-way interactions directly. Additionally, compared to graph methods, hypergraphs have a lower computational complexity. For instance, using hypergraphs, Klimm et al. [13] predicted essential genes from a multi-protein network and established that hypergraphs are more efficient than pairwise graphs for analyzing multi-protein (complex) networks. Feng et al. [14] identified critical genes for pathogenic viral responses using hypergraphs and inferred that hypergraphs can predict the essential genes of complex biological networks.

    In complex biological network analysis, exploring pathways, or indirect interactions, is challenging, yet investigating them helps determine the nature of the network. Traversal algorithms such as depth-first search (DFS) identify pathway interactions efficiently. Here, a directed hypergraph DFS is constructed to obtain the pathway interactions between proteins. A minor hypergraph, which is an induced hypergraph, then reduces the complexity of analyzing the pathway interactions by reducing the dimension.

    Some induced hypergraphs (obtained by dimensionality reduction using minor hypergraphs) are unimodular. The unimodular property of a hypergraph provides better classifications and clusters for data. For instance, Swaminathan et al. [15,16] used unimodular hypergraphs for DNA sequencing, and Madhu et al. [17] applied them to the multi-objective optimization problem for disease classification.

    In this work, PPI networks for three diseases, Parkinson's, COVID-19 and diabetes mellitus, are constructed as directed hypergraphs, and the pathway interactions between the host proteins are obtained using directed hypergraph DFS. Then, the recursive application of the minor hypergraph algorithm reduces the dimension further and results in a unimodular matrix.

    Section 2 reviews some related work, Section 3 presents the proposed method, Section 4 presents the results and discussion and the final section delivers concluding remarks.

    This section discusses previous works on three diseases: Parkinson's disease, COVID-19 and diabetes mellitus and their associated enzymes.

    Parkinson's disease is typically an age-related neurological disorder of brain function. It progressively affects the nervous system and the parts of the body controlled by the nerves. Symptoms include tremors, slowed movement, rigid muscles, impaired posture and balance, loss of automatic movements, and changes in speech and writing. It causes various brain disorders and can lead to death when the nervous system no longer works properly.

    Kim et al. [18] reviewed the literature on treating neurological disease (ND), specifically Alzheimer's and Parkinson's, and inferred that asparagine endopeptidase, neprilysin, amyloid-degrading and insulin-degrading enzymes efficiently treat ND. They notably concluded that the amyloid-degrading enzymes potentially have a vital role in treating ND. Goldstein et al. [19] investigated the role of the regulatory enzyme tyrosine hydroxylase (TH) in stimulating L-DOPA collection and hypothesized that modulating TH levels could reduce the dopamine deficit and increase striatal L-DOPA levels, which would provide a new strategy for treating Parkinson's disease. Rasch et al. [20] reviewed articles on the role of TH and inferred from clinical research that L-DOPA offered the best prospects for treating Parkinson's disease, in agreement with Goldstein et al. [19].

    Nakano et al. [21] analyzed Parkinson's disease using mouse models and performed two types of adenosine triphosphate (ATP) enzyme regulation. They suggested that maintaining the ATP level would be an efficient treatment for Parkinson's disease. Angelopoulou et al. [22] analyzed COVID-19 and Parkinson's disease and found that ACE2 was a crucial enzyme for both diseases. They also identified essential features of Parkinson's that are related to ACE2.

    COVID-19 is caused by a virus from the Severe Acute Respiratory Syndrome (SARS) family. In its early stage it causes cold, cough, fever and tiredness. Later it can become complicated, affecting weaker body parts and organs and leading to further diseases, organ failure and death. COVID-19 predominantly affects the lungs.

    Estrada et al. [23] analyzed a PPI of COVID-19 and concluded that some COVID-19 patients might be affected by Parkinson's disease. Jia et al. [24] inspected the human viral-host PPI of COVID-19 using two tensor-decomposition models (CP-N3 and ComplEx-N3) and a knowledge graph (constructed from biomedical information). They predicted the links between viruses and antiviral drug attributes. Guo et al. [25] analyzed the PPI of COVID-19 using the concepts of core decomposition and dense graphs. Starting from the PPI of an affected person, they extended it using the Biomine database [26] and derived hypotheses about human genes that could guide drug discovery for COVID-19.

    Hasan et al. [27] analyzed blood cells of people affected by COVID-19 to obtain hub proteins, and used gene ontology to identify hub proteins for developing therapeutic drugs for COVID-19. Li et al. [28] found that ACE2, a cytokine-related enzyme, could be a receptor for SARS-CoV-2, potentially the first receptor that helps the SARS-CoV-2 viral protein to enter the human body. They also constructed a PPI of COVID-19 and identified hub genes involved in viral cytokine activity; these were beneficial for drug discovery in a biological laboratory. Messina et al. [29] analyzed a PPI of viral and human proteins involved in COVID-19 and presented a pathogenic mechanism for COVID-19 host protein interactions.

    Diabetes mellitus is a heterogeneous group of metabolic disturbances whose key feature is chronic hyperglycaemia. It is caused by impaired insulin secretion, impaired insulin action, or both. Symptoms include unexpected weight loss, frequent urination, blurry vision, persistent tiredness and frequent infections. Diabetes is associated with conditions such as coronary disease, neurological disease and organ damage. Types 1 and 2 are the most common types of diabetes.

    Noroozi et al. [30] analyzed a sample of 9991 adults and inferred that the liver enzymes alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP) and γ-glutamyl-transferase (GGT) were related to higher odds of diabetes. Chen et al. [31] sampled 132,377 adults and found that the enzymes GGT, ALT, ALP and AST were associated with high-risk factors for type Ⅱ diabetes. Ross et al. [32] characterized serological biomarkers related to the exocrine pancreas, analyzed their role and found three pancreas enzymes, amylase, lipase and trypsinogen, which are potential risk factors for type Ⅰ diabetes in serological biomarkers of pancreas volume. Al-Kouh et al. [33] studied eight male Wistar rats with high glucose levels and induced diabetes to evaluate the potential of angiotensin-converting enzyme (ACE) and angiotensin-Ⅱ receptor in the renin-angiotensin system (RAS) for the prevention of heart ischaemia/reperfusion injury. They found that the RAS protected the heart from diabetes and that glucose transporter type 4 pathways protected against cytokine-related diseases.

    In this work, we analyze PPI networks of diseases using our proposed method, which consists of the following steps:

    ● Construction of a directed hypergraph for PPI.

    ● Finding the pathway interaction of proteins using directed hypergraph DFS.

    ● Construction of a path-directed hypergraph and its corresponding matrix using the pathway interactions.

    ● Dimensionality reduction of the matrix of the path-directed hypergraph using the properties of minor hypergraphs.

    ● Clustering of proteins using the unimodular matrix.

    The proposed method is summarized in Algorithm 1.

    Algorithm 1: Unimodular Clustering
    INPUT: PPI Network
    OUTPUT: Clusters
      1: Procedure: CDHG(PPINetwork) (Algorithm 2)
      2: Procedure: DHG_DFS(DHG,s,seen=ϕ,path=ϕ) (Algorithm 3)
      3: Procedure: CPM(paths) (Algorithm 5)
      4: Procedure: DR(PDHG) (Algorithm 6)
      5: Procedure: Clustering (Algorithm 7)

    If there is a PPI between the viral or host proteins $v_i$, $i \in \{1,2,\ldots,s\}$, $s \in \mathbb{N}$, and human proteins

    $$\{h_j \mid j=1,2,\ldots,r;\ r \in \mathbb{N}\},$$

    then the hyperedge of a directed hypergraph (DHG)

    $$E(DHG)=\big(T(E(DHG)),\,H(E(DHG))\big)$$

    is constructed, where

    $$T(E(DHG))=\{v_i\}$$

    and

    $$H(E(DHG))=\{h_j \mid j=1,2,\ldots,r;\ r \in \mathbb{N}\}.$$

    The PPI network of the viral (or host) proteins and the corresponding human proteins of each disease is constructed as a directed hypergraph using Algorithm 2.

    Algorithm 2: Construction of directed hypergraph (CDHG)
    INPUT: PPI Network
    OUTPUT: Directed hypergraph (DHG)
      1: for every $v_i$ (viral or host protein) in PPI do
      2:   for every $h_j \neq v_i$ (human protein) in PPI do
      3:     if $v_i$ has a PPI with $h_j$ then
      4:       Construct $E(DHG)$ of DHG with $T(E(DHG))=\{v_i\}$ and add $h_j$ to $H(E(DHG))$
      5:     else
      6:       Continue
      7:     end if
      8:   end for
      9: end for
      10: Return DHG.
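The construction in Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `build_directed_hypergraph`, the dictionary representation (tail protein mapped to its set of head proteins) and the toy interaction list are assumptions made only for this example.

```python
from collections import defaultdict


def build_directed_hypergraph(interactions):
    """Group (source protein, human protein) pairs into one hyperedge per source.

    Each hyperedge is stored as tail -> set of heads, mirroring
    T(E) = {v_i} and H(E) = {h_j} from the text.
    """
    dhg = defaultdict(set)
    for v, h in interactions:
        if v != h:  # assumption: a protein paired with itself is skipped
            dhg[v].add(h)
    return dict(dhg)


# Hypothetical toy interactions (not taken from the paper's data set).
toy_ppi = [("v1", "h1"), ("v1", "h2"), ("v2", "h2"), ("v2", "h3")]
dhg = build_directed_hypergraph(toy_ppi)
print(dhg)
```

Storing one head set per tail keeps every multi-way interaction in a single hyperedge instead of splitting it into pairwise edges.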

    Definition 3.1. [Minimal hypergraph] [34] A directed hypergraph $DH_{MG}$ (whose hyperedges are either B-arcs or F-arcs) with $|E(DH_{MG})|=n_m$ is said to be a minimal hypergraph if there is no $DH'_{MG}$ with $|E(DH'_{MG})|=n'_m<n_m$.

    Theorem 1. [34] If $DHG=(V_{DHG},E_{DHG})$ is a directed hypergraph with $|V_{DHG}|=n$, then there exists a minimal hypergraph $H_{BMG}=(V_{DHG},E_{H_{BMG}})$ such that every directed hyperedge $E_{H_{BMG}}$ of $H_{BMG}$ is a B-hyperarc, that is, $H_{BMG}$ is a B-hypergraph. Also, there exists a minimal hypergraph $H_{FMG}=(V_{DHG},E_{H_{FMG}})$ such that every hyperedge $E_{H_{FMG}}$ of $H_{FMG}$ is an F-hyperarc, that is, $H_{FMG}$ is an F-hypergraph.

    By Theorem 1, the maximum possible number of hyperedges equals the number of nodes ($n$) of the directed hypergraph. The maximum number of iterations required for every node is therefore $n$, which is treated as a parameter.

    DFS is a widely used traversal algorithm that can obtain the longest paths from the nodes of a network by searching exhaustively until the required node is reached. It uses a stack to hold the visited nodes and terminates when no unvisited nodes remain or when the stack is empty.

    In this paper, the DHG_DFS algorithm is designed to obtain the paths. Each viral or host protein is the origin of a path in which the disease's viral (or host) and human proteins represent the source and destination, respectively. The directed hypergraph obtained using Algorithm 2 is an input for the DHG_DFS Algorithm 3 to get the pathway interactions for viral or host proteins.

    Algorithm 3: Directed Hypergraph Depth First Search (DHG_DFS)
    INPUT: DHG_DFS(DHG,s,seen,nrpath)
    OUTPUT: Paths of nodes
      1: nrpath={s},seen={}
      2: DHG_DFSR(DHG,s,seen,nrpath)
      3: Return paths

    The directed hypergraph DFS procedure takes the maximum path length as its parameter; here it is set to 1,000,000. The procedure works as follows:

    a) Take a viral or host protein as a root node.

    b) Perform deep searching for the relevant human proteins using a directed hypergraph from the viral or host protein.

    c) Mark all human proteins as "seen" when they have been visited.

    d) Terminate the search process when there is no new human protein to visit.

    The pathway matrix is constructed with the following steps:

    i) Row elements are viral or host proteins;

    ii) Column elements are human proteins that are involved in pathway interactions.

    Algorithm 4: Recursive DHG_DFS
    INPUT: DHG_DFSR(DHG,s,seen,nrpath)
    OUTPUT: Paths
      1: paths=Φ
      2: Add the node s to seen
      3: for every neighbour nr of s in DHG do
      4:   if nr ∉ seen then
      5:     Add the nodes in path and nr to nrpath
      6:     Add nrpath to the paths set
      7:     DHG_DFSR(DHG,nr,seen,nrpath)
      8:   end if
      9: end for
      10: Return paths
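The traversal in Algorithms 3 and 4 can be illustrated with the following Python sketch. It assumes the hypergraph is stored as a dictionary from a tail node to its set of head nodes, treats those heads as the neighbours of the tail, and shares one `seen` set across the whole search. The names `dhg_dfs` and `max_len` and the toy hypergraph are hypothetical. The recursion depth is bounded by Python's recursion limit, so very long paths would need an explicit stack.

```python
def dhg_dfs(dhg, source, max_len=1_000_000):
    """Collect the paths found by a depth-first search that starts at `source`.

    Every node is visited at most once; a path is no longer extended once it
    already holds `max_len` nodes.
    """
    seen = set()
    paths = []

    def visit(node, path):
        seen.add(node)
        # sorted() only makes the visiting order deterministic for the example
        for nxt in sorted(dhg.get(node, ())):
            if nxt not in seen and len(path) < max_len:
                new_path = path + [nxt]
                paths.append(new_path)
                visit(nxt, new_path)

    visit(source, [source])
    return paths


# Hypothetical toy hypergraph: v1 -> {h1, h2}, h1 -> {h3}.
toy = {"v1": {"h1", "h2"}, "h1": {"h3"}, "h2": set(), "h3": set()}
paths = dhg_dfs(toy, "v1")
print(paths)
```

The longest path for a source is then the longest list in `paths`, for example `max(paths, key=len)`.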

    Assume that an interaction exists between a viral or host protein and the human proteins, based on the pathway interactions obtained using Algorithm 3. The disease-related enzymes that correspond to the human proteins are then taken. The matrix entries for viral or host proteins and human proteins are assigned as follows, and the procedure for obtaining the matrix and its corresponding hypergraph is given in Algorithm 5:

    $$PDHG[v_i h_j]=\begin{cases}1, & \text{if } h_j \text{ is involved in at least one enzyme related to diseases},\\ 0, & \text{otherwise.}\end{cases}$$

    Algorithm 5: Construction of Pathway Matrix (CPM)
    INPUT: paths
    OUTPUT: PDHG
      1: for $v_i$ in viral or host protein do
      2:   for $h_j$ in human protein do
      3:     tmp = Number of enzymes related to protein $h_j$
      4:     if tmp $\geq 1$ then
      5:       $PDHG[v_i h_j]=1$
      6:     else if tmp $=0$ then
      7:       $PDHG[v_i h_j]=0$
      8:     end if
      9:   end for
      10:   if $PDHG[v_i h_j]=1$ then
      11:     $PDHG=CDHG(v_i h_j)$
      12:   end if
      13: end for
      14: Return PDHG
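A small Python sketch of the pathway matrix may help to make the entry rule concrete. It assumes that an entry is 1 only when the human protein lies on a path from the viral or host protein and is linked to at least one disease-related enzyme; the reachability condition is one reading of the text, since Algorithm 5 itself only tests the enzyme count. The function `pathway_matrix` and all toy data are hypothetical.

```python
def pathway_matrix(sources, humans, reachable, enzymes_of):
    """Return a 0/1 matrix with one row per source and one column per human protein.

    reachable[v]  : set of human proteins on some path that starts at v
    enzymes_of[h] : set of disease-related enzymes that involve h
    """
    return [
        [
            1 if h in reachable.get(v, set()) and len(enzymes_of.get(h, ())) >= 1 else 0
            for h in humans
        ]
        for v in sources
    ]


# Hypothetical toy data (not taken from the paper's data set).
sources = ["v1", "v2"]
humans = ["h1", "h2", "h3"]
reachable = {"v1": {"h1", "h2"}, "v2": {"h3"}}
enzymes_of = {"h1": {"TH"}, "h3": {"ACE2"}}  # h2 has no related enzyme

pdhg = pathway_matrix(sources, humans, reachable, enzymes_of)
print(pdhg)
```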

    The dimensionality of the path-directed hypergraph or pathway interaction matrix is reduced using minor hypergraphs. Here are some preliminary definitions and properties of minor hypergraphs.

    The hyperedge $e \in E(DHG)$ is said to be a subhyperedge if there exists a hyperedge $e' \in E(DHG)$ such that $e \subseteq e'$. If $e \subset e'$, then $e$ is a proper subhyperedge [35].

    For a hypergraph $DHG$, the hypergraph obtained from $DHG$ by contracting a hyperedge $e=\{v_x,v_y,v_z,\ldots\} \in E(DHG)$ is the hypergraph $DHG/e$ or $DHG'$ with $V(DHG/e)=\big(V(DHG)\setminus\{v_x,v_y,v_z,\ldots\}\big)\cup\{v_{xyz}\}$ and

    $$E(DHG/e)=\{h \in E(DHG) \mid h \cap \{v_x,v_y,v_z,\ldots\}=\emptyset\} \cup \{(h \setminus e)\cup\{v_{xyz}\} \mid h \in E(DHG),\ h \cap e \neq \emptyset\}.$$

    In other words, $v_{xyz}$ is the new contracted vertex, and every hyperedge containing any of the contracted vertices is set to contain $v_{xyz}$.
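The contraction can be sketched in Python on plain vertex sets. The sketch assumes that hyperedges disjoint from the contracted hyperedge are kept unchanged and that every hyperedge meeting it has the contracted vertices replaced by the new vertex. Direction (head/tail) is ignored here, and the names `contract` and `new_vertex` are hypothetical.

```python
def contract(vertices, hyperedges, e, new_vertex):
    """Contract hyperedge `e` into `new_vertex`.

    Returns the new vertex set and the new set of hyperedges (as frozensets).
    """
    e = frozenset(e)
    new_vertices = (set(vertices) - e) | {new_vertex}
    new_edges = set()
    for h in hyperedges:
        h = frozenset(h)
        if h & e:
            # h meets e: drop the contracted vertices and add the new vertex
            new_edges.add((h - e) | {new_vertex})
        else:
            # h is disjoint from e: keep it as it is
            new_edges.add(h)
    return new_vertices, new_edges


# Hypothetical toy hypergraph: contract {a, b} into the single vertex "ab".
V = {"a", "b", "c", "d"}
E = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
V2, E2 = contract(V, E, {"a", "b"}, "ab")
print(V2, E2)
```

The contracted hyperedge itself collapses to the singleton containing the new vertex, which follows from applying the set formula to the hyperedge being contracted.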

    Let $DHG$ be a hypergraph. Then, the minor of $DHG$, denoted $DHG'$ ($DHG' \preceq DHG$), can be obtained using the following set of operations [35]:

    i) Removal of a vertex.

    ii) Contraction of the vertices that share a common hyperedge.

    iii) Addition of a hyperedge with contracted vertices.

    iv) Removal of proper subhyperedges.

    There are different types of directed hyperedges based on pathway matrix, as follows:

    i. Take a viral or host protein ($v_i$) as the 'head' of the hyperedge and the human proteins ($h_j$'s) as the 'tail', with $PDHG[v_i h_j]=1$.

    ii. Take the 'head' to be the viral or host proteins ($v_i$'s) and the 'tail' to be a human protein $h_j$, with $PDHG[v_i h_j]=1$.

    The standard elementary operations of matrices are

    ● Interchange any two rows (columns).

    ● Add or subtract any row (column) to or from another.

    ● Multiply any row (column) by the scalar $-1$.

    Consider a unimodular matrix $C=(c_{ij})$ with $m$ rows and $n$ columns, and define the following operations to obtain the dimensionality-reduced unimodular matrices:

    ● Interchange any column with zero in an ascending manner.

    ● Delete the columns with zeros from the matrix.

    ● If there is a column $c_{i1}$ whose leading element is non-zero, then make all $c_{j1}=0$ using elementary operations.

    ● Delete the column $c_{i1}$, which reduces the matrix to dimension $m\times(n-1)$.

    ● Make all the columns $c_{k1}>0$, $k=1,2,\ldots,m$, separately as reduced matrices.

    Raghavachari [36] proved Theorem 2 concerning the above operations.

    Theorem 2. A given matrix $C=(c_{ij})$ with $m$ rows and $n$ columns is totally unimodular if and only if each of the reduced matrices defined above is totally unimodular.

    Suppose that the viral or host proteins $v_i, v_j$ interact with the human proteins $\{h_a,h_b,h_c\}$ and $\{h_d,h_e,h_f\}$, respectively. Then, elementary row (column) operations between the rows (columns) of the proteins $v_i, v_j$ in the interaction matrix may introduce an erroneous interaction between proteins. The following set of operations is therefore defined to overcome this:

    ● Apply the logical operator OR between any two rows $r_i$ and $r_j$ (columns $c_i$ and $c_j$) if $r_i \lor r_j = r_i$ or $r_i \lor r_j = r_j$ (for columns, $c_i \lor c_j = c_i$ or $c_i \lor c_j = c_j$).

    ● Turn identical rows $r_i$ and $r_j$ (columns $c_i$ and $c_j$) into zero rows $r_0$ (columns $c_0$) by using the logical operator XOR between rows (columns), i.e., $r_i \oplus r_j = r_0$ (for columns, $c_i \oplus c_j = c_0$).

    ● Delete the zero rows $r_0$ (columns $c_0$).

    ● When no further rows (columns) satisfy $r_i \lor r_j = r_i$ or $r_i \lor r_j = r_j$ (for columns, $c_i \lor c_j = c_i$ or $c_i \lor c_j = c_j$), take all distinct rows $r_k$ (columns $c_k$) as a reduced matrix of dimension $s\times t$, where $s<m$ and $t<n$.
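One possible reading of these row operations is sketched below in Python: a row whose ones are all contained in another row is absorbed by it (the OR condition), duplicate rows collapse to a single copy (the XOR of identical rows is the zero row, which is deleted), and zero rows are dropped. This is an interpretation for illustration only; the function name `reduce_rows` and the toy matrix are assumptions. The same function can be applied to columns by transposing the matrix first.

```python
def reduce_rows(rows):
    """Keep only the distinct, non-zero rows that are not absorbed by another row."""
    # delete zero rows
    rows = [tuple(r) for r in rows if any(r)]
    # identical rows: r XOR r is the zero row, so only one copy survives
    rows = list(dict.fromkeys(rows))

    def absorbed(r, s):
        # r OR s == s means every 1 of r already appears in s
        return r != s and tuple(a | b for a, b in zip(r, s)) == s

    return [r for r in rows if not any(absorbed(r, s) for s in rows)]


# Hypothetical 0/1 interaction matrix.
toy = [
    [1, 1, 0],
    [1, 0, 0],  # absorbed by the first row
    [1, 1, 0],  # duplicate of the first row
    [0, 0, 1],
    [0, 0, 0],  # zero row
]
reduced = reduce_rows(toy)
print(reduced)
```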

    Table 1.  Equivalent minor and matrix operations.

    | Minor hypergraph operations | Equivalent operations on matrices |
    |---|---|
    | Removal of a vertex | Removal of zero rows (columns) |
    | Contraction of the vertices that share a common hyperedge | Logical operator OR between rows (columns) |
    | Addition of a hyperedge with contracted vertices | Logical operator XOR between rows (columns) |
    | Removal of proper subhyperedges | Logical operator OR between rows (columns) |

    Theorem 3. Let $A=(a_{ij})$ be a matrix with $m$ rows and $n$ columns such that $a_{ij}=0$ or $+1$ for $i\in\{1,2,\ldots,m\}$ and $j\in\{1,2,\ldots,n\}$. Then the dimensionality-reduced matrix $A_R=(r_{ij})$ with $s<m$ rows and $t<n$ columns, obtained using the above operations, is totally unimodular.

    Example for reduction of hyperedges

    The following example shows the reduction of directed hyperedges using minor hypergraph operations, where the distinct hyperedges are highlighted using different colors in Figures 2–5.

    Figure 2.  Given directed hypergraph.
    Figure 3.  fi and fi.
    Figure 4.  ad and ad.
    Figure 5.  fij and fij.

    The following minor hypergraph properties are developed to reduce the length of the directed hyperedges.

    The dimensionality reduction of hyperedges has two subcategories: removing hyperedges that have the same head/tail, and removing subhyperedges.

    If any two hyperedges $e_i, e_j$ have the same head/tail, then blend them into $e_{ij}$ (i.e., blend their head/tail).

    For any two hyperedges $e_i, e_j$ where $e_i$ is a subhyperedge of $e_j$, blend $e_i$ and $e_j$ in the following way:

    i) Combine the head/tail of the hyperedges.

    ii) Combine the remaining nodes of the hyperedges using the minor hypergraph operations.

    The resultant hypergraph with reduced hyperedges, written in matrix form, is unimodular and does not require any optimization.

    The procedure to obtain the hypergraph in the reduced dimension is presented in Algorithm 6.

    Algorithm 6: Dimensionality Reduction (DR)
    INPUT: PDHG
    OUTPUT: Dimensionality-reduced Pathway Matrix
      1: for $e_i \in E(PDHG)$ do
      2:   for $e_j \in E(PDHG)$ do
      3:     if $e_i \subset e_j$ then
      4:       Merge $e_i$ and $e_j$ as $e_{ij}$
      5:       Add $e_{ij}$ to $E(PDHG)$
      6:       Remove $e_i$ and $e_j$ from $E(PDHG)$
      7:     else if $e_j$ and $e_i$ have the same $H(E(PDHG))$ / $T(E(PDHG))$ then
      8:       Merge $e_i$ and $e_j$ as $e_{ij}$
      9:     end if
      10:   end for
      11: end for
      12: Form the Dimensionality-reduced pathway matrix using PDHG
      13: Return Dimensionality-reduced Pathway Matrix

    The matrix that results from the dimensionality reduction procedure is unimodular. We use the properties of this unimodular matrix to cluster disease-related proteins. Some definitions and properties of unimodular hypergraphs are given below.

    A matrix is said to be totally unimodular if the determinant of every square sub-matrix is $-1$, $0$ or $+1$.

    If the incidence matrix of a hypergraph is totally unimodular, then the hypergraph is said to be a unimodular hypergraph.
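The definition of total unimodularity can be checked directly on small matrices by enumerating every square sub-matrix. The brute-force Python sketch below is only meant to illustrate the definition; its cost grows exponentially with the matrix size, and the function names are chosen for this example.

```python
from itertools import combinations


def det_int(m):
    """Exact integer determinant by Laplace expansion (fine for tiny matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det_int(minor)
    return total


def is_totally_unimodular(a):
    """True if every square sub-matrix of `a` has determinant -1, 0 or +1."""
    n_rows, n_cols = len(a), len(a[0])
    for k in range(1, min(n_rows, n_cols) + 1):
        for ri in combinations(range(n_rows), k):
            for ci in combinations(range(n_cols), k):
                sub = [[a[i][j] for j in ci] for i in ri]
                if det_int(sub) not in (-1, 0, 1):
                    return False
    return True


print(is_totally_unimodular([[0, 1], [1, 0]]))
print(is_totally_unimodular([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))
```

The first matrix is a permutation matrix and passes the check; the second has determinant 2 and therefore fails it.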

    Algorithm 7 presents clustering with the unimodular property.

    Algorithm 7: Clustering
    INPUT: Dimensionality-reduced Pathway Matrix
    OUTPUT: Clusters
      1: Clusters = {}
      2: for i in row do
      3:   for j in col do
      4:     tmp_cluster = {}
      5:     if PDHG[i,j]==1 then
      6:       Add i,j to the tmp_cluster
      7:       Go to jth row
      8:       Find the non-zero entry in jth row and add the corresponding column index k to tmp_cluster
      9:       if k==i then
      10:         break
      11:       else
      12:         Continue
      13:       end if
      14:     end if
      15:   end for
      16:   Add tmp_cluster to Clusters
      17: end for
      18: Return Clusters
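The idea behind Algorithm 7 (start from a row, move to the row indexed by its non-zero column, and stop on returning to the start) can be sketched in Python as below. The sketch assumes a square 0/1 matrix and follows only the first non-zero entry of each row; `clusters_from_matrix` and the toy matrices are hypothetical and use 0-based indices.

```python
def clusters_from_matrix(m):
    """Group row indices into chains by following the first non-zero entry of each row."""
    clusters = []
    visited = set()
    for start in range(len(m)):
        if start in visited:
            continue
        chain = [start]
        visited.add(start)
        current = start
        while True:
            # column index of the first non-zero entry in the current row
            nxt = next((j for j, x in enumerate(m[current]) if x), None)
            if nxt is None or nxt in visited:
                break  # dead end, or the chain has come back to a visited index
            chain.append(nxt)
            visited.add(nxt)
            current = nxt
        clusters.append(chain)
    return clusters


# Toy 2x2 matrix with ones off the diagonal: indices 0 and 1 form one cluster.
print(clusters_from_matrix([[0, 1], [1, 0]]))
# Toy 3x3 matrix: indices 0 and 1 form a cluster, index 2 stays on its own.
print(clusters_from_matrix([[0, 1, 0], [1, 0, 0], [0, 0, 1]]))
```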

    The proposed methodology is applied to the PPI networks of the following diseases [37] using a Google Colab TPU processor:

    ● Parkinson's,

    ● COVID-19,

    ● Diabetes.

    The PPI interactions of diseases are implemented using the following steps:

    ● The viral or host and the corresponding human protein interactions involved in diseases are constructed as a directed hypergraph using CDHG Algorithm 2.

    ● The DHG_DFS Algorithm 3 is applied to the directed hypergraph to obtain the pathway interaction for every viral or host protein involved in the disorder.

    ● The pathway matrix of proteins is constructed using the CPM Algorithm 5.

    ● The dimensionality of the pathway matrix is reduced using the DR Algorithm 6.

    ● The resultant matrix of the DR Algorithm 6 is unimodular. Thus, the proteins and their related enzymes of diseases are clustered using the Clustering Algorithm 7.

    Figure 6.  Clustering for Parkinson's disease.

    Table 5 presents the number of nodes (N(DHG)), the number of hyperedges (M(DHG)) and edges (M(G)), and the path lengths for the hypergraph (PL(DHG)) and the graph (PL(G)) used in the implementation of the proposed method. Tables 2–4 present the number of influential proteins of the diseases obtained by the proposed method and by the graph algorithm. The number of influential proteins obtained by the proposed method equals the number of proteins in the data-set from the UniProtKB database [37].

    Table 2.  Comparison of influential proteins (count) for Parkinson's.

    | Enzyme | Total number of proteins in data-set | Number of proteins obtained using our algorithm | Number of proteins obtained using graph algorithm |
    |---|---|---|---|
    | Tyrosine Hydroxylase | 3 | 3 | 2 |
    | Amyloid Degrading | 4 | 4 | 3 |
    | Neprilysin | 1 | 1 | 0 |
    | Insulin Degrading | 4 | 4 | 3 |
    | Asparagine Endopeptidase | 26 | 26 | 10 |
    | Adenosine Triphosphate | 4 | 4 | 2 |

    Table 3.  Comparison of influential proteins (count) for COVID-19.

    | Enzyme / Protein list | Total number of proteins in data-set | Number of proteins obtained using our algorithm | Number of proteins obtained using graph algorithm |
    |---|---|---|---|
    | DNA Helicase | 116 | 116 | 58 |
    | DNA Ligase | 353 | 353 | 213 |
    | DNA Polymerase | 658 | 658 | 265 |
    | DNA Primase | 9 | 9 | 4 |
    | Topoisomerase | 27 | 27 | 15 |
    | Alanine Transaminase | 2 | 2 | 0 |
    | Alkaline | 8 | 8 | 3 |
    | At Ⅱ | 6 | 6 | 0 |
    | Serum | 87 | 87 | 46 |
    | Glucose Oxidase | 149 | 149 | 77 |
    | Matrix Metalloproteinases | 7 | 7 | 4 |
    | Proteolytic | 123 | 123 | 75 |
    | MurA | 1 | 1 | 0 |
    | MurC | 1 | 1 | 0 |
    | MurE | 2 | 2 | 1 |
    | MurF | 4 | 4 | 4 |
    | Pancreatic Lipase | 5 | 5 | 2 |
    | Pepsin | 3 | 3 | 1 |
    | Lactase | 1 | 1 | 1 |
    | Maltase | 1 | 1 | 1 |
    | Trypsin | 668 | 668 | 316 |
    | Tmprss2 | 41 | 41 | 13 |
    | Ace2 | 4 | 4 | 3 |
    | IL6 | 27 | 27 | 19 |
    | Cytoplasmic | 1058 | 1058 | 539 |
    | Cytokines | 3 | 3 | 2 |
    | Cob | 2 | 2 | 0 |

    Table 4.  Comparison of influential proteins (count) for diabetes.

    | Enzyme | Total number of proteins in data-set | Number of proteins obtained using our algorithm | Number of proteins obtained using graph algorithm |
    |---|---|---|---|
    | Alanine Transaminase | 2 | 2 | 1 |
    | Alkaline Phosphatase | 3 | 3 | 0 |
    | Amylase | 2 | 2 | 2 |
    | Angiotensin Ⅱ | 3 | 3 | 3 |
    | Angiotensin Converting | 3 | 3 | 3 |
    | Angiotensin Converting 2 | 2 | 2 | 2 |
    | Aspartate Transaminase | 1 | 1 | 1 |
    | Gamma Glutamyl Transferase | 2 | 2 | 1 |
    | Lipase | 26 | 26 | 20 |
    | Sirtuins | 2 | 2 | 2 |

    Table 5.  Comparison of path length parameter of DHG_DFS.

    | Disease | N(DHG) | M(DHG) | M(G) | PL(DHG) | PL(G) |
    |---|---|---|---|---|---|
    | Parkinson's | 2442 | 126 | 3752 | 1,000,000 | 75,000 |
    | COVID-19 | 3402 | 306 | 6978 | 1,000,000 | 84,500 |
    | Diabetes Mellitus | 5038 | 238 | 5650 | 1,000,000 | 10,000 |

    Using the proposed method, we obtained unimodular matrices for the diseases taken for this study. The unimodular matrices and their possible clusterings are as follows:

    For Parkinson's disease, the unimodular matrix is

    $$U_{Parkinson}=\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$$

    and the possible clusters are

    $$S_{PI} \rightarrow S_{PII} \rightarrow S_{PI}.$$

    For COVID-19, the unimodular matrix is

    $$U_{COVID19}=\begin{bmatrix}1 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 0 & 1 & 1 & 0\\ 1 & 1 & 0 & 0\end{bmatrix}$$

    Finally, for diabetes, the unimodular matrix is

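    $$U_{Diabetes}=\left[\begin{array}{ccccccccccccccc}
    0&0&0&0&0&0&0&0&0&0&0&1&0&0&0\\
    0&0&0&0&1&0&0&0&0&0&0&0&0&0&0\\
    0&0&0&0&0&0&0&0&0&1&0&0&0&0&0\\
    0&0&0&0&0&0&0&0&1&0&0&0&0&0&0\\
    0&0&0&0&0&0&0&0&0&0&0&0&1&0&0\\
    0&0&0&0&0&0&0&0&0&0&0&0&0&0&1\\
    0&0&1&0&0&0&0&0&0&0&0&0&0&0&0\\
    0&0&0&0&0&0&1&0&0&0&0&0&0&0&0\\
    0&0&0&0&0&0&0&1&0&0&0&0&0&0&0\\
    0&0&0&1&0&0&0&0&0&0&0&0&0&0&0\\
    0&0&0&0&0&0&0&0&0&0&1&0&0&0&0\\
    1&0&0&0&0&0&0&0&0&0&0&0&0&0&0\\
    0&1&0&0&0&0&0&0&0&0&0&0&0&0&0\\
    0&0&0&0&0&1&0&0&0&0&0&0&0&0&0\\
    0&0&0&0&0&0&0&0&0&0&0&0&0&1&0
    \end{array}\right]$$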

    and the possible clusters are

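    $$S_{DI} \rightarrow S_{DXII} \rightarrow S_{DI},$$

    $$S_{DII} \rightarrow S_{DV} \rightarrow S_{DXIII} \rightarrow S_{DII},$$

    $$S_{DVI} \rightarrow S_{DXV} \rightarrow S_{DXIV} \rightarrow S_{DVI},$$

    $$S_{DIII} \rightarrow S_{DX} \rightarrow S_{DIV} \rightarrow S_{DIX} \rightarrow S_{DVIII} \rightarrow S_{DVII} \rightarrow S_{DIII}.$$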

    In the case of Parkinson's disease, the proteins identified belong to the enzyme tyrosine hydroxylase (TH). Regulating TH may provide a new strategy for treating Parkinson's [19,20], and controlling the activity of the amyloid-degrading enzymes (ADE) (neprilysin, insulin-degrading enzymes, asparagine endopeptidase) could play a crucial role in its treatment [18]. The algorithm also yields proteins associated with adenosine triphosphate (ATP); maintaining the ATP level would give a therapeutic strategy for Parkinson's [21]. Table 2 shows that some influential proteins were obtained for the neprilysin enzyme in Parkinson's disease. Moreover, neprilysin is one of the enzymes involved in controlling the aggregation of amyloidogenic proteins to cure Parkinson's [18]. However, the graph algorithm does not yield these proteins.

    Figure 7.  Clustering for diabetes.

    For COVID-19, we obtained proteins that are components of the enzymes Tmprss2 and Ace2, liver-related enzymes (IL6, cytoplasmic enzymes, cytokines), diabetes-mellitus-related enzymes (alanine transaminase, alkaline, AT-Ⅱ, serum, glucose oxidase), digestive-system-related enzymes (pancreatic lipase, pepsin, lactase, maltase, trypsin) and blood-pressure-related enzymes (matrix metalloproteinases). The enzymes Tmprss2 and Ace2 are the hosts for COVID-19 [38]. The liver-related enzymes may lead to liver injury; the diabetes-related and digestive-system-related enzymes may increase the risk level of COVID-19; and the blood-pressure-related enzymes may aggravate COVID-19 infection [39].

    Also, the proteins involved in DNA replication (DNA helicase, DNA ligase, DNA polymerase, DNA primase, topoisomerase), stroke-related enzymes (proteolytic enzymes) and tuberculosis-related enzymes (MurA, MurC, MurE, MurF) are obtained using the proposed method. Sathyanarayanan et al. [34] analyzed the PPI of COVID-19 and identified influential proteins and the corresponding enzymes. The present algorithm identifies the same set of enzymes, along with more proteins and additional enzymes related to common diseases. Therefore, based on these enzymes, a person affected by COVID-19 may also be at risk from these other diseases. Clustering the hub proteins for COVID-19 is therefore not possible, because many hub proteins are shared between the enzymes (or clusters). From Table 3, some significant proteins of the enzymes alanine transaminase, AT-Ⅱ, MurA and MurC are obtained, but these influential proteins, and hence the enzymes, are not obtained from the graph algorithm.

    For diabetes, we obtain proteins that belong to the enzymes alanine transaminase (ALT), ALP, aspartate transaminase (AST), GGT, sirtuins and the pancreas enzymes (amylase, lipase), as well as the receptor angiotensin-Ⅱ. The enzymes ALT, ALP, AST and GGT are associated with increased odds of diabetes, and normal levels of these enzymes are associated with a good response to drugs [30,31]. The pancreas enzymes serve as biomarkers for type Ⅰ diabetes and may help prevent diabetes in its initial stage [32]. Sirtuins play a role in the pathogenesis and treatment of diabetes mellitus [40,41]. These enzymes are risk factors for type Ⅰ and type Ⅱ diabetes [33]. Therefore, these enzymes have a crucial role in treating diabetes. From Table 4, the graph algorithm yields no significant proteins in the enzyme ALT, but some influential proteins are obtained using the proposed methodology. The proteins in this enzyme are related to the increased odds of diabetes and are used in drugs for diabetes [30].

    In this work, directed hypergraphs and their unimodular properties are exploited to analyze the PPI of Parkinson's disease, COVID-19 and diabetes mellitus. The pathway network of the PPI was obtained using the novel directed hypergraph depth-first search. The pathway network consists of the longest possible pathway interactions of the PPI, so the complexity of pathway network analysis is a challenge. The minor hypergraph reduces the dimension and complexity of the pathway network. To implement the properties of the minor hypergraph for dimensionality reduction, an approach based on logical operations on matrices is introduced. These logical operations have a significant advantage over elementary operations on matrices. Finally, the properties of the unimodular hypergraph identify the influential enzymes based on the clusters of proteins related to the diseases. This opens a new prospect for treating the diseases or for identifying their patterns and characteristics. The results obtained were verified against the literature on the diseases, as presented in Section 4. Furthermore, the proposed methodology outperforms graph algorithms, which are limited in path length. Future work is to predict more interaction features of pathway network analysis using biological experiments and/or mathematical modeling, and to extend path lengths so that megabytes and gigabytes of data can be handled.
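
    To make the idea of reducing an incidence matrix with logical operations more concrete, here is a minimal sketch under stated assumptions. It assumes a 0/1 incidence matrix with vertices as rows and hyperedges as columns, and it models one possible reduction step: merging a group of vertices with an element-wise logical OR, then dropping empty and repeated hyperedge columns. The function names and the toy matrix are hypothetical, and the authors' actual minor-hypergraph operations may differ.

```python
def contract_vertices(incidence, group):
    """Merge the rows in `group` into a single row using element-wise logical OR.

    The merged row is placed first, followed by the untouched rows in their original order.
    """
    n_cols = len(incidence[0])
    merged = [int(any(incidence[i][j] for i in group)) for j in range(n_cols)]
    kept = [list(row) for i, row in enumerate(incidence) if i not in group]
    return [merged] + kept


def drop_redundant_columns(incidence):
    """Keep the first copy of each distinct column and discard all-zero columns."""
    seen, keep = set(), []
    for j in range(len(incidence[0])):
        col = tuple(row[j] for row in incidence)
        if any(col) and col not in seen:
            seen.add(col)
            keep.append(j)
    return [[row[j] for j in keep] for row in incidence]


# Toy 4-vertex, 3-hyperedge incidence matrix (hypothetical data).
incidence = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
contracted = contract_vertices(incidence, {0, 1})
reduced = drop_redundant_columns(contracted)
print(contracted)
print(reduced)
```

    In this toy case the 4×3 matrix shrinks to 3×2, because the first two hyperedges become identical once their distinguishing vertices are merged.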

    The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors would like to acknowledge SASTRA Deemed University for supporting this research work.

    The authors declare that there are no conflicts of interest.



    [1] J. De Las Rivas, C. Fontanillo, Protein–protein interactions essentials: key concepts to building and analyzing interactome networks, PLoS Comput. Biol., 6 (2010), e1000807. https://doi.org/10.1371/journal.pcbi.1000807
    [2] D. Kurzbach, Network representation of protein interactions: Theory of graph description and analysis, Protein Sci., 25 (2016), 1617–1627. https://doi.org/10.1002/pro.2963
    [3] K. Raman, Construction and analysis of protein–protein interaction networks, Autom. Exp., 2 (2010), 1–11. https://doi.org/10.1186/1759-4499-2-2
    [4] B. H. Junker, F. Schreiber, Analysis of Biological Networks, John Wiley & Sons, 2011.
    [5] D. Petrey, H. Zhao, S. J. Trudeau, D. Murray, B. Honig, PrePPI: A structure informed proteome-wide database of protein-protein interactions, J. Mol. Biol., 435 (2023), 168052. https://doi.org/10.1016/j.jmb.2023.168052
    [6] D. Vella, S. Marini, F. Vitali, D. D. Silvestre, G. Mauri, R. Bellazzi, MTGO: PPI network analysis via topological and functional module identification, Sci. Rep., 8 (2018), 5499. https://doi.org/10.1038/s41598-018-23672-0
    [7] F. Sun, J. Sun, Q. Zhao, A deep learning method for predicting metabolite–disease associations via graph neural network, Brief. Bioinf., 23 (2022), bbac266. https://doi.org/10.1093/bib/bbac266
    [8] W. Wang, L. Zhang, J. Sun, Q. Zhao, J. Shuai, Predicting the potential human lncRNA–miRNA interactions based on graph convolution network with conditional random field, Brief. Bioinf., 23 (2022), bbac463. https://doi.org/10.1093/bib/bbac463
    [9] H. Gao, J. Sun, Y. Wang, Y. Lu, L. Liu, Q. Zhao, et al., Predicting metabolite–disease associations based on auto-encoder and non-negative matrix factorization, Brief. Bioinf., 24 (2022), bbad259. https://doi.org/10.1093/bib/bbad259
    [10] Z. Chen, L. Zhang, J. Sun, R. Meng, S. Yin, Q. Zhao, DCAMCP: A deep learning model based on capsule network and attention mechanism for molecular carcinogenicity prediction, J. Cell. Mol. Med., 27 (2023), 3117–3126. https://doi.org/10.1111/jcmm.17889
    [11] R. Meng, S. Yin, J. Sun, H. Hu, Q. Zhao, scAAGA: Single cell data analysis framework using asymmetric autoencoder with gene attention, Comput. Biol. Med., 165 (2023), 107414. https://doi.org/10.1016/j.compbiomed.2023.107414
    [12] T. Wang, J. Sun, Q. Zhao, Investigating cardiotoxicity related with hERG channel blockers using molecular fingerprints and graph attention mechanism, Comput. Biol. Med., 153 (2023), 106464. https://doi.org/10.1016/j.compbiomed.2022.106464
    [13] F. Klimm, C. M. Deane, G. Reinert, Hypergraphs for predicting essential genes using multiprotein complex data, J. Complex Networks, 9 (2021). https://doi.org/10.1093/comnet/cnaa028
    [14] S. Feng, E. Heath, B. Jefferson, C. Joslyn, H. Kvinge, H. D. Mitchell, et al., Hypergraph models of biological networks to identify genes critical to pathogenic viral response, BMC Bioinf., 22 (2021), 1–21. https://doi.org/10.1186/s12859-021-04197-2
    [15] V. Swaminathan, R. Gangothri, K. Kannan, Unimodular hypergraph for DNA sequencing: A polynomial time algorithm, Proc. Natl. Acad. Sci. India-Phys. Sci., 90 (2020), 49–56. https://doi.org/10.1007/s40010-018-0561-z
    [16] V. Swaminathan, R. Gangothri, V. Abhishek, B. S. Reddy, K. Kannan, A novel hypergraph-based genetic algorithm (hgga) built on unimodular and anti-homomorphism properties for DNA sequencing by hybridization, Interdiscip. Sci. Comput. Life Sci., 11 (2019), 397–411. https://doi.org/10.1007/s12539-017-0267-y
    [17] M. R. Nallui, K. Kannan, X. Z. Gao, D. S. Roy, Multiobjective hybrid monarch butterfly optimization for imbalanced disease classification problem, Int. J. Mach. Learn. Cybern., 11 (2020), 1423–1451. https://doi.org/10.1007/s13042-019-01047-9
    [18] N. Kim, H. J. Lee, Target enzymes considered for the treatment of Alzheimer's disease and Parkinson's disease, BioMed. Res. Int., 2020 (2020). https://doi.org/10.1155/2020/2010728
    [19] M. Goldstein, A. Lieberman, The role of the regulatory enzymes of catecholamine synthesis in Parkinson's disease, Neurology, 42 (1992), 8–12. http://europepmc.org/abstract/MED/1350074
    [20] W. D. Rausch, F. Wang, K. Radad, From the tyrosine hydroxylase hypothesis of Parkinson's disease to modern strategies: a short historical overview, J. Neural Transm., 129 (2022), 487–495. https://doi.org/10.1007/s00702-022-02488-3
    [21] M. Nakano, H. Imamura, N. Sasaoka, M. Yamamoto, N. Uemura, T. Shudo, et al., ATP maintenance via two types of ATP regulators mitigates pathological phenotypes in mouse models of Parkinson's disease, EBioMedicine, 22 (2017), 225–241. https://doi.org/10.1016/j.ebiom.2017.07.024
    [22] E. Angelopoulou, E. Karlafti, V. E. Georgakopoulou, P. Papalexis, S. G. Papageorgiou, T. Tegos, et al., Exploring the role of ACE2 as a connecting link between COVID-19 and Parkinson's disease, Life, 13 (2023), 536. https://doi.org/10.3390/life13020536
    [23] E. Estrada, Cascading from SARS-CoV-2 to parkinson's disease through protein-protein interactions, Viruses, 13 (2021), 897. https://doi.org/10.3390/v13050897
    [24] T. Jia, Y. Yang, X. Lu, Q. Zhu, K. Yang, X. Zhou, Link prediction based on tensor decomposition for the knowledge graph of COVID-19 antiviral drug, Data Intell., (2022), 1–12. https://doi.org/10.1162/dint_a_00117
    [25] Y. Guo, F. Esfahani, X. Shao, V. Srinivasan, A. Thomo, L. Xing, et al., Integrative COVID-19 biological network inference with probabilistic core decomposition, Brief. Bioinf., 23 (2022), bbab455. https://doi.org/10.1093/bib/bbab455
    [26] B. S. Kamel, C. R. Voolstra, M. Medina, BioMine-DB: A database for metazoan biomineralization proteins, Biol. Mater. Sci., (2016), 1–9. https://doi.org/10.7287/peerj.preprints.1983v2
    [27] M. I. Hasan, M. H. Rahman, M. B. Islam, M. Z. Islam, M. A. Hossain, M. A. Moni, Systems Biology and Bioinformatics approach to Identify blood based signatures molecules and drug targets of patient with COVID-19, Inf. Med. Unlocked, 28 (2022), 100840. https://doi.org/10.1016/j.imu.2021.100840
    [28] G. Li, X. He, L. Zhang, Q. Ran, J. Wang, A. Xiong, et al., Assessing ACE2 expression patterns in lung tissues in the pathogenesis of COVID-19, J. Autoimmun., 112 (2020), 102463. https://doi.org/10.1016/j.jaut.2020.102463
    [29] F. Messina, E. Giombini, C. Montaldo, A. A. Sharma, A. Zoccoli, R. P. Sekaly, et al., Looking for pathways related to COVID-19: confirmation of pathogenic mechanisms by SARS-CoV-2–host interactome, Cell Death Dis., 12 (2021), 1–10. https://doi.org/10.1038/s41419-021-03881-8
    [30] M. N. Karimabad, P. Khalili, F. Ayoobi, A. Esmaeili-Nadimi, C. L. Vecchia, Z. Jamali, Serum liver enzymes and diabetes from the Rafsanjan cohort study, BMC Endocr. Disord., 22 (2022), 1–12. https://doi.org/10.1186/s12902-022-01042-2
    [31] S. C. C. Chen, S. P. Tsai, J. Y. Jhao, W. K. Jiang, C. K. Tsao, L. Y. Chang, Liver fat, hepatic enzymes, alkaline phosphatase and the risk of incident type 2 diabetes: a prospective study of 132,377 adults, Sci. Rep., 7 (2017), 4649. https://doi.org/10.1038/s41598-017-04631-7
    [32] J. J. Ross, C. H. Wasserfall, R. Bacher, D. J. Perry, K. McGrail, A. L. Posgai, et al., Exocrine pancreatic enzymes are a serological biomarker for type 1 diabetes staging and pancreas size, Diabetes, 70 (2021), 944–954. https://doi.org/10.2337/db20-0995
    [33] A. Al-Kouh, F. Babiker, M. Al-Bader, Renin-angiotensin system antagonism protects the diabetic heart from ischemia/reperfusion injury in variable hyperglycemia duration settings by a glucose transporter type 4-mediated pathway, Pharmaceuticals, 16 (2023), 238. https://doi.org/10.3390/ph16020238
    [34] G. Sathyanarayanan, S. Supriya, N. S. Ranjan, N. Janmenjoy, V. Swaminathan, Central hubs prediction for bio networks by directed hypergraph-GA with validation to COVID-19 PPI, Pattern Recognit. Lett., 153 (2022), 246–253. https://doi.org/10.1016/j.patrec.2021.12.015
    [35] I. Adler, T. Gavenčiak, T. Klimošová, Hypertree-depth and minors in hypergraphs, Theor. Comput. Sci., 463 (2012), 84–95. https://doi.org/10.1016/j.tcs.2012.09.007
    [36] M. Raghavachari, A constructive method to recognize the total unimodularity of a matrix, Zeitschrift für Oper. Res., 20 (1976), 59–61. https://doi.org/10.1007/BF01916748
    [37] The UniProt Consortium, UniProt: the Universal Protein knowledgebase in 2023, Nucleic Acids Res., 51 (2023), D523–D531. https://doi.org/10.1093/nar/gkac1052
    [38] M. H. M. E. Alves, L. C. Mahnke, T. C. Macedo, T. K. dos S. Silva, L. B. C. Junior, The enzymes in COVID-19: A review, Biochimie, 197 (2022), 38–48. https://doi.org/10.1016/j.biochi.2022.01.015
    [39] M. H. M. E. Alves, L. C. Mahnke, T. C. Macedo, T. K. dos Santos Silva, L. B. C. Junior, COVID-19 associated liver injury: An updated review on the mechanisms and management of risk groups, Liver Res., 7 (2023), 207–215. https://doi.org/10.1016/j.livres.2023.07.001
    [40] K. Turkmen, A. Karagoz, A. Kucuk, Sirtuins as novel players in the pathogenesis of diabetes mellitus, World J. Diabetes, 5 (2014), 894. https://doi.org/10.4239/wjd.v5.i6.894
    [41] J. Song, B. Yang, X. Jia, M. Li, W. Tan, S. Ma, et al., Distinctive roles of sirtuins on diabetes, protective or detrimental?, Front. Endocrinol., 9 (2018), 724. https://doi.org/10.3389/fendo.2018.00724
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)