
[1] | Huiqing Wang, Sen Zhao, Jing Zhao, Zhipeng Feng . A model for predicting drug-disease associations based on dense convolutional attention network. Mathematical Biosciences and Engineering, 2021, 18(6): 7419-7439. doi: 10.3934/mbe.2021367 |
[2] | Chenhao Wu, Lei Chen . A model with deep analysis on a large drug network for drug classification. Mathematical Biosciences and Engineering, 2023, 20(1): 383-401. doi: 10.3934/mbe.2023018 |
[3] | Saranya Muniyappan, Arockia Xavier Annie Rayan, Geetha Thekkumpurath Varrieth . DTiGNN: Learning drug-target embedding from a heterogeneous biological network based on a two-level attention-based graph neural network. Mathematical Biosciences and Engineering, 2023, 20(5): 9530-9571. doi: 10.3934/mbe.2023419 |
[4] | Wanying Xu, Xixin Yang, Yuanlin Guan, Xiaoqing Cheng, Yu Wang . Integrative approach for predicting drug-target interactions via matrix factorization and broad learning systems. Mathematical Biosciences and Engineering, 2024, 21(2): 2608-2625. doi: 10.3934/mbe.2024115 |
[5] | Xianfang Wang, Qimeng Li, Yifeng Liu, Zhiyong Du, Ruixia Jin . Drug repositioning of COVID-19 based on mixed graph network and ion channel. Mathematical Biosciences and Engineering, 2022, 19(4): 3269-3284. doi: 10.3934/mbe.2022151 |
[6] | Hong Yuan, Jing Huang, Jin Li . Protein-ligand binding affinity prediction model based on graph attention network. Mathematical Biosciences and Engineering, 2021, 18(6): 9148-9162. doi: 10.3934/mbe.2021451 |
[7] | Dong Ma, Shuang Li, Zhihua Chen . Drug-target binding affinity prediction method based on a deep graph neural network. Mathematical Biosciences and Engineering, 2023, 20(1): 269-282. doi: 10.3934/mbe.2023012 |
[8] | Lei Chen, Kaiyu Chen, Bo Zhou . Inferring drug-disease associations by a deep analysis on drug and disease networks. Mathematical Biosciences and Engineering, 2023, 20(8): 14136-14157. doi: 10.3934/mbe.2023632 |
[9] | Jiahui Wen, Haitao Gan, Zhi Yang, Ran Zhou, Jing Zhao, Zhiwei Ye . Mutual-DTI: A mutual interaction feature-based neural network for drug-target protein interaction prediction. Mathematical Biosciences and Engineering, 2023, 20(6): 10610-10625. doi: 10.3934/mbe.2023469 |
[10] | Xuelin Gu, Banghua Yang, Shouwei Gao, Lin Feng Yan, Ding Xu, Wen Wang . Application of bi-modal signal in the classification and recognition of drug addiction degree based on machine learning. Mathematical Biosciences and Engineering, 2021, 18(5): 6926-6940. doi: 10.3934/mbe.2021344 |
Disease is a great threat to human life. Researchers have attempted to design effective schemes for treating different diseases when they are discovered. For a long time, drugs have been deemed to be one of the most effective ways to treat various diseases. However, the utility of a single drug for treating complex diseases is limited as these diseases always involve multiple targets and one drug cannot target them at one time. In this case, combination drug therapy is a popular way to tackle this problem. Multiple drugs are taken at the same time, which can improve drug efficacy and reduce drug resistance [1]. However, this scheme also brings other problems. Some drugs can interact with others and this interaction can induce adverse drug events [2]. On one hand, such events are harmful to patients' health. On the other hand, they can lead to the withdrawal of drugs from the market, bringing great risks to pharmaceutical companies. Clearly, determination of drug-drug interactions (DDIs) is an important topic in drug research.
In vitro experiments and clinical trials are traditional ways to determine DDIs. However, these methods are of low efficiency and cannot give large-scale tests. Furthermore, the execution of these methods is expensive. In recent years, computational methods have become alternative ways to identify DDIs as these methods have some evident advantages, such as high efficiency and low cost. Among current computational methods for predicting DDIs, machine learning-based methods are an important portion. In early examples, investigators employed traditional machine learning algorithms to construct the prediction methods. For example, Chen et al. proposed a scheme to assess the similarity of two drugs and used the nearest neighbor algorithm to identify DDIs based on this scheme [3]. Cheng et al. employed support vector machines to identify DDIs, where each drug pair was represented by features derived from simplified molecular input line entry system (SMILES) formats of two drugs in the pair and their side effects [4]. Ran et al. adopted drug fingerprints and random forest to build a model for identifying DDIs [5]. Cheng et al. integrated the Gaussian interaction profile kernel on DDI profiles and the regularized least squares classifier to predict new DDIs [6]. Rohani and Eslahchi integrated multiple drug similarities to achieve high-level features which were fed into one neural network to train the DDI prediction model [7]. Recently, deep learning algorithms have been deemed to be more powerful than traditional machine learning algorithms and have been successfully applied to various fields [8,9,10,11]. For the prediction of DDIs, several deep learning-based methods have been proposed to date. He et al. proposed a multi-type feature fusion based on graph neural networks for the prediction of DDIs [12]. Yan et al. extracted unified drug mapping features from five drug-related heterogeneous information sources and employed deep neural networks to infer DDIs [13]. 
Lin et al. proposed a model to predict DDI events using multi-source drug fusion, multi-source feature fusion, and the transformer self-attention mechanism [14]. Feng et al. extracted drug features from a DDI network via a graph convolution network and adopted the deep neural network to make predictions [15]. Liu et al. learnt drug representations from multiple drug feature networks, accessed drug pair features through an attention neural network, and finally used a deep neural network to make predictions [16]. Most current deep learning-based methods require various drug information. Although their performance is excellent, their application is a problem as some types of drug information may not be available for some drugs. Thus, it is necessary to design a deep learning-based method for predicting DDIs which only needs commonly used drug information. This information should be analyzed by a certain deep learning algorithm so that an efficient model can be built.
In this study, a new deep learning-based model was built to predict DDIs. Different from previous deep learning-based models, this model only adopts drug fingerprints, which enhances its application values. With validated DDIs sourced from DrugBank [17,18], a drug network was built, which was fed into the GraphSAGE [19] along with the drug features derived from their fingerprints. Based on the high-level drug features yielded by GraphSAGE, the score of two drugs was calculated via inner product, measuring the likelihood of the interaction between them. The 10-fold cross-validation results indicated that the model had high performance, which was better than the previous model directly using drug fingerprint features. This performance was also competitive compared with some previous powerful models. Furthermore, we elaborated the reasonability of the model architecture, and analyzed the strengths and limitations of the model. Finally, some latent DDIs inferred by this model were analyzed to prove its prediction ability.
The DDI dataset used in this study was retrieved from Ran et al.'s study [5]. This dataset contains 37,496 DDIs, involving 722 experimentally validated drugs. These DDIs were extracted from the DDIs reported in DrugBank (https://go.drugbank.com/) [17,18], a well-known database containing information on drugs and drug targets. They were regarded as positive samples in this study. The 37,496 DDIs can be represented by a matrix with 722 rows and 722 columns, denoted as $A$, where $A_{ij}=1$ if and only if the i-th drug $d_i$ and the j-th drug $d_j$ can interact with each other.
To learn patterns that identify positive samples, negative samples were also necessary. However, no public dataset collects validated pairs of drugs that have been determined not to interact, because such pairs have little practical use. The unlabeled pairs of drugs can serve as latent negative samples. However, there were 222,785 unlabeled pairs, far more than the positive samples. In this study, we randomly selected as many unlabeled pairs as there were positive samples and labeled them as negative samples. The above positive and negative samples constituted a dataset, denoted as S.
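The balanced negative sampling described above can be sketched as follows. This is a minimal illustration under assumptions: the function name `sample_negatives`, the integer drug indices, and the fixed seed are hypothetical and are not taken from the study. The count check reuses the figures reported in the text (722 drugs, 37,496 DDIs).

```python
import random
from itertools import combinations

def sample_negatives(n_drugs, positives, seed=0):
    """Draw as many unlabeled drug pairs as there are positive pairs."""
    # Store pairs in a canonical (small, large) order so (i, j) == (j, i).
    pos = {tuple(sorted(p)) for p in positives}
    unlabeled = [p for p in combinations(range(n_drugs), 2) if p not in pos]
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    return rng.sample(unlabeled, len(pos))

# Count check with the numbers reported in the text.
n_pairs = 722 * 721 // 2        # all unordered pairs of 722 drugs
n_unlabeled = n_pairs - 37496   # pairs without a known interaction

# Toy usage: 5 drugs with 3 known interactions.
toy_pos = [(0, 1), (1, 2), (3, 4)]
toy_neg = sample_negatives(5, toy_pos)
```

The subtraction reproduces the 222,785 unlabeled pairs stated above, so the sampled negatives make up roughly one sixth of all unlabeled pairs.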
In recent years, networks have been deemed a powerful method to investigate various biological problems [20,21,22,23,24] because they can systematically organize all objects and overview each object with all other objects in the background. Here, we also adopted such a method to investigate DDIs.
As the investigated DDIs covered 722 drugs, these drugs were defined as nodes in the constructed network. The edges in this network were determined by the 37,496 positive samples: two nodes were connected by an edge if their corresponding drugs comprised a positive sample. Therefore, the constructed network contained 722 nodes and 37,496 edges. To show the topological structure of this network, we counted the degree of each node, i.e., the number of DDIs containing the corresponding drug, as shown in Figure 1. Some nodes have high degrees, suggesting that the corresponding drugs can interact with many other drugs, but most nodes have degrees lower than 200. The median degree was 82 and the average degree was 103.87. For convenience, this network was denoted as N. The proposed model can infer new connections (latent DDIs) based on the currently known DDIs.
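The degree statistics above can be reproduced with a few lines of code. The sketch below uses a hypothetical toy edge list and a hypothetical helper `node_degrees`; only the last line uses numbers from the text, relying on the fact that each edge contributes to the degree of two nodes.

```python
from collections import Counter
from statistics import mean, median

def node_degrees(n_nodes, edges):
    """Return the degree of every node in an undirected network."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return [deg[v] for v in range(n_nodes)]

# Toy network with 4 drugs and 4 interactions.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
toy_deg = node_degrees(4, toy_edges)
toy_mean, toy_median = mean(toy_deg), median(toy_deg)

# Average degree implied by the reported network size: 2 * edges / nodes.
avg_reported = 2 * 37496 / 722
```

Rounded to two decimals, `avg_reported` agrees with the average degree of 103.87 given in the text.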
Fingerprints are widely used in the investigation of drug-related problems. Unlike human fingerprints, whose analysis is usually modeled as an image processing problem [25,26], a fingerprint for drugs or chemicals encodes substructures and was originally designed to assist in chemical database substructure searching [27]. Later, it was found to be useful for chemical clustering and classification. Based on the fingerprints of a given drug, the drug can be represented by a numeric vector. Although this representation is simple, models based on it often provide good performance. Here, we also adopted this form to build the original representation of drugs. In detail, the SMILES formats of the 722 drugs were first sourced from DrugBank. Then, RDKit (http://www.rdkit.org) was adopted to extract the ECFP_4 fingerprint [28] from these formats. This type of fingerprint involves several special substructures, which are generated by the following processes:
First, the Daylight atomic invariants rule is used to assign an integer identifier to each atom in a molecule. This rule considers six properties of an atom, including the number of immediate non-hydrogen neighbors, the valence minus the number of hydrogens, the atomic number, the atomic mass, the atomic charge, and the number of attached hydrogens (both implicit and explicit). In ECFP features, a seventh attribute is commonly used to indicate whether an atom is contained in at least one ring. The given features are then converted into integer values using a hash function, which acts as the initial identifiers of the atoms.
Then, the identifier is updated iteratively to include information about the atom's neighborhood, including the type of bond (single, double, triple, or aromatic) that connects the atom. Each atom collects its own identifier and the identifiers of neighboring atoms into an array. A hash function is then applied to reduce this array to a new single integer identifier. After generating new identifiers for all atoms, they replace the old ones and are added to the fingerprint set. This process is repeated a predefined number of times.
Once completed, any duplicate identifiers in the fingerprint set are removed, leaving only the remaining integer identifiers to define the ECFP.
Finally, a binary vector is constructed for each drug based on its ECFP_4 fingerprints. Each component in this vector indicates the presence and absence of a certain substructure. It is set to 1 if the drug has the corresponding substructure; otherwise, it is set to 0. Here, the ECFP_4 fingerprint contains 1024 substructures. Thus, each drug d is represented by a 1024-dimensional binary vector, formulated by
$$v_d=[f_1,f_2,\cdots,f_{1024}]^{T}. \tag{1}$$
Clearly, this vector contains which substructures the drug d has, reflecting its entire structure. The feature vectors of 722 drugs are aggregated in a matrix, denoted by FMF, where the i-th row is the feature vector of the i-th drug.
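The iterative update and the final folding into a 1024-bit vector can be illustrated with a deliberately simplified sketch. It is not RDKit's ECFP implementation and will not reproduce RDKit's bits: the hash function, the two-property atom invariants, the integer bond codes, and all names are assumptions made for illustration, and the sketch omits ECFP's removal of structurally duplicate environments. The default `radius=2` mirrors the diameter of 4 in ECFP_4.

```python
import hashlib

def stable_hash(values):
    """Deterministic integer hash of a sequence of integers (illustrative choice)."""
    digest = hashlib.sha256(repr(tuple(values)).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def toy_circular_fingerprint(atom_invariants, bonds, radius=2, n_bits=1024):
    """atom_invariants: one tuple of ints per atom; bonds: (atom_a, atom_b, bond_code)."""
    n_atoms = len(atom_invariants)
    neighbors = {a: [] for a in range(n_atoms)}
    for a, b, order in bonds:
        neighbors[a].append((order, b))
        neighbors[b].append((order, a))
    # Initial identifiers: hash of each atom's invariants.
    ids = [stable_hash(inv) for inv in atom_invariants]
    features = set(ids)
    for layer in range(1, radius + 1):
        new_ids = []
        for a in range(n_atoms):
            # Collect the atom's own identifier plus sorted (bond, neighbor id) pairs.
            env = [layer, ids[a]]
            for order, nbr_id in sorted((o, ids[n]) for o, n in neighbors[a]):
                env.extend([order, nbr_id])
            new_ids.append(stable_hash(env))
        ids = new_ids            # new identifiers replace the old ones
        features.update(ids)     # and are added to the fingerprint set
    # Fold the identifier set into a fixed-length binary vector.
    bits = [0] * n_bits
    for f in features:
        bits[f % n_bits] = 1
    return bits

# Toy molecule: heavy atoms of ethanol (C, C, O) with (atomic number, heavy neighbors).
ethanol = toy_circular_fingerprint([(6, 1), (6, 2), (8, 1)], [(0, 1, 1), (1, 2, 1)])
# Same molecule with atoms listed in reverse order.
ethanol_reordered = toy_circular_fingerprint([(8, 1), (6, 2), (6, 1)], [(0, 1, 1), (1, 2, 1)])
```

Because neighbor contributions are sorted before hashing, the resulting vector does not depend on the order in which atoms are listed, which is the property that makes such identifiers usable as substructure features.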
In Sections 2.2 and 2.3, two types of drug properties were introduced. The drug fingerprint can be deemed as providing linear information on drugs, whereas the drug network provides non-linear information. Proper fusion of these two types of drug properties was helpful for obtaining high-level drug features. Recently, node embedding learning has been a hot topic in machine learning. The main purpose of this method is to assign an informative feature vector to each node in a given network. The graph convolutional network (GCN) [29] is one of the powerful algorithms in this field and has wide applications in bioinformatics [30,31,32,33]. Here, one of its variants, named GraphSAGE [19], was employed. This algorithm was applied to the drug network N, and the feature vectors of drugs mentioned in Section 2.3 were set as its input. Based on N and the original feature vectors of the drugs, we attempted to obtain more informative feature vectors of drugs.
The main idea of GraphSAGE is to efficiently aggregate feature information from a node's local neighborhood, thereby generating a more informative feature representation of the node. Different from GCN, which always considers all direct neighbors, GraphSAGE samples a fixed-size set of neighbors, called the local neighborhood. This operation decreases the computational complexity of the algorithm. For node $v$, its local neighborhood is denoted as $N(v)$. Generally, the aggregation procedure can be run for $K$ rounds, which determines the search depth. In the k-th round, an aggregation function is employed, denoted by $\mathrm{AGGREGATE}_k$. The feature vectors of the neighbors in the local neighborhood of $v$ are aggregated using this aggregation function, formulated as
$$h^{k}_{N(v)}=\mathrm{AGGREGATE}_k\left(\{h^{k-1}_{u}: u\in N(v)\}\right), \tag{2}$$
where $h^{k-1}_{u}$ is the feature vector of $u$ after the (k-1)-th round. Specifically, $h^{0}_{u}$ is the input feature vector of $u$. This aggregated feature vector is concatenated to the feature vector of $v$ after the (k-1)-th round, and the result is further refined by a fully connected layer with a nonlinear activation function $\sigma$, formulated as
$$h^{k}_{v}=\sigma\left(W^{k}\cdot \mathrm{CONCAT}\left(h^{k-1}_{v},h^{k}_{N(v)}\right)\right), \tag{3}$$
where $W^{k}$ ($k\in\{1,2,\cdots,K\}$) is a trainable weight matrix. At the end of the k-th round, $h^{k}_{v}$ is normalized to $h^{k}_{v}/\|h^{k}_{v}\|_2$.
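One aggregation round, as defined by Eqs (2) and (3), might look like the sketch below. Several choices are assumptions rather than details from the study: the mean is used as the aggregation function, ReLU stands in for the activation σ, nodes with fewer neighbors than the sample size simply use all of them, and the names `sage_round` and `mean_vec` are hypothetical.

```python
import math
import random

def mean_vec(vectors):
    """Element-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sage_round(h, neighbors, W, sample_size, rng):
    """One round: sample neighbors, aggregate, concatenate, transform, normalize."""
    out = {}
    for v, h_v in h.items():
        nbrs = neighbors[v]
        # Fixed-size local neighborhood N(v); take all neighbors if there are few.
        sampled = nbrs if len(nbrs) <= sample_size else rng.sample(nbrs, sample_size)
        # Eq (2): aggregate the neighbors' previous-round features.
        h_nv = mean_vec([h[u] for u in sampled]) if sampled else [0.0] * len(h_v)
        # Eq (3): CONCAT, multiply by W, then apply the activation (ReLU assumed).
        z = h_v + h_nv
        a = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in W]
        # Normalize to unit L2 norm (skipped for an all-zero vector).
        norm = math.sqrt(sum(x * x for x in a))
        out[v] = [x / norm for x in a] if norm > 0 else a
    return out

# Toy usage: 3 nodes, 2-dimensional inputs, W of size 2 x 4.
h0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
nbrs = {0: [1, 2], 1: [0], 2: [0]}
W1 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
h1 = sage_round(h0, nbrs, W1, sample_size=2, rng=random.Random(0))
```

Running the function K times, each time feeding in the previous output, gives every node access to information from nodes up to K hops away.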
In this study, a model based on GraphSAGE was proposed to identify DDIs. The entire procedure is illustrated in Figure 2.
First, the validated DDIs retrieved from DrugBank were used to construct the drug network N as described in Section 2.2. At the same time, the ECFP_4 fingerprints of 722 drugs were extracted, which were used to generate the feature matrix FMF. Then, the above network and feature matrix were fed into GraphSAGE to produce the new feature matrix FMN of 722 drugs. In this procedure, the selected unlabeled pairs of drugs (i.e., negative samples) were employed to train the parameters in GraphSAGE, thereby accessing more powerful drug features. The i-th row of FMN was the new feature vector of the drug di. Based on the new feature representations of drugs, the score between drugs di and dj was calculated by the inner product of their feature vectors, formulated as
$$S(d_i,d_j)=\mathrm{FMN}(d_i)\cdot \mathrm{FMN}(d_j)^{T}. \tag{4}$$
If this score was larger than a predefined threshold, $d_i$ and $d_j$ were predicted to interact with each other, i.e., they comprised a DDI (positive sample); otherwise, they were predicted to constitute a negative sample.
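The scoring rule of Eq (4) reduces to a dot product followed by a comparison. The sketch below uses hypothetical 3-dimensional embeddings and hypothetical function names; the default threshold of 0 is an illustrative choice here.

```python
def interaction_score(u, v):
    """Inner product of two drug feature vectors, as in Eq (4)."""
    return sum(a * b for a, b in zip(u, v))

def predict_ddi(u, v, threshold=0.0):
    """Predict an interaction when the score exceeds the threshold."""
    return interaction_score(u, v) > threshold

# Toy embeddings: drug_a and drug_b point in similar directions, drug_c does not.
drug_a = [0.6, 0.8, 0.0]
drug_b = [0.5, 0.5, 0.1]
drug_c = [-0.7, -0.1, 0.2]

score_ab = interaction_score(drug_a, drug_b)
score_ac = interaction_score(drug_a, drug_c)
```

Because the inner product is symmetric, the score does not depend on the order of the two drugs, which suits an undirected interaction network.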
Loss function and optimization. There are some parameters that can be trained in GraphSAGE (e.g., the weight matrices in the aggregation procedure). These were optimized by standard stochastic gradient descent and backpropagation techniques. As a binary classification problem, we selected the binary cross entropy as the loss function, which is defined as
$$L=-\sum\left(y\log p(x)+(1-y)\log\left(1-p(x)\right)\right), \tag{5}$$
where p(x) is the prediction of x yielded by the model and y is the true label. The optimization procedure was implemented using the Adam optimizer [34].
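A small numerical sketch of the loss in Eq (5) follows. The text does not state how a score is turned into the prediction p(x); the sigmoid used here is an assumption, as are the function names and the clipping constant `eps` that guards against log(0).

```python
import math

def sigmoid(s):
    """Numerically stable logistic function (assumed mapping from score to p(x))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def bce_loss(scores, labels, eps=1e-12):
    """Binary cross entropy summed over samples, following Eq (5)."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = min(max(sigmoid(s), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# A confident, correct prediction costs little; a confident, wrong one costs a lot.
loss_good = bce_loss([10.0, -10.0], [1, 0])
loss_bad = bce_loss([10.0, -10.0], [0, 1])
loss_neutral = bce_loss([0.0], [1])
```

A score of zero corresponds to p(x) = 0.5 under this assumed mapping, so `loss_neutral` equals log 2.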
The proposed model mainly used GraphSAGE to obtain high-level drug features, so its time complexity was the same as that of GraphSAGE. According to the original reference of GraphSAGE [19], its time complexity is $O\left(\prod_{i=1}^{K} S_i\right)$, where $K$ is the number of rounds in the aggregation procedure and $S_i$ is the fixed number of sampled neighbors in the i-th round.
Cross-validation methods have wide applications in evaluating the performance of classification models [35]. In this method, samples are equally and randomly divided into several parts. Each part is picked up to comprise the test dataset one by one, whereas the rest of the parts constitute the training dataset. The model based on the training dataset is applied to the test dataset to assess its performance. Generally, most studies opt to divide samples into five or ten parts, i.e., 5 or 10-fold cross-validation. Here, we selected 10-fold cross-validation to test the performance of constructed models. In each round of the cross-validation, some edges, corresponding to the positive samples in the test dataset, in the drug network N were discarded. This can strictly isolate the information of test samples from training the model.
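The fold construction, including the removal of test-set positive edges from the network used for training, can be sketched as follows. The function name, the interleaved fold assignment, and the seed are assumptions; the study does not describe its implementation at this level of detail.

```python
import random

def cross_validation_splits(positives, negatives, n_folds=10, seed=0):
    """Yield (train_samples, test_samples, train_edges) for each fold."""
    rng = random.Random(seed)
    samples = [(p, 1) for p in positives] + [(p, 0) for p in negatives]
    rng.shuffle(samples)
    folds = [samples[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [s for i, fold in enumerate(folds) if i != k for s in fold]
        # Edges of test-set positives are discarded from the network, so the
        # model never sees them while learning node features.
        held_out = {frozenset(pair) for pair, label in test if label == 1}
        train_edges = [p for p in positives if frozenset(p) not in held_out]
        yield train, test, train_edges

# Toy usage: 20 positive and 20 negative pairs.
toy_pos = [(i, i + 1) for i in range(20)]
toy_neg = [(i, i + 2) for i in range(20)]
toy_splits = list(cross_validation_splits(toy_pos, toy_neg))
```

Each sample lands in exactly one test fold, and the network passed to training in a given fold contains only the positive pairs outside that fold.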
To determine the prediction quality of cross-validation results, several measurements have been proposed for binary classification problems. This study adopted the following measurements: precision, recall, overall accuracy (ACC), F1-measure [22,36,37,38,39,40,41,42], and Matthews correlation coefficient (MCC) [43], which can be computed by
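$$\mathrm{Precision}=\frac{TP}{TP+FP}, \tag{6}$$
$$\mathrm{Recall}=\frac{TP}{TP+FN}, \tag{7}$$
$$\mathrm{F1\text{-}measure}=\frac{2\times \mathrm{Recall}\times \mathrm{Precision}}{\mathrm{Recall}+\mathrm{Precision}}, \tag{8}$$
$$\mathrm{ACC}=\frac{TP+TN}{TP+FP+TN+FN}, \tag{9}$$
$$\mathrm{MCC}=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)\times(TP+FN)\times(TN+FP)\times(TN+FN)}}, \tag{10}$$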
where TP/TN stands for true positives/negatives, and FP/FN represents false positives/negatives.
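The five measurements in Eqs (6)-(10) can be computed directly from the four confusion-matrix counts. The sketch below uses a hypothetical function name and made-up counts; it does not guard against zero denominators, which occur when a class is never predicted or never present.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1-measure, ACC, and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    acc = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"precision": precision, "recall": recall, "f1": f1, "acc": acc, "mcc": mcc}

# Toy counts: 100 positives and 100 negatives, with few false positives.
m = classification_metrics(tp=80, fp=5, tn=95, fn=20)
```

In this toy case precision is high while recall is lower, a pattern in which few negatives are mislabeled as positives but a sizeable share of positives is missed.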
In addition, we employed the receiver operating characteristic (ROC) and precision-recall (PR) curves to fully display the performance of classification models under different thresholds. The ROC curve plots the true positive rate (same as recall) on the Y-axis against the false positive rate, i.e., $FP/(TN+FP)$, on the X-axis. The PR curve plots precision on the Y-axis against recall on the X-axis. The areas under these two curves, denoted by AUROC and AUPR, are important measurements for assessing the quality of predicted results. They were given along with the curves in this study.
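AUROC has a convenient threshold-free interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by this pairwise definition rather than by integrating the curve; the function name and toy scores are hypothetical, and the quadratic loop is meant for illustration, not for large datasets.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect ranking: every positive scores above every negative.
auc_perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
# One positive (0.2) is ranked below one negative (0.8).
auc_partial = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

In the second case three of the four positive-negative pairs are ordered correctly, giving 0.75.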
In this study, we designed a new model for predicting DDIs. The entire procedure is illustrated in Figure 2. This section reports the performance of the model and elaborates on its reasonability, superiority, and limitations. Furthermore, the latent DDIs identified by the model are analyzed.
In the GraphSAGE-based model, some parameters should be determined. All parameters were related to GraphSAGE. First, the aggregation procedures in GraphSAGE were executed in two rounds, i.e., K = 2. In this case, the features of 2-neighbors can be aggregated to the target node. Second, the mean aggregator was selected to aggregate neighbors' features, i.e., Eqs (2) and (3) were combined and detailed as
$$h^{k}_{v}=\sigma\left(W^{k}\cdot \mathrm{MEAN}\left(\{h^{k-1}_{v}\}\cup\{h^{k-1}_{u}: u\in N(v)\}\right)\right). \tag{11}$$
Finally, the dimensions of output features in the two aggregation rounds were set to 128 and 16, i.e., the sizes of $W^{1}$ and $W^{2}$ were $128\times 1024$ and $16\times 128$, respectively.
With the parameter setting mentioned in Section 3.1, the GraphSAGE-based model was evaluated by 10-fold cross-validation. The ROC and PR curves are illustrated in Figures 3 and 4, respectively. It can be found that each ROC or PR curve under one fold was nearly perfect. According to the AUROC and AUPR values in these two figures, AUROC values on ten folds changed between 0.9400 and 0.9950, and the AUPR values varied in the same interval. These results suggested the high performance of the model.
In the GraphSAGE-based model, we used the inner product to measure the linkage of two drugs. A threshold of 0 is a natural choice for determining DDIs, i.e., drug pairs with scores higher than zero were predicted as positive samples; otherwise, they were identified as negative samples. Thus, we computed the measurements under this threshold, which are listed in Table 1. The average precision and recall were 0.967 and 0.798, respectively. The precision was quite high, whereas the recall was less satisfactory: about one in five positive samples was predicted as a negative sample. The overall measurements, ACC, F1-measure, and MCC, were 0.861, 0.874, and 0.739 on average, respectively. Such performance is acceptable. Thus, this threshold is suggested for determining DDIs.
Fold index | Precision | Recall | ACC | F1-measure | MCC |
1 | 0.980 | 0.826 | 0.887 | 0.897 | 0.788 |
2 | 0.969 | 0.776 | 0.845 | 0.862 | 0.712 |
3 | 0.977 | 0.761 | 0.835 | 0.856 | 0.699 |
4 | 0.976 | 0.789 | 0.857 | 0.872 | 0.735 |
5 | 0.975 | 0.787 | 0.856 | 0.871 | 0.733 |
6 | 0.985 | 0.819 | 0.884 | 0.895 | 0.784 |
7 | 0.991 | 0.834 | 0.897 | 0.905 | 0.808 |
8 | 0.933 | 0.811 | 0.857 | 0.867 | 0.723 |
9 | 0.942 | 0.794 | 0.849 | 0.861 | 0.709 |
10 | 0.944 | 0.783 | 0.841 | 0.856 | 0.697 |
Mean | 0.967 | 0.798 | 0.861 | 0.874 | 0.739 |
The proposed GraphSAGE-based model has several parts. This section conducts some ablation tests to show the contribution of each part.
GraphSAGE is the most important part of the model, and is in charge of generating high-level drug features. Here, we removed this part to reconstruct the model. In this model, the drug fingerprint features (cf. Eq (1)) were directly fed into the scoring scheme (cf. Eq (4)) to measure the interaction likelihood of two drugs. This model was also assessed using 10-fold cross-validation, generating an AUROC of 0.5621 and AUPR of 0.5469, as listed in Table 2. Evidently, such performance was much lower than the GraphSAGE-based model, proving the importance of GraphSAGE in improving the quality of drug features.
Object | Operation | AUROC | AUPR |
GraphSAGE | Removal | 0.5621 | 0.5469 |
Scoring scheme | Replacement | 0.7174 | 0.6388 |
The proposed model adopted the inner product to assess the interaction likelihood of two drugs. To prove this selection was reasonable, we replaced it with the sum operation, i.e., Eq (4) was replaced with the following equation
$$S(d_i,d_j)=\mathrm{FMN}(d_i)+\mathrm{FMN}(d_j). \tag{12}$$
A model with this score scheme was built and also evaluated by 10-fold cross-validation, yielding the AUROC and AUPR, listed in Table 2. It can be found that the AUROC was 0.7174, whereas the AUPR was 0.6388. They were also much lower than those of the GraphSAGE-based model, which were all higher than 0.97. This result indicated that the inner product was more effective than the sum operation to capture the interaction strength of two drugs.
The above results show that the GraphSAGE-based model performed well in identifying DDIs. However, its performance may differ across drugs. As shown in Figure 1, the degrees of nodes in the drug network cover a large range: some drugs interact with many drugs, whereas others interact with only a few. To investigate how the model performs on these two kinds of drugs, the 722 drugs were divided into two equal-sized groups according to their degrees in the network. The first group (the high group) contained the drugs with high degrees, and the second group (the low group) contained the remaining drugs with low degrees. Accordingly, three DDI groups were constructed: the high-high group contained DDIs between two drugs in the high group, the low-low group contained DDIs between two drugs in the low group, and the high-low group contained DDIs between one drug in the high group and one drug in the low group. The cross-validation performance of the GraphSAGE-based model was then computed separately on the three DDI groups, yielding the AUROC and AUPR values listed in Table 3. For the high-high group, the AUROC and AUPR were 0.9894 and 0.9967, respectively. They decreased to 0.9461 and 0.8176 for the high-low group and dropped further to 0.6020 and 0.2439 for the low-low group. Thus, the GraphSAGE-based model had strong prediction ability for drugs known to interact with many other drugs, whereas its prediction ability for drugs with few known interactions was relatively weak. 
This result is reasonable. Drugs with high degrees in the drug network receive abundant information from their neighbors via GraphSAGE, which substantially improves their representations, whereas the improvement for drugs with few known interactions is limited.
Group | AUROC | AUPR |
High-high group | 0.9894 | 0.9967 |
High-low group | 0.9461 | 0.8176 |
Low-low group | 0.6020 | 0.2439 |
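The degree-based grouping behind Table 3 can be sketched as below. The helper names and the tie-breaking rule (ties keep their original index order) are assumptions; the study only states that the drugs were split into two equal groups by degree.

```python
def split_by_degree(degrees):
    """Split node indices into equally sized high-degree and low-degree sets."""
    order = sorted(range(len(degrees)), key=lambda v: degrees[v], reverse=True)
    half = len(order) // 2
    return set(order[:half]), set(order[half:])

def pair_group(pair, high):
    """Label a drug pair by how many of its two drugs are in the high group."""
    n_high = sum(d in high for d in pair)
    return ("low-low", "high-low", "high-high")[n_high]

# Toy usage: four drugs with degrees 5, 1, 3, and 2.
high, low = split_by_degree([5, 1, 3, 2])
labels = {pair: pair_group(pair, high) for pair in [(0, 2), (0, 1), (1, 3)]}
```

Evaluating the test-set scores separately for each label then gives per-group AUROC and AUPR values of the kind reported in the table.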
To show the superiority of the GraphSAGE-based model, we compared it with one previous model [5]. This model was built on the same dataset and was also evaluated by 10-fold cross-validation. It directly adopted the drug fingerprint features to generate the representation of DDIs and selected random forest as the prediction engine. The performance of this model and our model is listed in Table 4. Our GraphSAGE-based model provided higher AUROC (0.9704 vs. 0.9629) and AUPR (0.9727 vs. 0.9601) than the previous model, an improvement of about 0.01. As mentioned above, the previous model directly used drug fingerprint features to constitute the features of DDIs, and such features were not very informative. The GraphSAGE-based model fused the drug fingerprint features and drug network information, thereby capturing more informative properties of drugs. This was the main reason why the GraphSAGE-based model was superior to this previous model.
Model | AUROC | AUPR |
GraphSAGE-based model | 0.9704 | 0.9727 |
Ran et al.'s model [5] | 0.9629 | 0.9601 |
NDD [7] | 0.9940 | 0.9470 |
DDIMDL [2] | 0.9979 | 0.9208 |
DPDDI [15] | 0.9560 | 0.9070 |
DANN-DDI [16] | 0.9763 | 0.9709 |
As mentioned in Section 1, several DDI prediction models have been proposed. We selected four models (DDIMDL [2], NDD [7], DPDDI [15], and DANN-DDI [16]) for further comparison with our model, where DDIMDL, DPDDI, and DANN-DDI were constructed using deep learning algorithms. The performance of these models is listed in Table 4. The proposed model provided the highest AUPR, whereas its AUROC was lower than that of three of them (NDD, DDIMDL, and DANN-DDI) and higher than that of DPDDI. This result indicates that the proposed model provided competitive performance compared with these previous models. Although the proposed model only adopted drug fingerprints, GraphSAGE effectively fused the drug fingerprints and the currently known DDI information, represented by the drug network. None of the previous models fused these two types of information in this way. They employed more drug properties, such as targets and pathways, which enhanced their performance. However, this also limited their applicability because these properties were not available for some drugs. The proposed model therefore had its own merits: its high performance supported the reliability of its predictions, and the small number of required properties made it more widely applicable.
This section presents case studies to further demonstrate the generalization ability of GraphSAGE-based model for predicting unknown DDIs in reliable databases and literature. Table 5 shows the 20 drug pairs with the highest predicted scores (Eq (4)).
Index | Drug 1 DrugBank ID | Drug 1 name | Drug 2 DrugBank ID | Drug 2 name | Score |
1 | DB14043 | Palmidrol | DB14737 | CBN | 180.046 |
2 | DB13950 | WIN 55212-2 | DB14737 | CBN | 95.248 |
3 | DB01482 | Fenethylline | DB02377 | Guanine | 31.747 |
4 | DB02377 | Guanine | DB14132 | 8-chlorotheophylline | 30.833 |
5 | DB01384 | Paramethasone | DB14681 | Cortisone | 29.575 |
6 | DB02377 | Guanine | DB13592 | Etamiphylline | 29.345 |
7 | DB02377 | Guanine | DB13203 | Bamifylline | 29.007 |
8 | DB03322 | Dexpropranolol | DB08807 | Bopindolol | 28.920 |
9 | DB13856 | Fluclorolone | DB14681 | Cortisone | 27.811 |
10 | DB02377 | Guanine | DB13449 | Proxyphylline | 27.800 |
11 | DB02377 | Guanine | DB13634 | Pentifylline | 27.696 |
12 | DB03322 | Dexpropranolol | DB13530 | Mepindolol | 27.084 |
13 | DB01384 | Paramethasone | DB14633 | Prednisolone hemisuccinate | 26.853 |
14 | DB13843 | Cloprednol | DB14681 | Cortisone | 26.431 |
15 | DB03322 | Dexpropranolol | DB06726 | Bufuralol | 25.731 |
16 | DB02377 | Guanine | DB13812 | Bufylline | 25.679 |
17 | DB13856 | Fluclorolone | DB14633 | Prednisolone hemisuccinate | 25.640 |
18 | DB02377 | Guanine | DB13573 | Acefylline | 24.828 |
19 | DB01482 | Fenethylline | DB01978 | 7, 9-Dimethylguanine | 24.747 |
20 | DB02134 | Xanthine | DB02377 | Guanine | 24.545 |
Among the predicted DDIs, palmidrol/CBN ranked first among all predicted DDIs with a significantly high score. CBN is the first cannabis compound purified from the ancient medicinal plant Cannabis sativa. Over 120 phytocannabinoids have been isolated from Cannabis plants, including one of the main and most recognized representatives, tetrahydrocannabinol (THC) [44]. Research has demonstrated that CBN binds to two members of the G protein-coupled receptor family, namely cannabinoid receptors 1 (CB1R) and 2. It is involved in various physiological conditions and human diseases associated with the endocannabinoid system (ECS) [45,46]. CBN is a non-enzymatic oxidation byproduct of THC. Although CBN has the same mechanism of action as THC, it has been poorly studied and is currently limited to phase Ⅱ clinical trials in epidermolysis bullosa [47]. Therefore, predicting the drug interactions of CBN is useful for its clinical development and application.
Palmidrol (Palmitoylethanolamide, PEA) is an endogenous fatty acid amide that mimics several endocannabinoid-driven activities. It is primarily known for its anti-inflammatory, analgesic, and neuroprotective properties [48]. PEA cannot be considered a classic endocannabinoid as it does not bind the classical cannabinoid receptors CB1 and CB2. However, it may have a multi-modal mechanism of action, primarily through the activation of the ligand-operated transcription factor PPAR-α. It also indirectly stimulates the effects of both phyto- and endocannabinoids through the ECS, thus targeting similar pathways as cannabinoids [49]. In addition, PEA may enhance the physiological activity of THC by increasing its affinity for a receptor and inhibiting its metabolic degradation. This phenomenon is referred to as the 'entourage effect', which suggests that PEA indirectly enhances the biological effects of endocannabinoids and phytocannabinoids [50,51]. Therefore, taking PEA and CBN together may provide greater benefits than taking them separately [52,53], which supports the predictions of our model.
The WIN 55,212-2 (WIN)/CBN pair had the second-highest prediction score, after palmidrol/CBN. WIN is a synthetic aminoalkylindole derivative that is commonly used as a pharmacological tool to study the biological activity of cannabinoid receptors [54]. WIN may act as a full agonist at the CB1 cannabinoid receptor, with much higher affinity for this receptor than THC [55,56]. Additionally, it has been shown to activate other receptors, including the PPARα and PPARγ nuclear receptors [57]. Recent research has further confirmed that WIN has an anti-inflammatory mechanism independent of CB1, suggesting that alternative receptors mediate its effects [54,58]. To the best of our knowledge, no data are currently available on the use of WIN in combination with other drugs, including CBN. However, scholars have also taken note of the potential negative side effects of cannabinoid-based drugs in treating several chronic diseases, such as epilepsy, chronic pain, multiple sclerosis, and neurodegenerative diseases [59,60,61,62]. Studies have shown that, at high concentrations, WIN can significantly block the G protein-coupled inward rectifier potassium channels 1 (GIRK1) and 2 (GIRK2) activated by CB1 or CB2, owing to its aminoalkylindole-type structure [63]. This helps to explain the adverse effects induced by WIN in vivo. However, this blocking effect was not observed in other typical cannabinoids similar in structure to CBN [63]. Adverse reactions may therefore become more likely when WIN is used in combination with CBN.
Table 5 shows that guanine may interact with nine drugs: Fenethylline and Bufylline, as well as seven xanthine-type drugs, namely 8-chlorotheophylline, Etamiphylline, Bamifylline, Proxyphylline, Pentifylline, Acefylline, and Xanthine. These drugs are competitive nonselective phosphodiesterase inhibitors and nonselective adenosine receptor antagonists, and are commonly used as mild stimulants and bronchodilators. Because both Fenethylline [64] and Bufylline contain theophylline, a member of the xanthine family, the nine drugs are discussed together.
According to DrugBank, the nine drugs are classified as experimental drugs, also known as discovery or pre-discovery stage drugs, which have not yet entered clinical trials or are not formally considered as candidate drugs. However, our prediction model suggests that they may interact with guanine, which could be explained by evidence from the drug structures. Guanine (G) is one of the four main nucleobases found in the nucleic acids DNA and RNA, along with adenine, cytosine, and thymine (uracil in RNA). In DNA, guanine pairs with cytosine. Guanine can be deaminated, releasing the amino group as ammonia, to form xanthine. Methylxanthines are formed from xanthine by adding methyl groups at different positions. This family includes the nine drugs mentioned earlier, as well as theophylline (also known as 1,3-dimethylxanthine) and theobromine (also known as 3,7-dimethylxanthine). Table 6 shows synthetic methylxanthines used as drugs that have functional groups other than the methyl group. Johnson et al. used ultraviolet absorption and Fourier transform infrared (FTIR) spectroscopy to investigate the interaction between naturally occurring methylxanthines (such as theophylline, theobromine, and caffeine) and DNA. The study revealed that theophylline, theobromine, and caffeine interact with all the base pairs of DNA (A-T; G-C) and with phosphate groups through hydrogen bond (H-bond) interactions [65]. Our model's prediction that the nine purine derivative drugs interact with guanine is consistent with this observation.
Drug predicted to interact with guanine | Score |
Fenethylline | 31.747 |
8-chlorotheophylline | 30.833 |
Etamiphylline | 29.345 |
Bamifylline | 29.007 |
Proxyphylline | 27.800 |
Pentifylline | 27.696 |
Bufylline | 25.679 |
Acefylline | 24.828 |
Xanthine | 24.545 |
We also noted the following drug pairs in Table 5: Paramethasone/Cortisone, Fluclorolone/Cortisone, Paramethasone/Prednisolone hemisuccinate, Cloprednol/Cortisone, and Fluclorolone/Prednisolone hemisuccinate. These drugs belong to a class of steroid hormones that bind the cortisol receptor and trigger various metabolic, immune, and homeostatic effects [66]. They also inhibit leukocyte infiltration during inflammation, interfere with the inflammatory response, and suppress humoral immune responses [67]. Our prediction model assigned high scores to these corticosteroid-related DDIs, which may reflect the increased risk or severity of adverse reactions when drugs of the same class are used in combination.
Clinical practice often requires the combined use of multiple drugs because diseases are complex and patients may have several diseases simultaneously. It is important to weigh the potential risks and benefits of using multiple drugs and to monitor patients closely for adverse effects. Although DDIs may have intended benefits, they can also result in unintended side effects or toxicities. For instance, studies on the combined use of PEA and CBN are currently limited to animal experiments and phase Ⅱ clinical trials for a single indication, yet our model predicts that this drug combination may yield better results. Another example is the combination of WIN and CBN, which has not yet been reported. Our model predicts that this combination may have side effects, a prediction supported by experimental evidence that WIN has varying effects on CB1 receptors at high concentrations.
In clinical practice, doctors can use this model to screen new drug combinations before prescribing them, in addition to traditional combinations, to improve the effectiveness of combination therapy. The model can provide an analytical basis for the treatment of specific and complex diseases, as well as for drug combinations targeting different diseases. If a high risk of DDI is predicted, doctors should remain vigilant and consult the latest research, such as animal experiments or clinical trials, to verify the prediction. This is necessary to ensure patient safety and treatment effectiveness.
Effective methods for obtaining drug information are crucial for ensuring rational drug combinations in modern medicine. Because DDI data are constantly updated, clinicians and pharmacologists need to stay up to date with the latest developments in drug research. This study can provide guidance for the development and implementation of clinical combination drug therapy.
Although the proposed model performs well and has wide applications, it also has some limitations. As mentioned above, the model performed poorly on drugs that interact with few other drugs. This stems from the structure of the drug network, which was constructed from currently known DDIs. If other types of associations between drugs were incorporated, a more complete drug network could be built, improving the feature quality of such drugs and further enhancing the performance of the model. Scalability is also a concern. The model can only predict novel DDIs between drugs that are each known to interact with at least one other drug. If a novel drug has no known interaction with any other drug, it cannot be included in the drug network. In this case, its high-level features cannot be computed, which affects the prediction of DDIs involving this drug. In future work, we will continue to develop improved models.
This study proposed a GraphSAGE-based model for predicting DDIs. GraphSAGE was used to fuse drug fingerprint features with the drug interaction network, thereby generating high-level drug features. The test results indicated that the model with these features provided high performance. It was superior to the model that directly used drug fingerprint features and was competitive with previous models. Because it relies only on commonly used drug fingerprint features, the model is easy to apply. Furthermore, the practical value of the GraphSAGE-based model was demonstrated by the case studies, which analyzed the latent DDIs discovered by the model. We hope that the model can be a useful tool for identifying novel DDIs.
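To make the fusion of fingerprint features and the interaction network more concrete, the following is a minimal sketch of one GraphSAGE-style layer with a mean aggregator. It is an illustration under stated assumptions, not the configuration used in this study: the aggregator type, the toy two-dimensional features, the identity weight matrices, and the names `sage_mean_layer`, `features`, and `neighbors` are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sage_mean_layer(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[str, Vector]:
    """One GraphSAGE-style layer: ReLU(W_self * x_v + W_neigh * mean of neighbor features)."""
    out: Dict[str, Vector] = {}
    for node, x in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Average the neighbors' feature vectors dimension by dimension.
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(x))]
        else:
            # An isolated node receives no information from the network.
            agg = [0.0] * len(x)
        combined = [a + b for a, b in zip(matvec(w_self, x), matvec(w_neigh, agg))]
        out[node] = [max(0.0, v) for v in combined]  # ReLU
    return out


# Toy example: hypothetical 2-bit "fingerprints" for four drugs.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0], "D": [0.0, 1.0]}
# Known interactions: A-B and A-C. Drug D has no known interaction.
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights; real weights are learned

embeddings = sage_mean_layer(features, neighbors, identity, identity)
print(embeddings)
```

In this toy graph, drug D keeps only its own fingerprint because it has no neighbors, which mirrors the limitation noted above for drugs without known interactions.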
The authors declare they have not used artificial intelligence (AI) tools in the creation of this article.
The authors declare there is no conflict of interest.
[1] | Adimurthi A, Mancini G (1991) The Neumann problem for elliptic equations with critical nonlinearity. Nonlinear Anal, Sc. Norm. Super. di Pisa Quaderni, Scuola Norm. Sup., Pisa, 9-25. |
[2] | Clapp M (2016) Entire nodal solutions to the pure critical exponent problem arising from concentration. J Differ Equations 261: 3042-3060. |
[3] | del Pino M, Musso M, Pacard F, et al. (2011) Large energy entire solutions for the Yamabe equation. J Differ Equations 251: 2568-2597. |
[4] | Ding WY (1986) On a conformally invariant elliptic equation on R.n. Commun Math Phys 107: 331-335. |
[5] | Fernández JC, Petean J (2020) Low energy nodal solutions to the Yamabe equation. J Differ Equations 268: 6576-6597. |
[6] | Grossi M, Pacella F (1990) Positive solutions of nonlinear elliptic equations with critical Sobolev exponent and mixed boundary conditions. P Roy Soc Edinb A 116: 23-43. |
[7] | Lions PL, Pacella F (1990) Isoperimetric inequalities for convex cones. P Am Math Soc 109: 477- 485. |
[8] | Lions PL, Pacella F, Tricarico M (1988) Best constants in Sobolev inequalities for functions vanishing on some part of the boundary and related questions. Indiana U Math J 37: 301-324. |
[9] | Weth T (2006) Energy bounds for entire nodal solutions of autonomous superlinear equations. Calc Var Partial Dif 27: 421-437. |
[10] | Willem M (1996) Minimax Theorems, Boston: Birkh?user Boston. |
Fold index | Precision | Recall | ACC | F1-measure | MCC |
1 | 0.980 | 0.826 | 0.887 | 0.897 | 0.788 |
2 | 0.969 | 0.776 | 0.845 | 0.862 | 0.712 |
3 | 0.977 | 0.761 | 0.835 | 0.856 | 0.699 |
4 | 0.976 | 0.789 | 0.857 | 0.872 | 0.735 |
5 | 0.975 | 0.787 | 0.856 | 0.871 | 0.733 |
6 | 0.985 | 0.819 | 0.884 | 0.895 | 0.784 |
7 | 0.991 | 0.834 | 0.897 | 0.905 | 0.808 |
8 | 0.933 | 0.811 | 0.857 | 0.867 | 0.723 |
9 | 0.942 | 0.794 | 0.849 | 0.861 | 0.709 |
10 | 0.944 | 0.783 | 0.841 | 0.856 | 0.697 |
Mean | 0.967 | 0.798 | 0.861 | 0.874 | 0.739 |
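As a consistency check on the cross-validation table above, the sketch below recomputes the F1-measure of each fold from the reported precision and recall, and shows the standard formula for MCC. The numbers are copied from the table; the function names `f1_score` and `mcc` are illustrative, and the confusion-matrix counts needed to reproduce the reported MCC values are not given in this section, so `mcc` is shown only as a formula.

```python
from math import sqrt
from statistics import mean


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Per-fold values copied from the table above.
precision = [0.980, 0.969, 0.977, 0.976, 0.975, 0.985, 0.991, 0.933, 0.942, 0.944]
recall = [0.826, 0.776, 0.761, 0.789, 0.787, 0.819, 0.834, 0.811, 0.794, 0.783]
reported_f1 = [0.897, 0.862, 0.856, 0.872, 0.871, 0.895, 0.905, 0.867, 0.861, 0.856]

recomputed_f1 = [f1_score(p, r) for p, r in zip(precision, recall)]
# Largest gap between recomputed and reported F1; small gaps are expected
# because the table reports precision and recall rounded to three decimals.
max_gap = max(abs(a - b) for a, b in zip(recomputed_f1, reported_f1))

mean_precision = round(mean(precision), 3)
mean_recall = round(mean(recall), 3)
print(mean_precision, mean_recall, max_gap < 0.002)
```

The recomputed means match the "Mean" row for precision and recall, and the per-fold F1 values agree with the reported ones up to rounding.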
Object | Operation | AUROC | AUPR |
GraphSAGE | Removal | 0.5621 | 0.5469 |
Scoring scheme | Replacement | 0.7174 | 0.6388 |
Group | AUROC | AUPR |
High-high group | 0.9894 | 0.9967 |
High-low group | 0.9461 | 0.8176 |
Low-low group | 0.6020 | 0.2439 |
Index | Drug 1 DrugBank ID | Drug 1 name | Drug 2 DrugBank ID | Drug 2 name | Score |
1 | DB14043 | Palmidrol | DB14737 | CBN | 180.046 |
2 | DB13950 | WIN 55212-2 | DB14737 | CBN | 95.248 |
3 | DB01482 | Fenethylline | DB02377 | Guanine | 31.747 |
4 | DB02377 | Guanine | DB14132 | 8-chlorotheophylline | 30.833 |
5 | DB01384 | Paramethasone | DB14681 | Cortisone | 29.575 |
6 | DB02377 | Guanine | DB13592 | Etamiphylline | 29.345 |
7 | DB02377 | Guanine | DB13203 | Bamifylline | 29.007 |
8 | DB03322 | Dexpropranolol | DB08807 | Bopindolol | 28.920 |
9 | DB13856 | Fluclorolone | DB14681 | Cortisone | 27.811 |
10 | DB02377 | Guanine | DB13449 | Proxyphylline | 27.800 |
11 | DB02377 | Guanine | DB13634 | Pentifylline | 27.696 |
12 | DB03322 | Dexpropranolol | DB13530 | Mepindolol | 27.084 |
13 | DB01384 | Paramethasone | DB14633 | Prednisolone hemisuccinate | 26.853 |
14 | DB13843 | Cloprednol | DB14681 | Cortisone | 26.431 |
15 | DB03322 | Dexpropranolol | DB06726 | Bufuralol | 25.731 |
16 | DB02377 | Guanine | DB13812 | Bufylline | 25.679 |
17 | DB13856 | Fluclorolone | DB14633 | Prednisolone hemisuccinate | 25.640 |
18 | DB02377 | Guanine | DB13573 | Acefylline | 24.828 |
19 | DB01482 | Fenethylline | DB01978 | 7,9-Dimethylguanine | 24.747 |
20 | DB02134 | Xanthine | DB02377 | Guanine | 24.545 |
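The grouping used in the case studies can be reproduced from the table above with a simple count of how often each drug appears among the top 20 predicted pairs. The sketch below copies the drug names from the table; the variable names `top_pairs` and `counts` are illustrative.

```python
from collections import Counter

# Top 20 predicted pairs (drug names copied from the table above).
top_pairs = [
    ("Palmidrol", "CBN"),
    ("WIN 55212-2", "CBN"),
    ("Fenethylline", "Guanine"),
    ("Guanine", "8-chlorotheophylline"),
    ("Paramethasone", "Cortisone"),
    ("Guanine", "Etamiphylline"),
    ("Guanine", "Bamifylline"),
    ("Dexpropranolol", "Bopindolol"),
    ("Fluclorolone", "Cortisone"),
    ("Guanine", "Proxyphylline"),
    ("Guanine", "Pentifylline"),
    ("Dexpropranolol", "Mepindolol"),
    ("Paramethasone", "Prednisolone hemisuccinate"),
    ("Cloprednol", "Cortisone"),
    ("Dexpropranolol", "Bufuralol"),
    ("Guanine", "Bufylline"),
    ("Fluclorolone", "Prednisolone hemisuccinate"),
    ("Guanine", "Acefylline"),
    ("Fenethylline", "7,9-Dimethylguanine"),
    ("Xanthine", "Guanine"),
]

# Count how many of the top-ranked pairs involve each drug.
counts = Counter(drug for pair in top_pairs for drug in pair)

# Partners predicted for guanine, in ranking order.
guanine_partners = [a if b == "Guanine" else b for a, b in top_pairs if "Guanine" in (a, b)]

print(counts.most_common(4))
print(guanine_partners)
```

Guanine appears in nine of the 20 pairs, which is the group of nine drugs discussed in the case study; CBN, Cortisone, and Dexpropranolol account for most of the remaining pairs.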
Model | AUROC | AUPR |
GraphSAGE-based model | 0.9704 | 0.9727 |
Ran et al.'s model [5] | 0.9629 | 0.9601 |
NDD [7] | 0.9940 | 0.9470 |
DDIMDL [2] | 0.9979 | 0.9208 |
DPDDI [15] | 0.9560 | 0.9070 |
DANN-DDI [16] | 0.9763 | 0.9709 |