
As a key mechanism orchestrating various biological processes and functions, protein post-translational modification (PTM) occurs widely in both animals and plants. Glutarylation is a post-translational modification that occurs at the active ε-amino groups of specific lysine residues in proteins and is associated with various human diseases, including diabetes, cancer, and glutaric aciduria type I. Predicting glutarylation sites is therefore particularly important. This study developed a new deep learning-based prediction model for glutarylation sites, named DeepDN_iGlu, that adopts an attention residual learning method and DenseNet. The focal loss function is used in place of the traditional cross-entropy loss function to address the substantial imbalance between the numbers of positive and negative samples. Using only straightforward one-hot encoding, DeepDN_iGlu achieves Sensitivity (Sn), Specificity (Sp), Accuracy (ACC), Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC) of 89.29%, 61.97%, 65.15%, 0.33 and 0.80, respectively, on the independent test set, suggesting that the deep learning model offers great potential for glutarylation site prediction. To the best of the authors' knowledge, this is the first time DenseNet has been used for the prediction of glutarylation sites. DeepDN_iGlu has been deployed as a web server (https://bioinfo.wugenqiang.top/~smw/DeepDN_iGlu/) to make glutarylation site prediction more accessible.
Citation: Jianhua Jia, Mingwei Sun, Genqiang Wu, Wangren Qiu. DeepDN_iGlu: prediction of lysine glutarylation sites based on attention residual learning method and DenseNet[J]. Mathematical Biosciences and Engineering, 2023, 20(2): 2815-2830. doi: 10.3934/mbe.2023132
Cells require adaptive strategies to maintain metabolic homeostasis, and post-translational modifications (PTMs) are one of the known adaptive mechanisms [1,2,3]. A post-translational modification is the covalent addition of a functional group to a protein after translation [4]. These modifications are critical to functional proteomics because they regulate a protein's activity, localization, and interactions with other cellular molecules such as proteins, nucleic acids, lipids and cofactors. Identifying PTM sites therefore bears directly on cellular processes, including protein degradation, subcellular localization [5] and cellular signaling events [6]. Glutarylation, a PTM that occurs at the active ε-amino groups of specific lysine residues in proteins, was first experimentally identified by Tan et al. [3] in 2014. Their study, which combined chemical and biochemical methods, showed that recognizing and understanding glutarylation sites is vital to the investigation of many biological processes, such as metabolism and mitochondrial function. Because traditional biological sequencing techniques are labor-intensive and time-consuming, convenient computational methods are needed to reduce the cost of obtaining such information. The use of computational methods to predict various PTM sites has produced a number of predictors that provide novel ideas for PTM site identification, including CarbonylDB [7], SulSite-GTB [8], Mal-Prec [9], iDPGK [10], DeepPPSite [11] and DeepSuccinylSite [12].
Although glutarylation is a recently reported PTM, several prediction tools for it already exist, the first of which were based on traditional machine learning models. The SVM algorithm was particularly popular among researchers in 2018: after Ju and He [13] proposed the first glutarylation site predictor based on a biased support vector machine, Xu et al. [14] and Huang et al. [15] successively built the SVM-based iGlu-Lys and MDDGlutar predictors, both trained on imbalanced datasets. In particular, the iGlu-Lys predictor with its PSPM feature encoding achieved Matthews Correlation Coefficients (MCCs) of 0.5098 and 0.5213 for 10-fold cross-validation and independent testing, respectively.
The key and challenging step in building PTM site predictors is feature engineering. Whether features are extracted manually from existing web applications [16] or produced by complex and varied feature encoding methods [17], possibly followed by feature selection [18] when the feature dimension is large, this step determines how to supply reliable features to the model while keeping its computational complexity low.
In nature, the ratio of non-glutarylation sites to glutarylation sites always exceeds 2:1 and often even 5:1, as with other PTMs such as succinylation [19]. To address the severe bias that imbalanced positive and negative samples induce in traditional machine learning models [20], a common approach is to balance the data before feeding them into the model [21], by up-sampling (adding minority class samples), down-sampling (removing majority class samples) or hybrid-sampling (both), which yields excellent performance for traditional machine learning models. For instance, Ju and Wang [22] in 2020 applied a positive-unlabeled learning approach, essentially a form of down-sampling, to build a glutarylation predictor called PUL-GLU. Its MCC of 0.54 at 10-fold cross-validation dropped by about half on the independent test set, indicating that the model's generalization ability remains a challenge.
With the recent rise of deep learning algorithms, this problem has been alleviated to a certain extent. Deep learning is a sub-discipline of machine learning based on artificial neural networks, and it has demonstrated splendid performance in proteomics, for instance in retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, major histocompatibility complex-peptide binding prediction, and protein structure prediction [23,24]. Automatic feature extraction is a significant advantage of deep learning-based methods over traditional machine learning-based ones.
Benefiting from this ease of extracting data representations, glutarylation predictors based on deep learning algorithms have emerged. To the best of our knowledge, two deep learning-based predictors have been proposed recently. Naseer et al. [25] employed a series of deep models combined with an embedding method for feature extraction and eventually reported an outstanding accuracy of 0.94. Liu et al. [26] applied LSTM and its derived deep networks in their predictor DNN-E to improve protein sequence representation, obtaining accuracy, specificity, sensitivity, and correlation coefficient of 0.79, 0.89, 0.59, and 0.51, respectively. These results suggest a huge potential for deep learning techniques in the field of glutarylation site prediction.
In this study, we propose a novel deep learning-based prediction model, DeepDN_iGlu, for the task of recognizing glutarylation sites. The overall flow chart is shown in Figure 1. The top half of the figure depicts the data collection and encoding procedure, which ultimately amounts to a straightforward one-hot encoding. The middle of the figure shows the core network architecture of this study: 4 dense blocks with the same structure (each a 4-layer dense block with a growth rate of k = 32), 3 transition layers between them (each consisting of an ELU, a 1 × 1 one-dimensional convolution (Conv 1D) and a 2 × 2 average pooling), a SERNet layer, and a sigmoid classifier that produces the output. Finally, the detailed evaluation results are presented in Section 3.
The previous study by Ju and He [13] served as the source of the benchmark data in this work. To increase the credibility of the data, they extracted 211 proteins with glutarylation sites from the PLMD [27] database and used the CD-HIT [28,29] tool to eliminate proteins with more than 40% sequence similarity, leaving 187 proteins with 646 glutarylation sites. Of these, 167 proteins, containing 590 glutarylation sites and 3498 non-glutarylation lysine sites, were used as the source of the training set. The remaining 20 proteins, containing 56 glutarylation sites and 428 non-glutarylation lysine sites, were randomly chosen as the independent test set. Then, applying Chou's peptide cleavage method [30], each amino acid sequence that may contain a glutaryl-lysine is represented as Eq (1).
$$ f_\delta(K) = A_{-\delta} A_{-(\delta-1)} \cdots A_{-2} A_{-1} \, K \, A_{+1} A_{+2} \cdots A_{+(\delta-1)} A_{+\delta} \tag{1} $$
where K stands for the central lysine residue and A for an amino acid residue flanking K. The subscript gives the distance between each residue and the central lysine K: the distance is 1 if A is adjacent to K, 2 if one residue separates them, and so on. A positive sign (+) denotes a position downstream of K, whereas a negative sign (−) denotes a position upstream of K. δ is the largest distance from the central lysine K.
According to Ju's article, the experimental results are best when δ = 17, so the amino acid sequence length L (also known as window size) used in this investigation is 17 × 2 + 1 = 35. It is worth noting that where the central lysine residue K had fewer than 17 residues on either side, the sequence was padded with the unknown amino acid X to guarantee that all amino acid sequences have the same length for subsequent experiments. Cutting the 187 selected proteins according to these rules yielded 3988 amino acid sequences in total; those whose central lysine had been experimentally confirmed as a glutarylation site were labeled positive samples, and the remaining sequences negative samples. Table 1 gives the details of the benchmark dataset.
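To make the cutting rule concrete, here is a minimal Python sketch of the window extraction (an illustration written for this description, not the authors' released script; `cut_windows` and its parameters are our own names):

```python
# Chou's peptide cleavage: take a (2*delta + 1)-residue window around every
# lysine K, padding with the dummy residue X at the protein termini.
DELTA = 17  # half-window; window length L = 2*17 + 1 = 35

def cut_windows(sequence, delta=DELTA):
    """Return (1-based position, window) pairs for every lysine in `sequence`."""
    padded = "X" * delta + sequence + "X" * delta
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # in `padded`, this lysine sits at index i + delta
            windows.append((i + 1, padded[i : i + 2 * delta + 1]))
    return windows

# every window is 35 residues long with K at its center
for pos, win in cut_windows("MKVLAAGK"):
    assert len(win) == 35 and win[DELTA] == "K"
```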
| Original dataset | Number of proteins | Positive sites | Negative sites |
| --- | --- | --- | --- |
| Training dataset | 167 | 590 | 3498 |
| Testing dataset | 20 | 56 | 428 |
Appropriate features of protein sequences play a very important role in predicting PTM sites. In this paper, three candidate encoding methods are used: one-hot, EAAC and K-spaced. Note that the second dimension of every encoding must be 21 for the three methods to be combined: after one-hot, EAAC and K-spaced encoding, an amino acid sequence becomes a two-dimensional matrix of size 35 × 21, 31 × 21 and 210 × 21, respectively, where the 21 columns correspond to the 20 amino acids plus the unknown amino acid X, so the matrices can be concatenated by rows. Combining all three matrices yields a (35 + 31 + 210) × 21 = 276 × 21 input matrix, the final encoding matrix after fusing the three methods.
One-hot encoding is a common method in feature engineering, favored by researchers for its distinct, differentiated codes as well as its simplicity, speed and convenience. First, the amino acids are arranged alphabetically as "ACDEFGHIKLMNPQRSTVWYX" (20 amino acids plus the unknown amino acid X appended at the end) and labeled "0, 1, 2, …, 20" in that order. Second, each amino acid is encoded as a vector whose length is the total number of amino acids (21); the element at the position of the amino acid's label is 1 and all remaining positions are 0. For example, amino acid A is encoded as (1, 0, 0, …, 0), amino acid C as (0, 1, 0, …, 0), and so on.
Therefore, in this study, an amino acid sequence of length L = 35 is encoded as a 35 × 21 two-dimensional matrix with elements 0 and 1.
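A minimal sketch of this encoding (our illustration, not the authors' code):

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # 20 amino acids + unknown residue X
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def one_hot_encode(window):
    """Encode an L-residue window as an L x 21 binary matrix."""
    matrix = np.zeros((len(window), len(ALPHABET)), dtype=np.float32)
    for row, aa in enumerate(window):
        matrix[row, AA_INDEX.get(aa, AA_INDEX["X"])] = 1.0  # unknown letters map to X
    return matrix

x = one_hot_encode("A" * 17 + "K" + "C" * 17)
assert x.shape == (35, 21) and x.sum() == 35
```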
EAAC is a variant of AAC: instead of counting the frequency of each amino acid over the entire sequence, it slides a window of adjustable size along the sequence. For an amino acid sequence, EAAC encoding proceeds as follows.
1) First, set the window size (windows), e.g., windows = 5, meaning that 5 consecutive amino acids are selected each time as the basis for calculating the frequency of each amino acid (including the dummy residue X); this set of residues is denoted S1.
2) Calculate the frequency of each amino acid in S1; the frequencies of all amino acids in S1 sum to 1.
3) Slide the window one position along the sequence.
4) Repeat Steps 2) and 3) until the window covers the last 5 amino acids of the sequence.
In this paper, windows = 5, so an amino acid sequence of length 35 (L = 35) is encoded as a two-dimensional matrix of dimension (35 − 5 + 1) × 21 = 31 × 21. It is worth noting that EAAC encoding degenerates into AAC encoding when windows = L = 35, whereas when windows = 1 it produces the same output as one-hot encoding.
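A minimal sketch of EAAC under these settings (window size 5, 21-letter alphabet; again an illustration rather than the authors' implementation):

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def eaac_encode(window, win_size=5):
    """Encode an L-residue window as an (L - win_size + 1) x 21 matrix of
    per-window amino acid frequencies; each row sums to 1."""
    rows = []
    for start in range(len(window) - win_size + 1):
        counts = np.zeros(len(ALPHABET), dtype=np.float32)
        for aa in window[start : start + win_size]:
            counts[AA_INDEX.get(aa, AA_INDEX["X"])] += 1.0
        rows.append(counts / win_size)  # Step 2: frequencies sum to 1
    return np.stack(rows)

assert eaac_encode("A" * 17 + "K" + "C" * 17).shape == (31, 21)
```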
In contrast to encodings that focus only on individual amino acids, K-spaced encoding turns its attention to amino acid pairs, extracting the feature information implied by a sequence from another perspective and taking into account the significance of amino acid pairs in feature selection. The steps of K-spaced encoding are as follows.
1) First, specify the maximum gap k_max between the residues of a pair, e.g., k_max = 2, meaning the gap k takes the values 0, 1 and 2. The gap is the number of residues separating the two amino acids (denoted A1 and A2) that make up a pair in the sequence; for example, k = 0 means A1 and A2 are adjacent. The sets S1, S2 and S3 denote all amino acid pairs that can be derived from the sequence for k = 0, 1 and 2, respectively.
2) Calculate the frequencies of all possible amino acid pairs (21 × 21 kinds) occurring in S1, S2 and S3, respectively; within each set, the frequencies of all pairs sum to 1.
In this study, we use k_max = 9, so K-spaced encoding turns an amino acid sequence of length L = 35 into a two-dimensional matrix of dimension ((9 + 1) × 21) × 21 = 210 × 21.
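A minimal sketch of this K-spaced encoding, arranging the (k_max + 1) pair-frequency blocks of size 21 × 21 into a 210 × 21 matrix (our reading of the description above, with per-gap normalization as in Step 2):

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def k_spaced_encode(window, k_max=9):
    """For each gap k = 0..k_max, build a 21 x 21 block of amino acid pair
    frequencies (each block sums to 1), then stack the blocks by rows."""
    blocks = []
    for k in range(k_max + 1):
        block = np.zeros((len(ALPHABET), len(ALPHABET)), dtype=np.float32)
        n_pairs = len(window) - k - 1  # number of (A1, A2) pairs with gap k
        for i in range(n_pairs):
            a1 = AA_INDEX.get(window[i], AA_INDEX["X"])
            a2 = AA_INDEX.get(window[i + k + 1], AA_INDEX["X"])
            block[a1, a2] += 1.0 / n_pairs
        blocks.append(block)
    return np.concatenate(blocks, axis=0)

win = "A" * 17 + "K" + "C" * 17
assert k_spaced_encode(win).shape == (210, 21)
# Fusing all three encodings (Section 2.2) gives the 276 x 21 input matrix:
# fused = np.concatenate([one_hot_encode(win), eaac_encode(win),
#                         k_spaced_encode(win)], axis=0)
```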
The quality of the classification model directly determines the accuracy of the classification results. In this paper we adopt DenseNet, which has recently achieved great success in the field of deep learning. To the best of the authors' knowledge, this is the first application of DenseNet to glutarylation site prediction, although related studies exist for other PTM sites, such as succinylation [31] and acetylation [32]. The overall structure of the model is shown in Figure 1, where 'dense block × 4 and transition × 3' means there are 4 structurally identical dense blocks (each a 4-layer dense block with growth rate = 32) and three intervening transition layers (each consisting of an ELU, a 1 × 1 Conv 1D and a 2 × 2 average pooling). Each amino acid sequence is transformed by simple feature encoding into a two-dimensional matrix, which serves as the input matrix X of the deep learning network. After multiple stacked dense convolutional blocks, X retains the original information while deep features are extracted. The stacked dense blocks are followed by SERNet, which incorporates the attention residual learning method to maximally retain the original information and extract the final features. Finally, a sigmoid function produces the classification result.
DenseNet [33] proposed a radical dense connectivity mechanism that interconnects all layers; in particular, each layer preserves the information of all layers before it. Reusing feature maps from different layers enhances feature propagation and improves the feature extraction capability of the network while reducing the number of training parameters. The DenseNet connectivity mechanism is illustrated in the network architecture of Figure 1.
The primary module of DenseNet that implements the dense connectivity method is the dense block. Each layer in a dense block is connected, along the channel dimension, to all layers before it and serves as input to the subsequent layer. For a dense block with L layers, the total number of connections is L(L + 1)/2. The connections are highly dense, and the output of the Lth layer is given by Eq (2).
$$ x_L = H_L([x_0, x_1, \ldots, x_{L-1}]) \tag{2} $$
where $H_L(\cdot)$ denotes the non-linear transformation function, a composite layer that may contain, for example, BN, ReLU and convolution layers. In this research, an ELU and a Conv 1D are employed together, as shown in Figure 1, and four dense blocks with the same structure are used (each a 4-layer dense block with a growth rate of k = 32).
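The following Keras sketch shows one such dense block (a minimal illustration using the paper's stated settings of 4 layers, growth rate 32 and ELU + Conv 1D as $H_L$; the convolution kernel size of 3 is our assumption, as the paper does not state it):

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    """1-D dense block: layer l receives the concatenation [x_0, ..., x_{l-1}]
    of all earlier feature maps (Eq (2)) and adds `growth_rate` new channels."""
    features = [x]
    for _ in range(num_layers):
        h = features[0] if len(features) == 1 else layers.Concatenate()(features)
        h = layers.ELU()(h)                                   # H_L part 1: ELU
        h = layers.Conv1D(growth_rate, 3, padding="same")(h)  # H_L part 2: Conv 1D
        features.append(h)
    return layers.Concatenate()(features)  # output keeps all layers' feature maps
```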
The transition layer is another vital component of DenseNet. It connects two adjacent dense blocks and reduces the size of the feature map. Since 4 dense blocks are used in this paper, 4 − 1 = 3 transition layers are required. Each transition layer consists of an ELU, a 1 × 1 Conv 1D and a 2 × 2 average pooling, and acts as a compression module: if the dense block before the transition produces m feature maps, the transition layer outputs θ × m feature maps, where θ (0 < θ ≤ 1) is the compression rate. When θ = 1 there is no compression and the number of feature maps is unchanged. In this study θ = 0.5, so the number of feature maps is halved after each transition layer.
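Continuing the sketch above, a transition layer with θ = 0.5 can be written as follows (pool size 2 is the 1-D analogue of the paper's 2 × 2 average pooling):

```python
def transition_layer(x, theta=0.5):
    """ELU -> 1x1 Conv1D compressing m feature maps to theta * m ->
    average pooling that halves the sequence length."""
    compressed = int(int(x.shape[-1]) * theta)  # theta = 0.5 halves the channels
    x = layers.ELU()(x)
    x = layers.Conv1D(compressed, 1, padding="same")(x)
    return layers.AveragePooling1D(pool_size=2)(x)
```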
In this work we propose a SENet variant incorporating the attention residual learning method. The standard SENet procedure first performs global average pooling in the spatial dimension (the Squeeze step), then learns channel attention through two fully connected layers normalized with a sigmoid (the Excitation step), and multiplies the result with the original matrix A to produce the spatially weighted feature matrix B. Following the shortcut-connection concept of ResNet, we add the weighted matrix B back to the original matrix A after Excitation; we therefore call the module SERNet for short, as shown in the SERNet section of the Network Architectures in Figure 1. This achieves both the spatial-dimension weighting and the preservation of the original matrix information. In the experiments, the two fully connected layers in Excitation have output dimensions of 96 and 6, with ReLU and sigmoid activations, respectively.
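A minimal Keras sketch of SERNet follows. The squeeze-excitation structure is standard; the residual addition A + B is the paper's modification. In the generic formulation shown here, the second fully connected layer must be as wide as the input's channel count so that each channel can be rescaled (the paper's reported sizes of 96 and 6 imply 6 channels at that stage):

```python
def sernet_block(x, hidden_units=96):
    """Squeeze-Excitation with a residual shortcut: output = A + B, where
    B is the channel-attention-weighted copy of the input A."""
    channels = int(x.shape[-1])
    s = layers.GlobalAveragePooling1D()(x)                # Squeeze
    s = layers.Dense(hidden_units, activation="relu")(s)  # Excitation, FC 1
    s = layers.Dense(channels, activation="sigmoid")(s)   # Excitation, FC 2
    s = layers.Reshape((1, channels))(s)
    b = layers.Multiply()([x, s])                         # weighted matrix B
    return layers.Add()([x, b])                           # attention residual A + B
```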
The focal loss function is employed in this study to make up for a shortcoming of the traditional cross-entropy loss function: it pays insufficient attention to minority samples when the numbers of glutarylation and non-glutarylation site samples are highly imbalanced.
The focal loss function was proposed by Lin et al. [34] to address the performance degradation caused by class imbalance in dense object detection, a problem also common in PTM site prediction. The focal loss is therefore introduced here as a new approach to the data imbalance problem in PTM site prediction, and is formulated in Eqs (3) and (4).
$$ p_t = \begin{cases} p, & \text{if } y = 1 \\ 1 - p, & \text{otherwise} \end{cases} \tag{3} $$

$$ FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{4} $$
where y denotes the sample label, p ∈ [0, 1] denotes the predicted probability that the sample belongs to class 1, and α_t ∈ [0, 1] in Eq (4) is the weighting factor controlling the relative weight of positive and negative samples in the total loss; a smaller value gives negative samples lower weight. In this paper, α_t = 0.8 and γ = 2 achieved the best results.
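A minimal TensorFlow implementation of Eqs (3) and (4) for binary labels (our sketch following Lin et al. [34], not the authors' released code):

```python
import tensorflow as tf

def focal_loss(alpha_t=0.8, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t),
    with alpha_t weighting the positive class as in Lin et al."""
    def loss_fn(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        p_t = tf.where(y_true == 1.0, y_pred, 1.0 - y_pred)   # Eq (3)
        a_t = tf.where(y_true == 1.0, alpha_t, 1.0 - alpha_t)
        return -tf.reduce_mean(a_t * (1.0 - p_t) ** gamma * tf.math.log(p_t))  # Eq (4)
    return loss_fn

# usage: model.compile(optimizer="adam", loss=focal_loss(alpha_t=0.8, gamma=2.0))
```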
Scientific and widely used metrics are the basis for measuring the performance of different models. The prediction problem in this study can be treated as a binary classification problem in machine learning, for which the confusion matrix is the standard basis of evaluation. In this paper, four confusion-matrix-derived metrics are used to evaluate the classification model [35,36], as shown in Eq (5).
$$ \begin{cases} Sn = \dfrac{TP}{TP + FN} \\[1mm] Sp = \dfrac{TN}{TN + FP} \\[1mm] Acc = \dfrac{TP + TN}{TP + TN + FP + FN} \\[1mm] MCC = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \end{cases} \tag{5} $$
where TP, TN, FP and FN abbreviate true positives, true negatives, false positives and false negatives, respectively: TP and TN count correctly predicted positive and negative samples, FP counts negative samples mistakenly predicted as positive, and FN counts positive samples mistakenly predicted as negative. Additionally, the receiver operating characteristic (ROC) curve [37], an excellent way to directly visualize prediction outcomes, is also employed as one of the model evaluation metrics in this paper.
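The metrics of Eq (5) are easy to compute from the four confusion counts; the counts below are illustrative only (the paper does not list them), chosen so the outputs roughly reproduce the reported independent-test values:

```python
import math

def evaluate(tp, tn, fp, fn):
    """Sn, Sp, Acc and MCC from confusion-matrix counts (Eq (5))."""
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    mcc = ((tp * tn - fp * fn)
           / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return sn, sp, acc, mcc

print(evaluate(tp=50, tn=265, fp=163, fn=6))  # e.g. Sn = 50/56 ≈ 0.8929
```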
In this work, 10-fold cross-validation (CV) is used to evaluate model performance and to make the prediction results more stable for comparison with other predictors. The input data are divided into ten subsets (D1, D2, ..., D10), and the model is trained ten times: in each round one subset is held out for testing and the other nine are used for training, until every subset has served as the test set exactly once.
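A minimal sketch of this protocol (`build_model`, the epoch count and batch size are placeholders, not the paper's training settings):

```python
import numpy as np
from sklearn.model_selection import KFold

def ten_fold_cv(build_model, X, y, epochs=30):
    """Split the data once into ten subsets; in each round train on nine
    and test on the held-out one, so every subset is the test set once."""
    fold_scores = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
        model = build_model()  # fresh, untrained model for every fold
        model.fit(X[train_idx], y[train_idx], epochs=epochs, batch_size=64, verbose=0)
        fold_scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0))
    return np.mean(fold_scores, axis=0)  # average metrics across the ten folds
```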
In this section, we compare the feature encoding methods and the functional modules of the model (such as the focal loss function). Note that all experiments used the model described in Section 2.3, and each functional-module comparison altered just one condition while leaving the others unchanged. Specifically, the model in Section 3.1 employed four identically structured dense blocks and the focal loss function; the loss function comparison used 4 dense block layers, and the comparison of different numbers of dense blocks used the focal loss function.
We performed ablation experiments to determine which of the three feature encoding methods, or which combination of them, is best suited as input to this model. The results are displayed in Table 2, with the best value in each row in bold; all results are based on 10-fold cross-validation on the training set. The first three rows of Table 2 indicate which encoding methods were chosen in each experiment: a "√" in an encoding's row means that method was used in that experiment. The remaining 7 columns represent the 7 experiments ($C_3^1 + C_3^2 + C_3^3 = 7$). The combination of the encoding techniques has been described in Section 2.2.
| Encoding / Metric | Exp 1 | Exp 2 | Exp 3 | Exp 4 | Exp 5 | Exp 6 | Exp 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| One hot | √ | | | √ | √ | | √ |
| EAAC | | √ | | √ | | √ | √ |
| K-Spaced | | | √ | | √ | √ | √ |
| Sn | **0.89** | 0.84 | 0.82 | 0.65 | 0.66 | 0.68 | 0.65 |
| Sp | 0.62 | 0.58 | 0.43 | **0.65** | 0.62 | 0.51 | **0.65** |
| ACC | **0.65** | 0.61 | 0.47 | **0.65** | 0.63 | 0.54 | **0.65** |
| MCC | **0.33** | 0.27 | 0.16 | 0.22 | 0.21 | 0.14 | 0.22 |
| AUC | **0.80** | 0.75 | 0.62 | 0.72 | 0.71 | 0.64 | 0.72 |
Table 2 shows that, as input to a deep learning model, one-hot encoding alone offers unparalleled advantages: its evaluation metrics exceed those of every other individual or combined encoding method.
We therefore have reason to believe that simple but highly discriminative encodings, such as the one-hot encoding used in this experiment, are better suited as input for deep learning models that excel at automatically extracting deep features. This finding may inform the development of future deep learning-based predictors. From a convenience standpoint, one-hot encoding is easy to obtain, letting researchers focus mainly on the construction of the model.
We employed the two loss functions in this section to independently predict glutarylation sites, in order to verify whether the focal loss function is effective on imbalanced data. The results are shown in Figure 2 and demonstrate that the focal loss function is more sensitive to minority class samples than the cross-entropy loss: after employing the focal loss, both Sn and MCC improved, reaching 0.78 and 0.31, respectively.
Since the number of dense blocks in DenseNet is also a significant factor affecting performance, this study evaluates the model under various numbers of dense blocks. Figure 3 makes it intuitively clear that with four stacked dense blocks the Sp, ACC and MCC of the model exceed those of the other configurations. Four dense blocks are therefore chosen to build DenseNet in this paper.
Given the availability of existing predictors and the rigor of the comparison, we chose two predictors, PUL-GLU and GlutPred, which use the same dataset as this paper. Table 3 shows the 10-fold cross-validation performance of DeepDN_iGlu and the other predictors. Note that PUL-GLU screens the training samples (removing some negative samples), resulting in a lower imbalance ratio; our discussion therefore focuses on the comparison with GlutPred, which keeps the original ratio of positive and negative samples.
| Predictor | Sn (%) | Sp (%) | ACC (%) | MCC | AUC (%) |
| --- | --- | --- | --- | --- | --- |
| PUL-GLU | 66.56 ± 0.73 | 86.43 ± 0.28 | 79.88 ± 0.29 | 0.5384 ± 0.069 | – |
| GlutPred | 64.80 ± 0.99 | 76.60 ± 0.28 | 74.90 ± 0.32 | 0.3194 ± 0.087 | 78.06 ± 0.29 |
| DeepDN_iGlu | 79.45 ± 8.29 | 63.74 ± 4.44 | 66.00 ± 3.61 | 0.3080 ± 0.068 | 77.25 ± 0.04 |
According to Table 3, the Sn of DeepDN_iGlu is more than 12 percentage points higher than that of the other two predictors, at the cost of lower Sp and ACC. MCC is a measure of great importance for imbalanced data. As can be observed, even though Sp and ACC decreased, MCC and AUC were essentially the same as GlutPred's, showing that the predictor proposed in this research, unlike the other two, concentrates more on correctly distinguishing the minority class (samples with a glutarylation site).
The prediction performance of the three predictors on the same independent test set is shown in Table 4. Sn reaches 89.29%, 37.5 percentage points higher than GlutPred, and MCC reaches 0.3309, both significantly higher than the other two predictors; AUC also outperforms GlutPred by roughly 4 percentage points. Figure 4 provides a direct visual comparison of the two predictors. This shows that the model put forward in this research has a stronger capacity for generalization, corroborating the findings of the 10-fold CV.
| Predictor | Sn (%) | Sp (%) | ACC (%) | MCC | AUC |
| --- | --- | --- | --- | --- | --- |
| PUL-GLU | 58.93 | 78.97 | 76.65 | 0.2785 | – |
| GlutPred | 51.79 | 78.50 | 75.41 | 0.2238 | 0.7663 |
| DeepDN_iGlu | 89.29 | 61.97 | 65.15 | 0.3309 | 0.8087 |
To facilitate access to glutarylation site predictions, DeepDN_iGlu has been implemented as a freely available web server (https://bioinfo.wugenqiang.top/~smw/DeepDN_iGlu/). A screenshot of the server's interface is shown in Figure 5. The server comprises three sections: "Introduction", "Enter query sequences" and "Job Submission". "Introduction" describes the composition, functions and usage of DeepDN_iGlu; to keep the main interface straightforward, users can click the "More info" button for more detailed usage information. Under "Enter query sequences", the user can enter one or more protein sequences in FASTA format and click the "submit" button to obtain a prediction; the "example" button shows the specific requirements for the input sequence. "Job Submission" offers an easy way to obtain large batches of predictions: users enter their email address, project name and prediction file in the appropriate fields, press the "submit" button, and receive the prediction results file by email.
In this study, we developed a new deep learning-based prediction model for glutarylation sites, DeepDN_iGlu, adopting an attention residual learning method and DenseNet. According to the experimental results and the related evaluation data, DeepDN_iGlu achieved satisfactory prediction performance, with Sn, Sp, ACC, MCC and AUC of 79.45%, 63.74%, 66.00%, 0.3080 and 0.77 at 10-fold cross-validation, and 89.29%, 61.97%, 65.15%, 0.33 and 0.80, respectively, on the independent test set. The MCC of 0.33 on the independent test set shows that DeepDN_iGlu generalizes well. Another key benefit of DeepDN_iGlu is its feature encoding, which is quite straightforward and requires only one-hot encoding. Finally, a user-friendly web server was designed to make lysine glutarylation site predictions convenient for researchers to obtain.
To a certain extent, the small amount of currently available glutarylation site data means the advantages of deep learning on large datasets cannot be fully exploited. The training samples are further reduced during 10-fold cross-validation, which makes the model slightly less effective there than on the independent test set; this is a problem we need to address in the future. As glutarylation research advances, however, the data scarcity will gradually ease, opening up more possibilities for deep learning-based glutarylation site prediction models.
The authors are grateful for the constructive comments and suggestions made by the reviewers. This work was partially supported by the National Natural Science Foundation of China (Nos. 61761023, 62162032 and 31760315), the Natural Science Foundation of Jiangxi Province, China (Nos. 20202BABL202004 and 20202BAB202007), and the Scientific Research Plan of the Department of Education of Jiangxi Province, China (GJJ190695). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.
The dataset and source code used in this study can be easily derived from https://github.com/Michae-l/DeepDN-iGlu.
The authors declare that they have no competing interests.
[1] E. Furuya, K. Uyeda, Regulation of phosphofructokinase by a new mechanism. An activation factor binding to phosphorylated enzyme, J. Biol. Chem., 255 (1980), 11656–11659. https://doi.org/10.1016/s0021-9258(19)70181-1
[2] C. Lu, C. B. Thompson, Metabolic regulation of epigenetics, Cell Metab., 16 (2012), 9–17. https://doi.org/10.1016/j.cmet.2012.06.001
[3] M. Tan, C. Peng, K. A. Anderson, P. Chhoy, Z. Xie, L. Dai, et al., Lysine glutarylation is a protein posttranslational modification regulated by SIRT5, Cell Metab., 19 (2014), 605–617. https://doi.org/10.1016/j.cmet.2014.03.014
[4] S. Ahmed, A. Rahman, M. Hasan, A. Mehedi, S. Ahmad, S. M. Shovan, Computational identification of multiple lysine PTM sites by analyzing the instance hardness and feature importance, Sci. Rep., 11 (2021), 18882. https://doi.org/10.1038/s41598-021-98458-y
[5] G. S. McDowell, A. Philpott, New insights into the role of ubiquitylation of proteins, Int. Rev. Cell Mol. Biol., 325 (2016), 35–88. https://doi.org/10.1016/bs.ircmb.2016.02.002
[6] L. D. Vu, K. Gevaert, I. De Smet, Protein language: post-translational modifications talking to each other, Trends Plant Sci., 23 (2018), 1068–1080. https://doi.org/10.1016/j.tplants.2018.09.004
[7] R. S. P. Rao, N. Zhang, D. Xu, I. M. Moller, CarbonylDB: a curated data-resource of protein carbonylation sites, Bioinformatics, 34 (2018), 2518–2520. https://doi.org/10.1093/bioinformatics/bty123
[8] M. Wang, X. Cui, B. Yu, C. Chen, Q. Ma, H. Zhou, SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting, Neural Comput. Appl., 32 (2020), 13843–13862. https://doi.org/10.1007/s00521-020-04792-z
[9] X. Liu, L. Wang, J. Li, J. Hu, X. Zhang, Mal-Prec: computational prediction of protein Malonylation sites via machine learning based feature integration, BMC Genomics, 21 (2020), 812. https://doi.org/10.1186/s12864-020-07166-w
[10] K. Y. Huang, F. Y. Hung, H. J. Kao, H. H. Lau, S. L. Weng, iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features, BMC Bioinf., 21 (2020), 568. https://doi.org/10.1186/s12859-020-03916-5
[11] S. Ahmed, M. Kabir, M. Arif, Z. U. Khan, D. J. Yu, DeepPPSite: a deep learning-based model for analysis and prediction of phosphorylation sites using efficient sequence information, Anal. Biochem., 612 (2021), 113955. https://doi.org/10.1016/j.ab.2020.113955
[12] N. Thapa, M. Chaudhari, S. McManus, K. Roy, R. H. Newman, H. Saigo, et al., DeepSuccinylSite: a deep learning based approach for protein succinylation site prediction, BMC Bioinf., 21 (2020), 63. https://doi.org/10.1186/s12859-020-3342-z
[13] Z. Ju, J. J. He, Prediction of lysine glutarylation sites by maximum relevance minimum redundancy feature selection, Anal. Biochem., 550 (2018), 1–7. https://doi.org/10.1016/j.ab.2018.04.005
[14] Y. Xu, Y. Yang, J. Ding, C. Li, iGlu-Lys: A predictor for lysine glutarylation through amino acid pair order features, IEEE Trans. Nanobiosci., 17 (2018), 394–401. https://doi.org/10.1109/TNB.2018.2848673
[15] K. Y. Huang, H. J. Kao, J. B. K. Hsu, S. L. Weng, T. Y. Lee, Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites, BMC Bioinf., 19 (2019), 13–25. https://doi.org/10.1186/s12859-018-2394-9
[16] H. J. Al-Barakati, H. Saigo, R. H. Newman, D. B. KC, RF-GlutarySite: a random forest based predictor for glutarylation sites, Mol. Omics, 15 (2019), 189–204. https://doi.org/10.1039/c9mo00028c
[17] M. E. Arafat, M. W. Ahmad, S. M. Shovan, A. Dehzangi, S. R. Dipta, M. A. M. Hasan, et al., Accurately predicting glutarylation sites using sequential Bi-Peptide-Based evolutionary features, Genes, 11 (2020), 1023. https://doi.org/10.3390/genes11091023
[18] L. Dou, X. Li, L. Zhang, H. Xiang, L. Xu, iGlu_AdaBoost: identification of lysine glutarylation using the adaboost classifier, J. Proteome Res., 20 (2020), 191–201. https://doi.org/10.1021/acs.jproteome.0c00314
[19] J. Jia, Z. Liu, X. Xian, B. Liu, K. C. Chou, pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach, J. Theor. Biol., 394 (2016), 223–230. https://doi.org/10.1016/j.jtbi.2016.01.020
[20] P. Kelchtermans, W. Bittremieux, K. De Grave, S. Degroeve, J. Ramon, K. Laukens, et al., Machine learning applications in proteomics research: how the past can boost the future, Proteomics, 14 (2014), 353–366. https://doi.org/10.1002/pmic.201300289
[21] L. Dou, F. Yang, L. Xu, Q. Zou, A comprehensive review of the imbalance classification of protein post-translational modifications, Briefings Bioinf., 22 (2021), bbab089. https://doi.org/10.1093/bib/bbab089
[22] Z. Ju, S. Y. Wang, Computational identification of lysine glutarylation sites using positive-unlabeled learning, Curr. Genomics, 21 (2020), 204–211. https://doi.org/10.2174/1389202921666200511072327
[23] B. Wen, W. F. Zeng, Y. Liao, Z. Shi, S. R. Savage, W. Jiang, et al., Deep learning in proteomics, Proteomics, 20 (2020), 1900335. https://doi.org/10.1002/pmic.201900335
[24] S. C. Pakhrin, S. Pokharel, H. Saigo, D. B. Kc, Deep learning-based advances in protein posttranslational modification site and protein cleavage prediction, in Computational Methods for Predicting Post-Translational Modification Sites, Humana Press, (2022), 285–322. https://doi.org/10.1007/978-1-0716-2317-6_15
[25] S. Naseer, R. F. Ali, Y. D. Khan, P. D. D. Dominic, iGluK-Deep: computational identification of lysine glutarylation sites using deep neural networks with general pseudo amino acid compositions, J. Biomol. Struct. Dyn., 2021 (2021), 1–14. https://doi.org/10.1080/07391102.2021.1962738
[26] C. M. Liu, V. D. Ta, N. Q. K. Le, D. A. Tadesse, C. Shi, Deep neural network framework based on word embedding for protein glutarylation sites prediction, Life, 12 (2022), 1213. https://doi.org/10.3390/life12081213
[27] H. Xu, J. Zhou, S. Lin, W. Deng, Y. Zhang, Y. Xue, PLMD: an updated data resource of protein lysine modifications, J. Genet. Genomics, 44 (2017), 243–250. https://doi.org/10.1016/j.jgg.2017.03.007
[28] W. Li, A. Godzik, Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22 (2006), 1658–1659. https://doi.org/10.1093/bioinformatics/btl158
[29] Y. Huang, B. Niu, Y. Gao, L. Fu, W. Li, CD-HIT Suite: a web server for clustering and comparing biological sequences, Bioinformatics, 26 (2010), 680–682. https://doi.org/10.1093/bioinformatics/btq003
[30] K. C. Chou, Prediction of signal peptides using scaled window, Peptides, 22 (2001), 1973–1979. https://doi.org/10.1016/S0196-9781(01)00540-X
[31] H. Wang, H. Zhao, Z. Yan, J. Zhao, J. Han, MDCAN-Lys: a model for predicting succinylation sites based on multilane dense convolutional attention network, Biomolecules, 11 (2021), 872. https://doi.org/10.3390/biom11060872
[32] H. Wang, Z. Yan, D. Liu, H. Zhao, J. Zhao, MDC-Kace: A model for predicting lysine acetylation sites based on modular densely connected convolutional networks, IEEE Access, 8 (2020), 214469–214480. https://doi.org/10.1109/access.2020.3041044
[33] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, (2017), 2261–2269. http://doi.org/10.1109/CVPR.2017.243
[34] T. Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, (2017), 2999–3007. https://doi.org/10.1109/ICCV.2017.324
[35] M. Sokolova, G. Lapalme, A systematic analysis of performance measures for classification tasks, Inf. Process. Manage., 45 (2009), 427–437. https://doi.org/10.1016/j.ipm.2009.03.002
[36] S. Boughorbel, F. Jarray, M. El-Anbari, Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric, PLoS One, 12 (2017), e0177678. https://doi.org/10.1371/journal.pone.0177678
[37] T. Fawcett, An introduction to ROC analysis, Pattern Recognit. Lett., 27 (2006), 861–874. https://doi.org/10.1016/j.patrec.2005.10.010