Phased small interfering RNAs (phasiRNAs) are plant secondary small interfering RNAs that are typically generated by the convergence of miRNAs and polyadenylated mRNAs. A growing number of studies have shown that miRNA-initiated phasiRNAs play crucial roles in regulating plant growth and stress responses. Experimental verification of miRNA-initiated phasiRNA loci can take considerable time, energy and labor. Therefore, a number of computational methods capable of processing high-throughput data have been proposed. In this work, we propose a predictor (DIGITAL) for identifying miRNA-initiated phasiRNAs in plants, which combines a multi-scale residual network with a bi-directional long short-term memory network. The negative dataset was constructed from the positive data by randomly replacing 60% of the nucleotides in each positive sample. Our predictor achieved accuracies of 98.48% and 94.02%, respectively, on two independent test datasets with different sequence lengths. These independent test results indicate the effectiveness of our model. Furthermore, DIGITAL is robust and generalizes well, and can thus be readily extended and applied to miRNA target recognition in other species. The source code of DIGITAL is freely available at https://github.com/yuanyuanbu/DIGITAL.
Citation: Yuanyuan Bu, Jia Zheng, Cangzhi Jia. An efficient deep learning based predictor for identifying miRNA-triggered phasiRNA loci in plant[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6853-6865. doi: 10.3934/mbe.2023295
1. Introduction
Cardiovascular diseases (CVDs) are the most common cause of death worldwide [1]. The World Health Organization (WHO) estimates that nearly 17.9 million people die of CVDs each year, which is around one third of all deaths worldwide. Notably, CVD does not refer to a specific heart problem; rather, it is a general term for a set of disorders of the heart and blood vessels. These disorders include coronary heart disease, a condition in which the arteries cannot supply the required amount of oxygenated blood to the heart; cerebrovascular disease, which impairs blood flow in the brain; and rheumatic heart disease, in which the heart valves are permanently damaged [1]. Because CVDs account for such a large and growing share of worldwide deaths, predicting heart failure is an important task for clinicians. Early detection of heart failure can lead to medical interventions that improve recovery from CVDs and reduce the number of fatalities. However, predicting heart failure is not a trivial task, and several studies have focused on identifying features that are associated with it [2-4]. The introduction of Electronic Medical Records (EMR) allowed medical practitioners to move from paper-based to electronic management of patient records, which simplified the acquisition, storage, processing and analysis of large volumes of medical data [5]. The availability of medical data in electronic form also benefited the scientific community, which used the data to identify novel and previously unknown patterns [3,4]. More generally, the availability of data has led to the adoption of new computational techniques such as deep learning in a wide variety of application areas, including medical sciences [6], social network analysis [7], environmental science [8], and others [9,11,12].
Several researchers have also used machine learning techniques and deep neural networks to predict heart failure [2-4,13]. However, these researchers used only a selection of machine learning techniques, together with some primitive techniques for comparison. This motivated us to consider current state-of-the-art machine learning algorithms for heart failure prediction and to evaluate them on a standard benchmark dataset. The objective of this work is therefore to identify the best-performing machine learning algorithms on that benchmark dataset.
The remainder of this work is organized in the following sections. Section 2 discusses state-of-the-art techniques concerning the use of machine learning algorithms for cardiovascular diseases. In Section 3, we present the experimental set-up and the relevant details. Section 4 covers the results and discussions. Section 5 concludes the work and discusses potential future research directions.
2. Literature review
In the recent past, considerable attention has been paid to the application of machine learning techniques in medical diagnostics [6-10]. Advances in Information and Communication Technologies have driven progress in many fields, including the medical sciences. The design of smart hardware has enabled the collection of large-scale medical data, which paved the way for the sophisticated machine learning and deep learning techniques now used in the health sciences [19,20]. Researchers have used a variety of novel approaches to obtain better results [19,20], and novel and effective computational techniques continue to be developed. As covering all of these works is beyond the scope of this paper, in the following we focus only on machine learning and related techniques in the area of cardiovascular diseases.
Ahmad et al. [13] collected the data of 299 patients with heart failure problems and applied statistical techniques to identify important features that contribute to the survival of the patients. The authors identified increasing age, high blood pressure, and lower ejection fraction (EF) values as key factors associated with higher mortality. Chicco and Jurman [2] considered the same dataset as Ahmad et al. [13] and applied several machine learning algorithms with two objectives. The first was to predict the survival of patients, and the second was to identify the features most relevant to that prediction. Unlike [13], the authors found that serum creatinine and ejection fraction are the two important features for predicting survival from heart failure.
Khourdifi and Bahaj [14] argued that, for CVD prediction, machine learning algorithms can provide good results in comparison to other techniques because they can model complex non-linear problems. The authors explored feature selection, on the premise that not all features are important for predicting the outcome. Further, they proposed using particle swarm optimization and ant colony optimization in conjunction with neural networks, random forests and support vector machines. Shah et al. [15] used naive Bayes, k-nearest neighbors, random forest and decision tree techniques for heart disease prediction on a dataset available from the UCI repository. The original dataset contained 76 attributes; however, the authors used only 14 and observed that the k-nearest neighbor technique performed best.
Diwakar et al. [16] reviewed the literature addressing the problem of medical diagnosis using machine learning and image fusion techniques. The work of Mohan et al. [17] focused on the long-standing issue of cardiovascular disease prediction. The authors argued that predicting heart disease is a challenging scientific problem of real-world importance, and that a solution would be of significant value to medical practitioners, including physicians and surgeons. They focused on identifying important features that can help in predicting heart disease, proposed a feature selection method, and evaluated several machine learning algorithms, including decision trees and support vector machines.
Ghosh et al. [3] combined three different datasets into a single dataset for evaluating various machine learning algorithms for CVD prediction. The authors used the LASSO feature selection technique to identify important features and reported better results than other standard approaches. Chen et al. [4] argued that present techniques for diagnosing heart failure depend heavily on the physician's knowledge and interpretation of the case. They proposed a hybrid approach based on DPCNN and XGBoost for the automatic extraction of features from the text of patients' test histories, and claimed a significant improvement in prediction sensitivity. Porum et al. [18] advocated the use of Convolutional Neural Networks (CNN) for predicting Congestive Heart Failure (CHF) from raw electrocardiogram (ECG) heartbeats. The model was trained on a dataset containing 490,505 heartbeats, and the authors claimed to achieve 100% CHF detection accuracy. Machine learning techniques are also used in medical image processing. For instance, they can be used in a pre-processing step to remove noise and other irrelevant information from medical images (such as CT scans). Further, they are used for automatic image segmentation, reducing the laborious task of manually segmenting medical imagery [19]. For a detailed review of the use of machine learning techniques for heart failure prediction, readers are referred to [20-22].
From the literature survey, we conclude that a variety of techniques have been used for CVDs; however, for survival prediction, there remains a gap in the experimental evaluation of machine learning algorithms on a standard dataset. This work fills that gap by evaluating various machine learning techniques on the standard benchmark dataset used in [2,13].
3. Experimental setup
In this section, the experimental setup is discussed including the description of the methodology, explanation of the dataset, the set of machine learning algorithms, and the evaluation criteria.
3.1. Methodology
The steps followed in performing the experiments are listed below:
i) The dataset is reviewed manually to understand the structure of the data as well as the meaning of the various features.
ii) The dataset is pre-processed for missing values, irregular values, values not matching the column description, and outliers. For this purpose, various functions of Python's pandas library were used. After ensuring that the pre-processing phase resulted in a cleaned dataset, the data is saved for reproducibility of the results.
iii) No feature selection is carried out; all features are used for model development and implementation.
iv) The dataset is randomly split into a training set containing 80% of the data and a test set containing the remaining 20%.
v) Machine learning models are trained using the training set.
vi) Models are evaluated on the test set using the performance evaluation criteria.
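The random 80/20 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_test_split_indices` and the seed value are assumptions made for this example, since the text does not state how the split was implemented or seeded.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into train and test index lists.

    The seed is a hypothetical choice; fixing it makes the split repeatable.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    # The first n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]


# 299 is the number of patients in the dataset described in Section 3.2.
train_idx, test_idx = train_test_split_indices(299)
print(len(train_idx), len(test_idx))  # 239 60
```

With 299 records, a 20% test fraction rounds to 60 test records and leaves 239 for training.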
3.2. Dataset
We use the dataset that was collected by Ahmad et al. [13] and previously used by Chicco and Jurman [2]. The dataset consists of data on 299 patients. The disease among the patients was identified from the Cardiac Echo report as well as the notes written by the physician. Follow-up meetings were arranged with patients, with an average follow-up period of 130 days. The data covers 194 men and 105 women. All the patients were over 40 years old and belonged to NYHA classes III and IV [13]. The dataset consists of 12 features, which include age, anemia, blood pressure (BP), creatinine phosphokinase, diabetes, ejection fraction, gender, platelets, serum creatinine, serum sodium and smoking. The target variable is named DEATH EVENT and is a binary variable expressing the survival outcome. Of the feature variables, age, creatinine phosphokinase, and serum sodium are continuous variables, whereas ejection fraction, serum creatinine, and platelets are categorical variables. Gender, diabetes, anemia, blood pressure, and smoking are considered binary variables. Note that blood pressure and anemia were originally continuous variables but were transformed into binary variables. The cause of death of each patient was a heart-related disease. The authors [13] state that they followed the necessary protocols in data collection, including obtaining informed consent from the patients.
3.3. Algorithms
3.3.1. Logistic Regression
The Logistic Regression (LR) classifier is one of the most basic yet effective classifiers. LR is used in situations where an input needs to be classified into one of a pre-defined set of classes. Logistic regression-based classifiers can be used for both binary and multi-class classification problems. The sigmoid is the most commonly used activation function; however, other alternatives exist as well.
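To make the role of the sigmoid concrete, the following is a minimal sketch of how a binary logistic regression classifier turns a feature vector into a class label. The weights, bias and 0.5 decision threshold are hypothetical values chosen for illustration; they are not the parameters of the model evaluated in this work.

```python
import math


def sigmoid(z):
    """Logistic function, written via tanh so that large |z| does not overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights, bias, features, threshold=0.5):
    """Binary label: 1 if the predicted probability reaches the threshold."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical two-feature model and samples.
weights, bias = [1.0, -1.0], 0.0
print(predict(weights, bias, [2.0, 1.0]))  # linear score is 1.0, so the label is 1
print(predict(weights, bias, [0.5, 3.0]))  # linear score is -2.5, so the label is 0
```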
3.3.2. Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised machine learning technique that can be used for classification problems, including those where the boundary between classes is non-linear [17]. SVM supports several kernel functions, such as linear, polynomial and radial basis function (RBF) kernels.
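As an illustration of what a kernel function computes, the sketch below implements a linear kernel and a radial basis function (RBF) kernel, a commonly used non-linear choice. The RBF kernel and the `gamma` value are assumptions for this example; the text does not state which kernel was used in the experiments.

```python
import math


def linear_kernel(x, y):
    """Plain dot product: corresponds to a linear decision boundary."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Similarity that decays with squared Euclidean distance (gamma is hypothetical)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


print(linear_kernel([1.0, 2.0], [3.0, 4.0]))  # 11.0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))     # 1.0, identical points are maximally similar
```

An SVM only ever touches the data through such pairwise similarities, which is why swapping the kernel changes the shape of the decision boundary without changing the training procedure.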
3.3.3. Decision Tree
Like SVM, the Decision Tree (DT) is a supervised machine learning-based classifier [17]. DTs can be used with both categorical and continuous variables, and can be binary or multiway depending on the nature of the problem. Each internal node of a DT, starting with the root at the top, holds a decision variable, and the decision to move to a particular lower-level node is based on the value observed for that variable. The leaf nodes contain the outcome, i.e., in a classification problem, the possible values of the target variable are found at the leaves.
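The traversal from root to leaf can be sketched with a small hand-built binary tree. Everything in the toy tree below is hypothetical: the feature names are borrowed from the dataset description, but the thresholds and leaf labels are invented for illustration and do not reproduce the trained model.

```python
def classify(node, sample):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


# Toy binary tree (hypothetical thresholds and labels; 1 = death event, 0 = survival).
toy_tree = {
    "feature": "serum_creatinine",
    "threshold": 0.0,
    "left": {
        "feature": "ejection_fraction",
        "threshold": 0.0,
        "left": {"label": 1},
        "right": {"label": 0},
    },
    "right": {"label": 1},
}

sample = {"serum_creatinine": -0.4, "ejection_fraction": 0.8}
print(classify(toy_tree, sample))  # 0: left at the root, then right at the second node
```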
3.3.4. Artificial Neural Networks
Artificial Neural Networks (ANN) are among the most favoured techniques in the machine learning domain for regression as well as classification problems [7]. ANNs are used in a variety of applications [2,12]. The basic constituent of an ANN is the neuron, a computational unit that receives one or more inputs and produces an output. Between the input and output layers, an ANN contains one or more layers (called hidden layers), each of which contains many neurons. The final layer of the ANN is called the output layer, which contains one or more units depending on the nature of the problem being solved.
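The layered structure described above can be illustrated with a forward pass through a tiny fully connected network. This is a sketch under stated assumptions: the sigmoid activation, the layer sizes and all weights are hypothetical and are not the configuration reported in Table 1.

```python
import math


def sigmoid(z):
    """Logistic activation, written via tanh to stay numerically stable."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of its inputs plus a bias, then an activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def dense_layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = dense_layer(activations, weight_rows, biases)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
output = forward([1.0, 2.0], layers)
print(output)  # a single value between 0 and 1
```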
3.3.5. Model parameters
To ensure reproducibility of the results, we present the parameters used for each model in our experiments in Table 1. Note that several parameter settings were tried during the experiments, and only the optimal parameters are reported here.
It is important to mention that the parameters given in Table 1 were obtained after testing various combinations. For instance, in the case of Artificial Neural Networks, various combinations of hidden layers and neurons per layer were tested, and the configuration that resulted in the best performance was chosen.
3.4. Performance evaluation criteria
Since our machine learning problem is a classification problem, we use performance evaluation criteria for classification problems. In this regard, we consider the following evaluation criteria:
● Accuracy
● Precision
● Recall
● F1-Score
However, before defining these terms, let us define some key terminologies that are in turn used to define accuracy, precision, recall and F1-score [7].
True Positive (TP): An outcome is called a true positive if both the original outcome and the predicted outcome are positive.
False Positive (FP): An outcome is referred to as a false positive if the original outcome is negative and the predicted outcome is positive.
False Negative (FN): An outcome is called a false negative if the original outcome is positive whereas the predicted outcome is negative.
True Negative (TN): An outcome is referred to as a true negative if both the original outcome and the predicted outcome are negative.
Based on TP, FP, FN and TN, we define the performance evaluation terms as follows [7]:
Accuracy: Accuracy is the ratio of correct predictions (the sum of TP and TN) to the total number of predictions.
Precision: Precision is the ratio of true positives (TP) to all instances predicted as positive (TP + FP).
Recall: Recall is the ratio of true positives (TP) to the total number of actual positive records (TP + FN).
F1-Score: The F1-score is the harmonic mean of precision and recall.
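The four definitions above translate directly into code. The sketch below computes them from confusion-matrix counts; the function name and the example counts are illustrative assumptions of ours (a 60-sample test set, roughly 20% of 299 records) and are not values taken from Table 2.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a 60-sample test set.
metrics = classification_metrics(tp=15, fp=4, fn=8, tn=33)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Note that a classifier can score well on accuracy while scoring poorly on recall when the positive class is the minority, which is why all four measures are reported.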
4. Results and discussions
In this section, we present the results and discussions based on the evaluation criteria of accuracy, precision, recall and F1-score. However, before discussing the results based on the criteria, we present the summary of TP, FP, FN and TN in Table 2.
DT obtained the best accuracy among the set of algorithms. The accuracy achieved by DT is 80%, which is 14% better than the average accuracy of the other three models. ANN achieved the lowest accuracy of 60%, whereas LR achieved 78.34% and SVM 66.67%. Figure 1 presents the accuracy achieved by each of the models.
Figure 1. Accuracy score of each machine learning technique.
When the machine learning algorithms are evaluated using precision as the criterion, we observed that LR outperformed the remaining techniques. LR achieved a precision score of 91.67%, whereas DT achieved 78.94%. SVM and ANN scored 80% and 40%, respectively. The precision scores of all the algorithms are shown in Figure 2.
Figure 2. Precision score of each machine learning technique.
Figure 3 shows the recall scores of the four selected machine learning techniques. DT achieved the highest recall score of 65.21%, whereas the lowest recall score, 8.69%, was achieved by ANN. LR achieved a recall of 47.82%, whereas SVM's recall score is 17.39%.
Figure 3. Recall score of each machine learning technique.
Finally, in terms of F1-score, we observed that DT achieved the highest score of 71.4%, whereas the second-best performing technique is LR with an F1-score of 62.8%. SVM achieved an F1-score of 28.57%, and ANN is the worst-performing technique with an F1-score of 14.28%. Figure 4 summarizes the F1-score of each of the machine learning techniques.
Figure 4. F1-score of each machine learning technique.
We observed that DT performance is better than the rest of the machine learning techniques on all performance evaluation measures except precision. When precision is considered as a performance measure, LR performs better than the rest of the techniques. Overall, LR is better than SVM and ANN. Only DT performs better than LR. The performance of ANN is observed to be the worst, even worse than a basic logistic regression.
For the experiments, the data was randomly divided into training and test sets of 80% and 20%, respectively. A single random division of this kind might result in biased training and test sets. To address this shortcoming, we performed k-fold cross validation with k=5. The accuracy results obtained using k-fold cross validation are presented in Figure 5. The k-fold cross validation confirmed the original performance ordering of the machine learning algorithms. We observed that although there is a slight increase in the accuracy scores of all the machine learning techniques, the relative order remained the same.
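The way k-fold cross validation partitions the data can be sketched as follows. This is a standard-library illustration, not the code used for the experiments; the function name and the seed are assumptions. Each record is used for testing exactly once and for training in the other k-1 rounds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for repeatability
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# 299 records split into 5 folds.
for round_number, (train_idx, test_idx) in enumerate(k_fold_indices(299, k=5), start=1):
    print(round_number, len(train_idx), len(test_idx))
```

A model would be trained and scored once per round, and the k accuracy scores averaged to give the cross-validated estimate.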
Since the decision tree was identified as the best-performing technique, it is appropriate to visualize it and identify the factors that play an important role in its decisions. Figure 6 is a graphical representation of the decision-making process of the decision tree model. Note that the DT is derived from the instances in the training set (80% of the total data). At the root is the variable serum creatinine: if its value is less than or equal to 0.151, the left sub-tree is traversed; otherwise, the decision is passed to the right sub-tree. In the left sub-tree (true block), the ejection fraction value is evaluated, whereas in the right sub-tree (false block), the creatinine phosphokinase value is checked. It is important to note that these values are normalized: we used a standard scaler for value normalization in the pre-processing phase. Our findings are in line with those of Chicco and Jurman [2], who also identified serum creatinine and ejection fraction as important variables. However, our findings differ from those of Ahmad et al. [13], where age and high blood pressure were marked as key factors.
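Because the split thresholds are expressed on normalized values, it may help to recall what standard scaling does. The sketch below applies the usual z-score definition, subtracting the mean and dividing by the population standard deviation; it is an assumption that this matches the scaler used here, and the input values are hypothetical rather than taken from the dataset.

```python
import math


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical serum creatinine readings, for illustration only.
raw = [0.9, 1.1, 1.3, 1.9, 2.7]
scaled = standardize(raw)
print([round(z, 3) for z in scaled])
```

After scaling, a threshold such as 0.151 is measured in standard deviations above the mean of the feature rather than in its original units.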
5. Conclusion
CVDs are the most common cause of death worldwide, with an estimated 17.9 million deaths annually. Early prediction and diagnosis of cardiovascular diseases can reduce the number of associated deaths. In this regard, several computational techniques have been introduced in the literature that focus on various aspects of predicting, identifying, and controlling heart-related diseases. In terms of machine learning techniques, existing work has mainly focused on identifying features that are vital in influencing the survival rate in CVDs. Researchers have used several different techniques and datasets with varying degrees of success.
In this work, we considered a set of algorithms and a standard benchmark dataset to review the performance of the algorithms against various performance measures. We identified the decision tree as the best-performing algorithm and artificial neural networks as the worst-performing on various performance measures. The performance of DT is 14% better than the average performance of the other techniques. In terms of accuracy, the performance difference between DT and LR is not significant. This work can be extended by designing robust machine learning algorithms that perform well on real-world datasets. Further, the current dataset is relatively imbalanced, and it will be important to collect more robust data on various heart-related conditions and use it to train more robust models. Another potential research direction is to explore the use of medical image processing techniques for CVDs.
Conflict of interest
The author declares that there is no conflict of interest to report regarding the present study.
References
[1]
B. He, J. Huang, H. Chen, PVsiRNAPred: Prediction of plant exclusive virus-derived small interfering RNAs by deep convolutional neural network, J Bioinform. Comput. Biol., 17 (2019), 1950039. https://doi.org/10.1142/S0219720019500392 doi: 10.1142/S0219720019500392
[2]
D. Baulcombe, RNA silencing in plants, Nature, 431 (2004), 356-363. https://doi.org/10.1038/nature02874 doi: 10.1038/nature02874
[3]
E. J. Chapman, J. C. Carrington, Specialization and evolution of endogenous small RNA pathways, Nat. Rev. Genet., 8 (2007), 884-896. https://doi.org/10.1038/nrg2179 doi: 10.1038/nrg2179
[4]
M. Niu, Y. Lin, Q. Zou, sgRNACNN: Identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks, Plant. Mol. Biol., 105 (2021), 483-495. https://doi.org/10.1007/s11103-020-01102-y doi: 10.1007/s11103-020-01102-y
[5]
S. M. Hammond, E. Bernstein, D. Beach, G. J. Hannon, An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells, Nature, 404 (2000), 293-296. https://doi.org/10.1038/35005107 doi: 10.1038/35005107
[6]
S.-W. Ding, R. Lu, Virus-derived siRNAs and piRNAs in immunity and pathogenesis, Curr. Opin. Virol., 1 (2011), 533-544. https://doi.org/10.1016/j.coviro.2011.10.028 doi: 10.1016/j.coviro.2011.10.028
[7]
X. Chen, Small RNAs and their roles in plant development, Annu. Rev. Cell. Dev. Biol., 25 (2009), 21-44. https://doi.org/10.1146/annurev.cellbio.042308.113417 doi: 10.1146/annurev.cellbio.042308.113417
[8]
C. Cao, J. Wang, D. Kwok, F. Cui, Z. Zhang, D. Zhao, et al., WebTWAS: A resource for disease candidate susceptibility genes identified by transcriptome-wide association study, Nucleic Acids Res., 50 (2021), D1123-D1130. https://doi.org/10.1093/nar/gkab957 doi: 10.1093/nar/gkab957
[9]
X. Song, P. Li, J. Zhai, M. Zhou, L. Ma, B. Liu, et al., Roles of DCL4 and DCL3b in rice phased small RNA biogenesis, Plant J., 69 (2012), 462-474. https://doi.org/10.1111/j.1365-313X.2011.04805.x doi: 10.1111/j.1365-313X.2011.04805.x
[10]
Y. Liu, C. Teng, R. Xia, B. C. Meyers, PhasiRNAs in Plants: Their biogenesis, genic sources, and roles in stress responses, development, and reproduction, Plant Cell, 32 (2020), 3059-3080. https://doi.org/10.1105/tpc.20.00335 doi: 10.1105/tpc.20.00335
[11]
Q. Fei, R. Xia, B. C. Meyers, Phased, secondary, small interfering RNAs in posttranscriptional regulatory networks, Plant Cell, 25 (2013), 2400-2415. https://doi.org/10.1105/tpc.113.114652 doi: 10.1105/tpc.113.114652
[12]
S. Belanger, S. Pokhrel, K. Czymmek, B. C. Meyers, Premeiotic, 24-nucleotide reproductive phasiRNAs are abundant in anthers of wheat and barley but not rice and maize, Plant Physiol., 184 (2020), 1407-1423. https://doi.org/10.1104/pp.20.00816 doi: 10.1104/pp.20.00816
[13]
C. Chen, J. Li, J. Feng, B. Liu, L. Feng, X. Yu, et al., sRNAanno-a database repository of uniformly annotated small RNAs in plants, Hortic Res., 8 (2021), 45. https://doi.org/10.1038/s41438-021-00480-8 doi: 10.1038/s41438-021-00480-8
[14]
J. Liu, X. Liu, S. Zhang, S. Liang, W. Luan, X. Ma, TarDB: An online database for plant miRNA targets and miRNA-triggered phased siRNAs, BMC Genomics, 22 (2021), 348. https://doi.org/10.1186/s12864-021-07680-5 doi: 10.1186/s12864-021-07680-5
[15]
H. M. Chen, L. T. Chen, K. Patel, Y. H. Li, D. C. Baulcombe, S. H. Wu, 22-Nucleotide RNAs trigger secondary siRNA biogenesis in plants, Proc. Natl. Acad. Sci. U. S. A., 107 (2010), 15269-15274. https://doi.org/10.1073/pnas.1001738107 doi: 10.1073/pnas.1001738107
[16]
R. Xia, J. Xu, S. Arikit, B. C. Meyers, Extensive families of miRNAs and PHAS Loci in Norway spruce demonstrate the origins of complex phasiRNA networks in seed plants, Mol. Biol. Evol., 32 (2015), 2905-2918. https://doi.org/10.1093/molbev/msv164 doi: 10.1093/molbev/msv164
[17]
J. Zhai, D. H. Jeong, E. De Paoli, S. Park, B. D. Rosen, Y. Li, et al., MicroRNAs as master regulators of the plant NB-LRR defense gene family via the production of phased, trans-acting siRNAs, Genes Dev., 25 (2011), 2540-2553. https://doi.org/10.1101/gad.177527.111 doi: 10.1101/gad.177527.111
[18]
E. de Paoli, A. Dorantes-Acosta, J. Zhai, M. Accerbi, D. H. Jeong, S. Park, et al., Distinct extremely abundant siRNAs associated with cosuppression in petunia, RNA, 15 (2009), 1965-1970. https://doi.org/10.1261/rna.1706109 doi: 10.1261/rna.1706109
[19]
M. Oubounyt, Z. Louadi, H. Tayara, K. T. Chong, DeePromoter: Robust promoter predictor using deep learning, Front. Genet., 10 (2019), 286. https://doi.org/10.3389/fgene.2019.00286 doi: 10.3389/fgene.2019.00286
[20]
Y. Qian, Y. Zhang, B. Guo, S. Ye, Y. Wu, J. Zhang, An improved promoter recognition model using convolutional neural network, in 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), (2018), 471-476. https://doi.org/10.1109/COMPSAC.2018.00072
[21]
Y. Yang, Z. Hou, Z. Ma, X. Li, K. C. Wong, iCircRBP-DHN: Identification of circRNA-RBP interaction sites using deep hierarchical network, Brief. Bioinform., 22 (2021). https://doi.org/10.1093/bib/bbaa274 doi: 10.1093/bib/bbaa274
[22]
D. Wang, C. Zhang, B. Wang, B. Li, Q. Wang, D. Liu, et al., Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning, Nat. Commun., 10 (2019), 4284. https://doi.org/10.1038/s41467-019-12281-8 doi: 10.1038/s41467-019-12281-8
[23]
Neeraj, V. Singhal, J. Mathew, R. K. Behera, Detection of alcoholism using EEG signals and a CNN-LSTM-ATTN network, Comput. Biol. Med., 138 (2021), 104940. https://doi.org/10.1016/j.compbiomed.2021.104940 doi: 10.1016/j.compbiomed.2021.104940
[24]
Q. Liu, J. Chen, Y. Wang, S. Li, C. Jia, J. Song, et al., DeepTorrent: A deep learning-based approach for predicting DNA N4-methylcytosine sites, Brief. Bioinform., 22 (2020). https://doi.org/10.1093/bib/bbaa124 doi: 10.1093/bib/bbaa124
[25]
Y. Zhu, F. Li, D. Xiang, T. Akutsu, J. Song, C. Jia, Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks, Brief. Bioinform., 22 (2020). https://doi.org/10.1093/bib/bbaa299
[26]
D. Salimi, A. Moeini, Incorporating K-mers highly correlated to epigenetic modifications for Bayesian inference of gene interactions, Curr. Bioinform., 16 (2021), 484-492. https://doi.org/10.2174/1574893615999200728193621
[27]
S. Ye, Y. Liang, B. Zhang, Bayesian functional mixed-effects models with grouped smoothness for analyzing time-course gene expression data, Curr. Bioinform., 16 (2021), 2-12. https://doi.org/10.2174/1574893615999200520082636
[28]
D. Chai, C. Jia, J. Zheng, Q. Zou, F. Li, Staem5: A novel computational approach for accurate prediction of m5C site, Mol. Ther. Nucl. Acids, 26 (2021), 1027-1034. https://doi.org/10.1016/j.omtn.2021.10.012
[29]
H. Abbasimehr, R. Paki, Prediction of COVID-19 confirmed cases combining deep learning methods and Bayesian optimization, Chaos Solitons Fractals, 142 (2021), 110511. https://doi.org/10.1016/j.chaos.2020.110511
[30]
J. Chen, Q. Zou, J. Li, DeepM6ASeq-EL: Prediction of human N6-Methyladenosine (m6A) sites with LSTM and ensemble learning, Front. Comput. Sci., 16 (2022), 162302. https://doi.org/10.1007/s11704-020-0180-0
[31]
A. K. Sharma, R. Srivastava, Protein secondary structure prediction using character Bi-gram embedding and Bi-LSTM, Curr. Bioinform., 16 (2021), 333-338. https://doi.org/10.2174/1574893615999200601122840
[32]
A. Rafiei, A. Rezaee, F. Hajati, S. Gheisari, M. Golzan, SSP: Early prediction of sepsis using fully connected LSTM-CNN model, Comput. Biol. Med., 128 (2021), 104110. https://doi.org/10.1016/j.compbiomed.2020.104110
[33]
H. Lv, F. Y. Dao, Z. X. Guan, H. Yang, Y. W. Li, H. Lin, Deep-Kcr: Accurate detection of lysine crotonylation sites using deep learning method, Brief. Bioinform., 22 (2021), 255. https://doi.org/10.1093/bib/bbaa255
[34]
S. Gholamizoj, B. Ma, SPEQ: Quality assessment of peptide tandem mass spectra with deep learning, Bioinformatics, 38 (2022), 1568-1574. https://doi.org/10.1093/bioinformatics/btab874
[35]
D. D. S. Lima, L. J. A. Amichi, A. A. Constantino, M. A. Fernandez, F. A. V. Seixas, NCYPred: A bidirectional LSTM network with attention for Y RNA and short non-coding RNA classification, IEEE-ACM Trans. Comput. Biol. Bioinform., (2021), 1-1. https://doi.org/10.1109/TCBB.2021.3131136
[36]
M. L. Chen, A. Doddi, J. Royer, L. Freschi, M. Schito, M. Ezewudo, et al., Deep learning predicts tuberculosis drug resistance status from genome sequencing data, BioRxiv, (2018), 275628. https://doi.org/10.1101/275628