Cancer is a serious disease that profoundly affects human physical and mental health. According to the International Agency for Research on Cancer (IARC) of the World Health Organization, approximately 19.3 million people worldwide were diagnosed with cancer in 2020 [1], with over half being male patients. Moreover, male-specific tumors [2] (such as prostate adenocarcinoma) have garnered significant attention due to their high incidence rates and their impact on men's health. There are notable differences in the global occurrence rates of male cancers [3]. Accurately classifying these cancers is one of the fundamental strategies to furnish clinical decision-making information and reduce the mortality rates of male cancers [4]. Among these, bladder urothelial carcinoma [5], colon adenocarcinoma [6], liver hepatocellular carcinoma [7], lung adenocarcinoma [8] and prostate adenocarcinoma [9] are prevalent cancers in males. The incidence of these prevalent cancers in men, among which prostate adenocarcinoma is one of the most common, increases with age. Bladder urothelial carcinoma and colon adenocarcinoma are usually associated with diet, lifestyle and genetic factors. On the other hand, liver hepatocellular carcinoma is primarily associated with hepatitis B and C virus infection, while lung adenocarcinoma is associated with smoking and exposure to airborne pollutants. These cancers' high incidence and mortality rates pose considerable threats to human health and life. Hence, cancer classification is crucial in selecting appropriate treatment strategies and improving the patient's prognosis. DNA methylation analysis has emerged as a promising tool for cancer classification [10], providing valuable insights into tumor biology and revealing potential therapeutic targets.
DNA methylation is the process of covalently modifying DNA by adding a methyl group to cytosine residues located in CpG dinucleotide contexts without altering the DNA sequence itself [11]. This process is critical in regulating gene expression, maintaining genomic stability and silencing transposable elements [12]. Increasing evidence suggests that abnormal DNA methylation patterns are associated with many diseases [13], especially cancer. Specifically, abnormal DNA methylation patterns in CpG island promoter regions [14] can lead to an increased loss of control of gene expression and genomic instability, thus promoting tumor initiation and progression. It is noteworthy that DNA methylation analysis has become an effective tool for cancer classification primarily because this technique can provide comprehensive information on the methylation status of individual CpG sites [15]. Consequently, it can accurately identify differential methylation patterns between normal and tumor tissues, making it an essential tool for cancer diagnosis and classification.
In recent years, high-throughput sequencing technology [16] has emerged as one of the most crucial tools in cancer research. DNA methylation data, which are closely associated with cancer development, are among the data types generated by this technology. With the continuous advancement of sequencing technology and computer processing capabilities, an increasing amount of large-scale DNA methylation data has been amassed. The challenge now is to extract useful information from these data and use it to classify cancer, a critical issue in current cancer research. In addition to the integration of multiple high-throughput sequencing data, artificial intelligence technology has been widely used in cancer research; deep learning algorithms, for instance, can automate tasks and improve efficiency in cancer diagnosis and treatment. Mohammed et al. [17] stacked multiple one-dimensional convolutional neural network (1D-CNN) models to classify five types of cancer based on The Cancer Genome Atlas (TCGA) RNA-seq data. Jia et al. [18] proposed a method that combines variance selection with recursive feature elimination, successfully selecting 20 optimal features from over 480,000 dimensions of DNA methylation data. They compared the performance of four different estimators and five classifiers and achieved an accuracy of over 93%. Furthermore, Lin et al. [19] developed a cancer prediction model, iCancer-Pred, based on deep neural networks. This model can classify seven different cancer datasets obtained from the TCGA Hub database on the University of California Santa Cruz (UCSC) XENA platform [19,20]. The authors compared this method with machine learning techniques such as support vector machines (SVM), logistic regression (LR) and random forest (RF) and, using 5-fold cross-validation, reported an accuracy of up to 97%.
Although several existing studies have made significant progress in cancer classification using various models, there is still a need to overcome model limitations and improve the overall performance of cancer classification.
In this study, we propose an ensemble learning-based classification algorithm, called Stacking, for classifying male tumors. Specifically, we use the chi-square test and L1-regularized logistic regression to select features highly associated with the characteristics of the cancer dataset. We then devise an ensemble learning algorithm to distinguish the five most common cancers in males and their corresponding normal tissues. Stacking was tested on DNA methylation 450K cancer datasets, where the results demonstrated a significant advantage in the accurate classification of cancer. In addition, this study explores the relationship between potential genes and the survival rates of these five common cancers through gene ontology analysis, survival analysis, literature review and other related methods. Our findings suggest that the SRC gene is associated with bladder urothelial carcinoma survival, while the RPS2, RPL23A, RPL22, RPL27 and SRC genes are related to liver hepatocellular carcinoma survival. Furthermore, the KRAS gene is associated with lung adenocarcinoma survival, and the SRC gene is associated with prostate adenocarcinoma survival. These discoveries may assist in the early identification and precise categorization of these cancer types, while also pinpointing potential treatment approaches to enhance the survival rates among high-risk males.
UCSC XENA is an online platform that hosts several large public cancer datasets, including TCGA, GTEx and TARGET, among others, and provides powerful and intuitive functionality for exploring them.
The DNA methylation 450K data used in this study were downloaded exclusively from the UCSC XENA platform, which included datasets for bladder urothelial carcinoma (BLCA), colon adenocarcinoma (COAD), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD) and prostate adenocarcinoma (PRAD). A total of 2241 samples of both cancer and normal tissue were obtained by combining these five datasets, as shown in Table 1. The dataset was then divided into a training set and a testing set at a ratio of 9:1.
Cancer tumor | Number of samples | Training (≈90%) | Testing (≈10%) |
BLCA | 434 | 390 | 44 |
COAD | 337 | 303 | 34 |
LIHC | 429 | 386 | 43 |
LUAD | 492 | 443 | 49 |
PRAD | 549 | 494 | 55 |
Total | 2241 | 2016 | 225 |
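The 9:1 split summarized in Table 1 can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the split was stratified by dataset (the per-cancer counts in the table suggest this, but the text does not say so), uses placeholder features, and picks an arbitrary `random_state`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Sample counts per dataset, taken from Table 1.
counts = {"BLCA": 434, "COAD": 337, "LIHC": 429, "LUAD": 492, "PRAD": 549}

# One label per sample; the feature matrix is a placeholder (assumption),
# standing in for the 450K methylation beta values.
labels = np.repeat(list(counts), list(counts.values()))
X = np.zeros((labels.size, 1))

# test_size=0.1 gives the 9:1 ratio; stratify keeps each dataset's
# share roughly equal in both parts (an assumption, see lead-in).
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.1, stratify=labels, random_state=0
)

print(len(y_train), len(y_test))
```

The totals reproduce the last row of Table 1; the per-dataset counts may differ by one sample from the table depending on how rounding is allocated.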
Given the high-dimensional nature of the data in this study, with more than 480,000 dimensions, the sample size seems somewhat limited. However, it is essential to note that not all features hold equal importance for the classification model. Therefore, it is crucial to identify and select the most informative features to ensure accurate and effective cancer classification. This task is achieved through feature selection and dimensionality reduction, where representative features are selected. In the training dataset, features containing "NaN" were removed, and in the test dataset, they were substituted with 0.
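The missing-value handling described above can be illustrated with a small NumPy sketch. The toy matrices are invented for illustration, and restricting the test set to the columns kept in the training set is an assumption; the text only states that NaN features were dropped in training and replaced with 0 in testing.

```python
import numpy as np

# Toy beta-value matrices (rows = samples, columns = CpG sites); values are made up.
X_train = np.array([[0.10, np.nan, 0.30],
                    [0.20, 0.50, 0.40]])
X_test = np.array([[np.nan, 0.70, 0.90]])

# Training set: drop every feature (column) that contains at least one NaN.
keep = ~np.isnan(X_train).any(axis=0)
X_train_clean = X_train[:, keep]

# Test set: keep the same columns (assumption) and replace remaining NaNs with 0.
X_test_clean = np.nan_to_num(X_test[:, keep], nan=0.0)
```

Applying the same column mask to both sets keeps the feature spaces aligned, which any downstream classifier requires.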
To select features relevant to the classification of the five common tumors, we first applied the chi-square test. The chi-square test [21] is a statistical method for evaluating the independence between categorical variables; here it is used to assess how strongly each feature is associated with the target variable, so that the most relevant features can be retained. Specifically, we used the SelectKBest [22] function from the scikit-learn library [23] to keep the features with the highest chi-square scores. Using cross-validation, we found that classification performance improved significantly when 22,120 features were retained. The features selected by the chi-square test are highly relevant to cancer classification tasks [24], and they were therefore used as input features for subsequent classifier training and testing. The chi-square statistic is computed as follows:
$$\chi^2=\sum_{i=1}^{n}\sum_{j=1}^{m}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}. \tag{1}$$
Here $O_{ij}$ denotes the observed value of the cross term in row $i$, column $j$; $E_{ij}$ represents the expected value of the cross term in row $i$, column $j$; $n$ denotes the number of rows; and $m$ denotes the number of columns.
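A hedged sketch of chi-square filtering with scikit-learn's `SelectKBest` is shown below. The data are synthetic stand-ins for methylation beta values (which lie in [0, 1], so they satisfy the non-negativity requirement of `chi2`), and `k=20` is an arbitrary illustrative value, not the 22,120 features used in the study.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n_samples, n_features = 120, 200

# Synthetic non-negative features and six class labels (assumption: toy data).
X = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
y = rng.integers(0, 6, size=n_samples)

# Make the first five features depend on the class so they carry signal.
X[:, :5] = (y[:, None] + rng.uniform(0.0, 1.0, size=(n_samples, 5))) / 6.0

# Keep the k features with the highest chi-square scores.
selector = SelectKBest(score_func=chi2, k=20)
X_selected = selector.fit_transform(X, y)
kept_columns = selector.get_support(indices=True)

print(X_selected.shape, kept_columns[:10])
```

In practice the value of `k` would be chosen by cross-validation, as the text describes.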
Although the chi-square test removed approximately 400,000 CpG sites from the cancer dataset and thereby greatly reduced the number of features, further reduction is still necessary to construct a high-performance predictor. Feature selection reduces model complexity [25], which lowers the risk of overfitting and enhances model interpretability. By retaining features with strong predictive power for the target variable, it can also improve the predictive performance of the model [26]. Additionally, feature selection decreases the time and cost of data processing and modeling.
To accomplish this aim, we employed a logistic regression model with an L1-norm penalty [27] and used the SelectFromModel function of the scikit-learn library to filter features. L1 regularization is a technique used in machine learning models such as linear regression and logistic regression [28]. It penalizes model complexity, which helps prevent overfitting, and because it yields sparse solutions (many weights are driven exactly to zero), it naturally performs feature selection: the features with non-zero weights are the ones retained. These features enhance not only the model's performance and predictive power but also its interpretability and practical value. In practice, the magnitude of the weights can guide more fine-grained feature selection and further optimization of the model.
The logistic regression model computes the probability of a data point belonging to a specific class based on a linear combination of input features [29]. The fundamental logistic regression formula without regularization is:
$$p(y=n\mid x)=\frac{1}{1+e^{-(w_0+w_1x_1+w_2x_2+\cdots+w_nx_n)}}. \tag{2}$$
Here, $p(y=n\mid x)$ denotes the probability of data point $x$ belonging to class $n$, $w_i$ represents the weight of feature $x_i$ (with $w_0$ the intercept) and $e$ is the base of the natural logarithm.
When L1 regularization is applied, the objective function to be minimized is the sum of the log loss and the L1 regularization term, which is the sum of the absolute values of the weights scaled by a regularization strength parameter λ. The objective function is:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log p\left(y=n\mid x^{(i)}\right)+\left(1-y^{(i)}\right)\log\left(1-p\left(y=n\mid x^{(i)}\right)\right)\right]+\lambda\sum_{j=1}^{n}|w_j|, \tag{3}$$
where $m$ is the number of data points, $y^{(i)}$ is the true label of data point $x^{(i)}$, and the L1-regularized logistic regression model is obtained by minimizing the objective function $J(\theta)$ with respect to the weights $w_j$.
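The L1-based selection step can be sketched with `SelectFromModel` as follows. This is an illustration under stated assumptions: the data are synthetic, and the `saga` solver, `C=0.1` (scikit-learn's `C` is the inverse of the regularization strength) and `max_iter` are arbitrary choices, since the text does not report the settings actually used.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data with a few informative features (assumption: toy data).
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=8, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# L1-penalized logistic regression; a smaller C means a stronger penalty
# and therefore a sparser weight vector.
l1_model = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)

# SelectFromModel keeps the features whose fitted weights are (effectively) non-zero.
selector = SelectFromModel(l1_model).fit(X, y)
X_selected = selector.transform(X)

print(X.shape[1], "->", X_selected.shape[1], "features")
```

Tuning `C` trades off sparsity against fit: lowering it discards more features, raising it keeps more.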
Stacking is an ensemble method for models [30], where the combination of multiple weaker models often yields better performance than a single strong model. This approach involves training several base learners and using their predictions as input to a meta-learner. The stacked ensemble algorithm offers superior performance, generalization capabilities and flexibility compared to individual algorithms by leveraging the advantages of multiple base learners to enhance model accuracy and robustness. In this study, we propose a framework combining a chi-square test, logistic regression with L1 penalty and stacking ensemble learning to construct a multiclass classifier for five types of cancer data. The overall flowchart of this study is presented in Figure 1.
We trained seven base classification models: random forest (RF), support vector machine (SVM), bootstrap aggregating (Bagging), stochastic gradient descent (SGD), multilayer perceptron (MLP), logistic regression (LR) and LightGBM (LGBM) [31,32,33,34]. These models were selected because they are based on different algorithms and can capture different characteristics of the data. The LR, SVM and SGD models are linear models; the RF model captures nonlinear relationships and interactions; Bagging and LightGBM capture nonlinear relationships by combining weak learners (through bootstrap aggregation and gradient boosting, respectively); and MLP can solve linearly inseparable problems. In this study, the ensemble algorithm is designed to leverage the strengths of multiple models more effectively than a single algorithm, with the aim of enhancing robustness and accuracy.
All base learners were trained on the whole training set and then evaluated with the validation set. These predictions were used to train the meta-learner along with the true labels in the validation set. For the meta-learner, we chose the LGBM model. The specific prediction process is shown in Figure 2.
In summary, stacking is an effective method of ensemble learning that combines multiple models to achieve higher performance than any single model [35]. By leveraging different base learners and using a meta-learner to find the best way to combine them, stacking can produce better results compared to any single model. Our experimental performance demonstrated the effectiveness of the stacking approach for this classification task.
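The stacking scheme described above can be sketched with scikit-learn's `StackingClassifier`. Several points are assumptions made for illustration: the data are synthetic, all hyperparameters are arbitrary, `GradientBoostingClassifier` stands in for LightGBM (both as a base learner and as the meta-learner) to avoid an extra dependency, and `StackingClassifier` feeds the meta-learner with internally cross-validated predictions, which is close to but not identical to the validation-set procedure in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic 6-class data standing in for the selected CpG features (assumption).
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1,
                                          stratify=y, random_state=0)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svm", SVC(random_state=0)),
    ("bagging", BaggingClassifier(random_state=0)),
    ("sgd", SGDClassifier(random_state=0)),
    ("mlp", MLPClassifier(max_iter=500, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    # Stand-in for LightGBM (assumption, see lead-in).
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on the base learners' out-of-fold predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    cv=5,
)
stack.fit(X_tr, y_tr)
accuracy = stack.score(X_te, y_te)
print(f"held-out accuracy: {accuracy:.3f}")
```

Using out-of-fold predictions prevents the meta-learner from seeing predictions that a base learner made on its own training samples.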
Sound evaluation metrics are essential for assessing a model. A variety of metrics, such as accuracy and recall, are commonly used to measure performance. These metrics not only help us understand the predictive ability of the model but also guide the optimization of its parameters and structure. In this study, model performance is evaluated with six metrics: Accuracy (ACC), Matthews Correlation Coefficient (MCC) [36], Precision (PRE), Geometric mean (Gmean), Recall (RECALL) and Kappa Coefficient (KAPPA) [37].
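$$
\left\{
\begin{aligned}
\mathrm{ACC} &= \frac{1}{6}\sum_{i=0}^{5}\frac{TP_i+TN_i}{TP_i+TN_i+FP_i+FN_i}\\
\mathrm{MCC} &= \frac{1}{6}\sum_{i=0}^{5}\frac{TP_i\times TN_i-FP_i\times FN_i}{\sqrt{(TP_i+FP_i)(TP_i+FN_i)(TN_i+FP_i)(TN_i+FN_i)}}\\
\mathrm{PRE} &= \frac{1}{6}\sum_{i=0}^{5}\frac{TP_i}{TP_i+FP_i}\\
\mathrm{RECALL} &= \frac{1}{6}\sum_{i=0}^{5}\frac{TP_i}{TP_i+FN_i}\\
\mathrm{Gmean} &= \frac{1}{6}\sum_{i=0}^{5}\sqrt{\frac{TP_i\times TN_i}{(TP_i+FP_i)(TP_i+FN_i)}}\\
\mathrm{KAPPA} &= \frac{p_o-p_e}{1-p_e}.
\end{aligned}
\right. \tag{4}
$$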
In this context, TP (true positives) is the number of positive-class samples that the model correctly predicted as positive; TN (true negatives) is the number of negative-class samples correctly predicted as negative; FP (false positives) is the number of negative-class samples incorrectly predicted as positive; and FN (false negatives) is the number of positive-class samples incorrectly predicted as negative. The kappa coefficient measures the agreement between the predicted and the true classifications by comparing the observed classification accuracy with the agreement expected by chance. Here $p_o$ is the observed classification accuracy and $p_e$ is the classification accuracy expected by chance. They can be expressed as follows:
$$p_o=\frac{TP_i+TN_i}{TP_i+TN_i+FP_i+FN_i}, \tag{5}$$
$$p_e=\sum_{i=0}^{5}\frac{(TP_i+FN_i)(TP_i+FP_i)+(FN_i+TN_i)(FP_i+TN_i)}{(TP_i+TN_i+FP_i+FN_i)^2}, \tag{6}$$
where $TP_i$, $TN_i$, $FP_i$ and $FN_i$ ($i = 0, 1, 2, \ldots, 5$) are TP, TN, FP and FN for each subset, respectively.
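To make the class-averaged metrics concrete, the sketch below derives one-vs-rest TP, TN, FP and FN counts from a confusion matrix and averages ACC, PRE, RECALL and MCC over the classes. It is an illustration, not the authors' evaluation code: the 3-class confusion matrix is invented, the MCC uses the standard definition with a square root in the denominator, and Gmean and KAPPA are omitted.

```python
import math

def one_vs_rest_counts(cm):
    """Return (TP, TN, FP, FN) per class for a square confusion matrix
    whose rows are true labels and whose columns are predicted labels."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for i in range(k):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # class i predicted as something else
        fp = sum(cm[r][i] for r in range(k)) - tp   # other classes predicted as i
        tn = total - tp - fn - fp
        counts.append((tp, tn, fp, fn))
    return counts

def macro_metrics(cm):
    """Average the per-class metrics over all classes.
    Assumes every class has at least one true and one predicted sample."""
    counts = one_vs_rest_counts(cm)
    k = len(counts)
    acc = sum((tp + tn) / (tp + tn + fp + fn) for tp, tn, fp, fn in counts) / k
    pre = sum(tp / (tp + fp) for tp, tn, fp, fn in counts) / k
    rec = sum(tp / (tp + fn) for tp, tn, fp, fn in counts) / k
    mcc = sum(
        (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        for tp, tn, fp, fn in counts
    ) / k
    return {"ACC": acc, "PRE": pre, "RECALL": rec, "MCC": mcc}

# Invented 3-class example: one class-1 sample is misclassified as class 2.
cm = [[5, 0, 0],
      [0, 4, 1],
      [0, 0, 5]]
metrics = macro_metrics(cm)
print(metrics)
```

Note that the class-averaged ACC (43/45 here) is not the same as the plain fraction of correctly classified samples (14/15), because each one-vs-rest term also counts true negatives.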
During model training, we performed 10-fold cross-validation on the training set to optimize the parameters. The original dataset is first divided into training and test sets; the training set is then further divided into ten equal-sized subsets. In each round, one subset serves as the validation set and the remaining nine are used to train the model, so that every subset acts as the validation set exactly once.
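The 10-fold procedure can be sketched as below. The synthetic data, the logistic regression model and the shuffling seed are assumptions chosen only to keep the example small; the point is the fold structure, in which every training sample is used for validation exactly once.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the training set (assumption).
X_train, y_train = make_classification(n_samples=200, n_features=20,
                                       n_informative=5, random_state=0)

# Ten equal-sized folds; each fold is the validation set once.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Count how often each sample lands in a validation fold.
validation_counts = np.zeros(len(y_train), dtype=int)
for train_idx, val_idx in cv.split(X_train):
    validation_counts[val_idx] += 1

scores = cross_val_score(LogisticRegression(max_iter=1000),
                         X_train, y_train, cv=cv, scoring="accuracy")
print(f"ACC over 10 folds: {scores.mean():.4f} +/- {scores.std():.4f}")
```

Hyperparameters would be chosen by comparing such fold-averaged scores across candidate settings, leaving the test set untouched.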
An effective feature selection method improves the performance of predictive models and yields better explanatory power. For this purpose, we performed a comprehensive comparison of several feature selection methods: recursive feature elimination (RFE) [38], elastic net (ENET) [39], and the combination of the chi-square test with L1-regularized logistic regression.
During the comparison, we observed that the combined method exhibits promising performance in the feature selection task. The results presented in Table 2 allow a clear comparison of the performance of these methods on the prediction task. In the fourth row of Table 2, "99.22 ± 0.004" indicates that, in the 10-fold cross-validation, the mean ACC is 99.22% and its variance is 0.004.
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
RFE | 98.36 ± 0.010 | 97.95 ± 0.013 | 97.74 ± 0.014 | 98.01 ± 0.012 | 98.01 ± 0.012 | 99.03 ± 0.006 |
ENET | 97.97 ± 0.009 | 97.07 ± 0.010 | 96.18 ± 0.016 | 96.81 ± 0.011 | 96.79 ± 0.012 | 98.40 ± 0.006 |
Ours | 99.22 ± 0.004 | 99.24 ± 0.004 | 99.23 ± 0.004 | 99.07 ± 0.005 | 99.60 ± 0.002 | 99.07 ± 0.005 |
Building upon these findings, the chi-square test proves to be highly valuable in categorization problems as it allows us to identify features that are significantly associated with the target variable, potentially related to cancer in this study. As for the logistic regression method based on L1 regularization, it induces sparsity in the feature coefficients by applying L1 regularization, and this sparsity helps to filter out irrelevant or redundant features, thus improving the generalization ability of the model. Consequently, we opted for the combination method to screen features and have continued utilizing this approach in subsequent studies.
Afterwards, we conducted a 10-fold cross-validation of all classification methods. The statistical results are shown in Tables 3 and 4. The performance of the stacking ensemble learning method was found to be superior to those of the base learners while also exhibiting good stability. Due to the instability of independent testing results at each training, we took the average of 10 results in the experiment. The data achieved very good results on the stacking ensemble learning model, with over 99% in all criteria except RECALL, as shown in Table 4 and Figure 3(b).
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
MLP | 97.52 ± 0.010 | 96.85 ± 0.014 | 97.01 ± 0.013 | 96.99 ± 0.013 | 98.54 ± 0.006 | 96.98 ± 0.013 |
Bagging | 97.62 ± 0.015 | 96.81 ± 0.020 | 97.48 ± 0.017 | 97.13 ± 0.018 | 98.63 ± 0.009 | 97.10 ± 0.018 |
LR | 98.01 ± 0.011 | 97.37 ± 0.015 | 97.78 ± 0.019 | 97.60 ± 0.013 | 98.85 ± 0.006 | 97.59 ± 0.013 |
SVM | 97.87 ± 0.011 | 97.17 ± 0.016 | 97.67 ± 0.012 | 97.42 ± 0.014 | 98.76 ± 0.006 | 97.41 ± 0.014 |
RF | 97.77 ± 0.012 | 97.03 ± 0.016 | 97.28 ± 0.014 | 97.30 ± 0.014 | 98.70 ± 0.007 | 97.28 ± 0.014 |
SGD | 97.42 ± 0.013 | 96.87 ± 0.014 | 96.93 ± 0.021 | 96.91 ± 0.015 | 98.49 ± 0.008 | 96.86 ± 0.016 |
LGBM | 98.11 ± 0.013 | 97.56 ± 0.017 | 97.62 ± 0.015 | 97.71 ± 0.016 | 98.89 ± 0.008 | 97.70 ± 0.016 |
Stacking | 99.22 ± 0.004 | 99.24 ± 0.004 | 99.23 ± 0.004 | 99.07 ± 0.005 | 99.60 ± 0.002 | 99.07 ± 0.005 |
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
MLP | 98.22 ± 0.003 | 97.20 ± 0.009 | 97.25 ± 0.006 | 97.82 ± 0.004 | 98.94 ± 0.002 | 97.82 ± 0.004 |
Bagging | 98.58 ± 0.009 | 96.89 ± 0.004 | 97.62 ± 0.001 | 98.26 ± 0.002 | 99.20 ± 0.001 | 98.25 ± 0.002 |
LR | 98.22 ± 0.000 | 96.73 ± 0.000 | 97.32 ± 0.000 | 97.83 ± 0.000 | 98.97 ± 0.000 | 97.82 ± 0.000 |
SVM | 98.67 ± 0.000 | 97.09 ± 0.000 | 97.69 ± 0.000 | 98.37 ± 0.000 | 99.25 ± 0.000 | 98.36 ± 0.000 |
RF | 98.36 ± 0.005 | 96.98 ± 0.011 | 97.35 ± 0.004 | 97.99 ± 0.005 | 99.05 ± 0.002 | 97.98 ± 0.006 |
SGD | 97.73 ± 0.008 | 97.10 ± 0.016 | 96.22 ± 0.012 | 97.24 ± 0.009 | 98.63 ± 0.005 | 97.22 ± 0.010 |
LGBM | 99.11 ± 0.000 | 98.26 ± 0.000 | 98.07 ± 0.000 | 98.91 ± 0.000 | 99.48 ± 0.000 | 98.91 ± 0.000 |
Stacking | 99.29 ± 0.004 | 99.21 ± 0.007 | 98.20 ± 0.007 | 99.13 ± 0.005 | 99.56 ± 0.002 | 99.13 ± 0.005 |
The confusion matrix [40] is shown in Figure 4, and it can be seen that the model performs well in distinguishing between the five types of cancer and normal tissue. It can also be observed that out of all the samples in the independent testing data, only two were misclassified, where one normal tissue sample was incorrectly predicted as a PRAD sample, and another COAD sample was incorrectly predicted as a BLCA sample.
This result is satisfactory, which indicates that the model has high accuracy and reliability and can be used in clinical practice. At the same time, although the misclassification rate is low, we still need to continue to optimize and improve the model to improve its accuracy and applicability in future research.
To validate the generalizability of the proposed model, we used the dataset and neural network employed by Lin et al. [19] to assess the performance of our classifier. Across these comparisons, our model consistently outperforms theirs, as demonstrated in Tables 5 and 6. These results indicate that our model excels not only on the original dataset but can also be successfully applied to other datasets with a degree of generality and replicability. This, in turn, supports the reliability and stability of the model for practical medical applications.
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
iCancer-Pred | 83.56 ± 0.199 | 78.49 ± 0.229 | 81.81 ± 0.179 | 82.50 ± 0.194 | 80.16 ± 0.236 | 89.43 ± 0.135 |
Stacking | 99.29 ± 0.004 | 99.29 ± 0.007 | 98.20 ± 0.007 | 99.13 ± 0.005 | 99.56 ± 0.002 | 99.13 ± 0.005 |
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
iCancer-Pred | 97.27 ± 0.006 | 97.37 ± 0.005 | 96.99 ± 0.007 | 96.82 ± 0.007 | ━ | 96.81 ± 0.007 |
Stacking | 98.22 ± 0.006 | 98.18 ± 0.007 | 97.96 ± 0.009 | 97.93 ± 0.007 | 98.35 ± 0.002 | 97.92 ± 0.007 |
To achieve interpretable predictions and gain insights into feature contributions, we used the local interpretable model-agnostic explanations (LIME) [41] method. In this study, 9511 CpG sites served as features for predicting cancer types. Figure 5 shows the LIME explanation of one sample predicted by the stacking ensemble learning model; it lists the top 6 predictive biomarkers most helpful in distinguishing normal tissue, bladder urothelial carcinoma (BLCA), colon adenocarcinoma (COAD), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD) and prostate adenocarcinoma (PRAD). The prediction probability table in the top-left corner of Figure 5 shows the probability the model assigns to each of these classes for the given sample. In this case, LIME assigns a feature weight of 0.20 to the condition cg11055493 ≤ 0.39.
Additionally, we show the feature weights of the other predictive features. The "Feature-Value" table lists each feature's value together with a color code that indicates which class the feature contributes to. Specifically, normal tissue is shown in blue, bladder urothelial carcinoma (BLCA) in orange, colon adenocarcinoma (COAD) in green, liver hepatocellular carcinoma (LIHC) in purple, lung adenocarcinoma (LUAD) in red and prostate adenocarcinoma (PRAD) in brown.
After screening, we obtained 9511 CpG loci, which were annotated to 8087 genes. Comparing these 8087 genes with published CpG biomarkers, we found that the study by Ding et al. [42] covered the five types of cancer used in our study and reported 3000 CpG biomarker genes. A total of 863 genes overlapped between the two studies, as shown in Figure 6.
For the list of overlapping genes, we performed pathway and process enrichment analysis by using multiple ontology sources, including GO Biological Processes, GO Cellular Components, GO Molecular Functions and KEGG Pathway [43]. A series of criteria were applied to screen for biologically significant enrichment terms, including p values less than 0.01, minimum counts of 3 and enrichment factors greater than 1.5 (enrichment factor refers to the ratio between observed counts and randomly expected counts). Based on their membership similarity, we grouped the enriched terms into clusters and used Kappa scores as a similarity measure in the hierarchical clustering process, treating subtrees with a similarity greater than 0.3 as a cluster. Finally, in each cluster, the most significant enriched terms in terms of the above metrics (p-values, counts, etc.) were selected to represent its clusters. For example, for GO:0016570 we filtered the following metrics: count = 34, Log10(P) = -8.09, Log10(q) = -4.16. "Count" is the number of genes in the user-provided lists with membership in the given ontology term. "Log10(P)" is the p-value in log base 10. "Log10(q)" is the multi-test adjusted p-value in log base 10. The results of these enrichment analyses can help us to deeply understand the functions of these overlapping genes in different biological processes and pathways, providing important clues for further studies.
As shown in Figure 7, gene ontology analysis indicates that the overlapping genes are enriched in the following biological processes: histone modification (GO:0016570), DNA metabolic process (GO:0006259), neuromuscular process (GO:0050905), embryo development ending in birth or egg hatching (GO:0009792), localization within membrane (GO:0051668), brain development (GO:0007420), modulation of chemical synaptic transmission (GO:0050804), regulation of cell cycle process (GO:0010564) and developmental maturation (GO:0021700), among others. The enriched molecular functions are transcription coregulator activity (GO:0003712), molecular adaptor activity (GO:0060090), protein domain specific binding (GO:0019904) and transcription factor binding (GO:0008134). The enriched cellular components are extrinsic component of membrane (GO:0019898), dendrite (GO:0030425), cell projection membrane (GO:0031253), perinuclear region of cytoplasm (GO:0048471) and transporter complex (GO:1990351). In addition, growth hormone synthesis, secretion and action (hsa04935) and the MAPK signaling pathway (hsa04010) were identified among the KEGG pathways.
In this study, the STRING database was used to search for potential interactions among the proteins encoded by the overlapping genes. This step yielded the protein-protein interaction (PPI) network shown in Figure 8. The network describes relationships between genes and proteins, such as physical contacts and targeted regulation. Our goal was to elucidate meaningful molecular regulatory networks in living organisms.
Subsequently, we used the "cytoHubba" plugin of the Cytoscape [44] software to calculate node scores for the genes in the PPI network and identified the top 10 key genes: RPS2, RPL23A, RPL3L, RPL22, RPL27, EEF2, PIK3CA, SRC, KRAS and CREBBP (as shown in Figure 9).
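To make the idea of node scoring concrete, the following minimal sketch ranks nodes of an undirected network by degree, i.e., by their number of distinct interaction partners. This is an assumption for illustration only: the text does not state which cytoHubba scoring method was used, the function name is ours, and the edge list uses placeholder node names rather than the actual STRING network.

```python
from collections import defaultdict


def rank_hubs_by_degree(edges, top_n=10):
    """Return the top_n nodes with the most distinct neighbours.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:  # ignore self-interactions
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    ranked = sorted(neighbors, key=lambda g: (-len(neighbors[g]), g))
    return [(g, len(neighbors[g])) for g in ranked[:top_n]]


# Placeholder network, not real interaction data.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("F", "G"), ("F", "H"),
]
top_hubs = rank_hubs_by_degree(toy_edges, top_n=3)
```

In the placeholder network, node A has four partners and is ranked first, followed by the nodes with two partners each.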
To investigate the effect of these genes on the survival of cancer patients, we performed a survival analysis of the 10 potential biomarkers screened from the PPI network and used the TIMER database [45] to draw Kaplan-Meier survival curves (Figure 10). The Kaplan-Meier survival curve was first proposed by Kaplan and Meier in 1958 [46]. It is a non-parametric method for analyzing survival data that estimates survival probabilities at different time points and visualizes how survival changes over time. In cancer research, the Kaplan-Meier survival curve is widely used to analyze patients' survival data [47,48], and recent studies show that the method remains effective for estimating patient survival rates. For instance, Hamid Bakhtiari et al. [49] used it to predict the survival rates of hypertensive patients with COVID-19. A gene was considered significantly associated with cancer survival when P < 0.05. Our analysis revealed that the SRC gene is significantly associated with the prognosis of bladder urothelial carcinoma, liver hepatocellular carcinoma and prostate adenocarcinoma. Additionally, the RPS2, RPL23A, RPL22, RPL27 and SRC genes are significantly associated with the prognosis of liver hepatocellular carcinoma. Furthermore, the KRAS gene is significantly associated with the prognosis of lung adenocarcinoma. These results suggest that these genes have potential prognostic value. However, further clinical validation is needed before these potential biomarkers can be used.
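For readers unfamiliar with the estimator, the following minimal sketch computes a Kaplan-Meier curve from follow-up times and event indicators. It illustrates the product-limit formula only and is not the TIMER implementation; the function name and the five-patient input are hypothetical, and no significance test (such as the one behind the P < 0.05 threshold) is included.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: follow-up time of each patient.
    events: 1 if the event (death) was observed, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data: two of the five patients are censored.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
```

For this toy input the estimated survival probability drops to about 0.8 at time 5, 0.6 at time 8 and 0.3 at time 12; censored patients lower the number at risk without producing a step in the curve.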
A review of the literature showed that many of the key survival-associated genes identified in this study have been reported previously. For example, Sree Karani Kondapuram et al. [50] identified SRC as a central autophagy gene associated with LIHC survival, making it a potential drug target. Kowalczyk et al. [51] found that RPS2 is overexpressed in mouse liver hepatocellular carcinoma samples and may affect the accuracy of mRNA translation related to aminoacyl-tRNA binding to ribosomes, thus promoting cell proliferation. Additionally, Pan et al. [52] identified SRC as a potential therapeutic target for docetaxel-resistant prostate adenocarcinoma and an effective prognostic indicator that was significantly correlated with the immune score, ferroptosis, methylation and OCLR score. Wang et al. [53] found that high expression of the ribosome-related genes RPL23A and RPL27 significantly reduced the survival rate of patients with liver hepatocellular carcinoma. Moreover, Xu et al. [54] identified SRC as a prognostic gene for BLCA through multivariate Cox regression analysis [55]. In lung cancer, KRAS mutations are most common in patients with lung adenocarcinoma, approximately 33% of whom carry such a mutation [56].
In summary, the literature review supports the importance of the key genes identified in this study for cancer survival and indicates that our feature selection method is effective in extracting potential biomarkers. These findings provide additional evidence for the potential clinical relevance of our model and for the value of integrating machine learning methods into cancer research. Further research is needed to validate these findings and to explore the underlying mechanisms of these genes in cancer development and progression.
In this paper, we present a novel model for predicting five different cancer types and their corresponding normal tissues from the DNA methylation 450K dataset. The proposed model combines a stacking ensemble learning approach with a feature selection method based on a chi-square test and logistic regression with L1 regularization. This framework addresses the challenges posed by the high-dimensional nature of the data. Specifically, we first apply a chi-square test for feature selection and then use logistic regression with L1 regularization as the estimator for SelectFromModel to create an optimized feature set. The selected features are then used by the stacking ensemble learning model for prediction. Additionally, we mitigate the imbalanced sample distribution between cancer samples and normal tissues by applying SMOTETomek integrated sampling to the training set.
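The two-stage feature selection followed by stacking can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic values in [0, 1] standing in for methylation beta values, and the number of features kept by the chi-square step (k=50), the regularization strength (C=1.0), the saga solver and the choice of base learners and meta-learner are all hypothetical. The SMOTETomek resampling of the training set (available in the imbalanced-learn package) is left out so that the sketch depends only on scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in data: 120 samples, 200 non-negative features, 3 classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 40)
X = rng.uniform(0, 1, size=(120, 200))
# Make the first five features informative by shifting them per class.
X[:, :5] = X[:, :5] * 0.3 + y[:, None] * 0.3

model = make_pipeline(
    # Stage 1: chi-square filter (requires non-negative features).
    SelectKBest(chi2, k=50),
    # Stage 2: keep features with non-zero L1-regularized coefficients.
    SelectFromModel(
        LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=2000)
    ),
    # Stage 3: stacking ensemble; base learners here are an assumption.
    StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("svm", SVC()),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    ),
)
model.fit(X, y)
train_accuracy = model.score(X, y)
```

Wrapping the selectors and the classifier in one pipeline means that, under cross-validation, feature selection is refit on each training fold, which avoids leaking information from held-out samples into the selected feature set.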
Compared with existing methods, the proposed stacking ensemble learning model consistently performs better in classifying different cancer types. Our study establishes a robust multiclass predictor capable of identifying a patient's cancer type. Furthermore, we conducted survival analysis on key genes to identify potential biomarkers associated with cancer survival and performed GO and KEGG pathway analyses to underscore the biological relevance of our findings. In conclusion, our model has considerable potential in cancer diagnosis and treatment and highlights the value of combining machine learning methods with DNA methylation data analysis. However, despite the bioinformatics analyses conducted, our study still has limitations and requires further validation and testing on broader cancer datasets. To enhance the robustness of our approach, we plan to explore and integrate other cancer types and related multi-omics data in future research.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported by grants from the National Natural Science Foundation of China (No. 62162032, 62062043, 32270789), the Scientific Research Plan of the Department of Education of Jiangxi Province, China (GJJ2201004, GJJ2201038).
The authors declare there is no conflict of interest.
[1] |
Heravi G, Salehi MM, Rostami M (2020) Identifying cost-optimal options for a typical residential nearly zero energy building's design in developing countries. Clean Technol Environ Policy 22: 2107-2128. doi: 10.1007/s10098-020-01962-4
![]() |
[2] |
Hansen K, Breyer C, Lund H (2019) Status and perspectives on 100% renewable energy systems. Energy 175: 471-480. doi: 10.1016/j.energy.2019.03.092
![]() |
[3] |
Houri Jafari H, Vakili A, Eshraghi H, et al. (2016) Energy planning and policy making; The case study of Iran. Energy Sources, Part B: Econ, Plann, Policy 11: 682-689. doi: 10.1080/15567249.2012.741186
![]() |
[4] |
Gabbar HA, Eldessouky A, Runge J (2016) Evaluation of renewable energy deployment scenarios for building energy management. AIMS Energy 4: 742-761. doi: 10.3934/energy.2016.5.742
![]() |
[5] |
Saeed T, Tularam GA (2017) Relations between fossil fuel returns and climate change variables using canonical correlation analysis. Energy Sources, Part B: Econ, Plann, Policy 12: 675-684. doi: 10.1080/15567249.2016.1265615
![]() |
[6] |
Kim SB, Cho JH (2014) A study on forecasting green infrastructure construction market. KSCE J Civ Eng 18: 430-443. doi: 10.1007/s12205-014-0189-8
![]() |
[7] |
Hailu AD, Kumsa DK (2021) Ethiopia renewable energy potentials and current state. AIMS Energy 9: 1-14. doi: 10.3934/energy.2021001
![]() |
[8] |
Kahia M, Jebli MB, Belloumi M (2019) Analysis of the impact of renewable energy consumption and economic growth on carbon dioxide emissions in 12 MENA countries. Clean Technol Environ Policy 21: 871-885. doi: 10.1007/s10098-019-01676-2
![]() |
[9] | Alizadeh R, Maknoon R, Majidpour M (2014) Clean Development Mechanism, a bridge to mitigate the Greenhouse Gasses: is it broken in Iran. 13th International Conference on Clean Energy (ICCE): 399-404. |
[10] | Asutosh AT, Woo J, Kouhirostami M, et al. (2020) Renewable energy industry trends and its contributions to the development of energy resilience in an era of accelerating climate change. Int J Energy Power Eng 14: 233-240. |
[11] |
Brockway PE, Owen A, Brand-Correa LI, et al. (2019) Estimation of global final-stage energy-return-on-investment for fossil fuels with comparison to renewable energy sources. Nat Energy 4: 612-621. doi: 10.1038/s41560-019-0425-z
![]() |
[12] |
Suárez-Eiroa B, Fernández E, Méndez-Martínez G, et al. (2019) Operational principles of circular economy for sustainable development: Linking theory and practice. J Cleaner Prod 214: 952-961. doi: 10.1016/j.jclepro.2018.12.271
![]() |
[13] |
Kirchherr J, Piscicelli L, Bour R, et al. (2018) Barriers to the circular economy: evidence from the European Union (EU). Ecol Econ 150: 264-272. doi: 10.1016/j.ecolecon.2018.04.028
![]() |
[14] |
Okafor C, Madu C, Ajaero C, et al. (2021) Moving beyond fossil fuel in an oil-exporting and emerging economy: Paradigm shift. AIMS Energy 9: 379-413. doi: 10.3934/energy.2021020
![]() |
[15] | International Energy Agency. Data and statistics, 2018. Available from: https://www.iea.org/data-and-statistics/. |
[16] |
Falcone PM, Morone P, Sica E (2018) Greening of the financial system and fuelling a sustainability transition: A discursive approach to assess landscape pressures on the Italian financial system. Technol Forecast Soc Change 127: 23-37. doi: 10.1016/j.techfore.2017.05.020
![]() |
[17] |
Gielen D, Boshell F, Saygin D, et al. (2019) The role of renewable energy in the global energy transformation. Energy Strategy Rev 24: 38-50. doi: 10.1016/j.esr.2019.01.006
![]() |
[18] |
Fadly D (2019) Low-carbon transition: Private sector investment in renewable energy projects in developing countries. World Dev 122: 552-569. doi: 10.1016/j.worlddev.2019.06.015
![]() |
[19] |
Moriarty P, Honnery D (2018) Energy policy and economics under climate change. AIMS Energy 6: 272-290. doi: 10.3934/energy.2018.2.272
![]() |
[20] |
Yang X, He L, Xia Y, et al. (2019) Effect of government subsidies on renewable energy investments: The threshold effect. Energy Policy 132: 156-166. doi: 10.1016/j.enpol.2019.05.039
![]() |
[21] |
Ndiritu SW, Engola MK (2020) The effectiveness of feed-in-tariff policy in promoting power generation from renewable energy in Kenya. Renewable Energy 161: 593-605. doi: 10.1016/j.renene.2020.07.082
![]() |
[22] | Erfani A, Tavakolan M (2020) Risk evaluation model of wind energy investment projects using modified fuzzy group decision-making and monte carlo simulation. Arthaniti: J Econ Theory Pract, 0976747920963222. |
[23] |
Echeverri-Martínez R, Alfonso-Morales W, Caicedo-Bravo EF (2020) A methodological Decision-Making support for the planning, design and operation of smart grid projects. AIMS Energy 8: 627-651. doi: 10.3934/energy.2020.4.627
![]() |
[24] |
Lee HC, Chang CT (2018). Comparative analysis of MCDM methods for ranking renewable energy sources in Taiwan. Renewable Sustainable Energy Rev 92: 883-896. doi: 10.1016/j.rser.2018.05.007
![]() |
[25] | Monzer N, Fayek AR, Lourenzutti R, et al. (2019) Aggregation-based framework for construction risk assessment with heterogeneous groups of experts. J Constr Eng Manage, 145. |
[26] |
Perez IJ, Cabrerizo FJ, Alonso S, et al. (2013) A new consensus model for group decision making problems with non-homogeneous experts. IEEE Trans Syst, Man, Cybern: Syst 44: 494-498. doi: 10.1109/TSMC.2013.2259155
![]() |
[27] | Zheng Y (2018) Identifying Dominators and Followers in Group Decision Making based on The Personality Traits. IUI Workshops. Available from: http://ceur-ws.org/Vol-2068/humanize3.pdf. |
[28] |
Morente-Molinera JA, Wu X, Morfeq A, et al. (2020). A novel multi-criteria group decision-making method for heterogeneous and dynamic contexts using multi-granular fuzzy linguistic modelling and consensus measures. Inf Fusion 53: 240-250. doi: 10.1016/j.inffus.2019.06.028
![]() |
[29] |
Alizadeh R, Soltanisehat L, Lund PD, et al. (2020). Improving renewable energy policy planning and decision-making through a hybrid MCDM method. Energy Policy 137: 111-174. doi: 10.1016/j.enpol.2019.111174
![]() |
[30] |
Gündoğdu FK, Kahraman C (2020) A novel spherical fuzzy analytic hierarchy process and its renewable energy application. Soft Comput 24: 4607-4621. doi: 10.1007/s00500-019-04222-w
![]() |
[31] |
Zarnegar M (2018) Renewable energy utilization in Iran. Energy Sources, Part A: Recovery, Util, Environ Eff 40: 765-771. doi: 10.1080/15567036.2018.1457741
![]() |
[32] |
Thang VV, Ha T (2019) Optimal siting and sizing of renewable sources in distribution system planning based on life cycle cost and considering uncertainties. AIMS Energy 7: 211-226. doi: 10.3934/energy.2019.2.211
![]() |
[33] | Alizadeh R, Beiragh RG, Soltanisehat L, et al. (2020) Performance evaluation of complex electricity generation systems: A dynamic network-based data envelopment analysis approach. Energy Econ, 91. |
[34] | Alizadeh R, Lund PD, Soltanisehat L (2020) Outlook on biofuels in future studies: A systematic literature review. Renewable Sustainable Energy Rev, 134. |
[35] |
Alizadeh R, Allen JK, Mistree F (2020) Managing computational complexity using surrogate models: a critical review. Res Eng Des 31: 275-298. doi: 10.1007/s00163-020-00336-7
![]() |
[36] | Bou-Rabee M, Sulaiman SA, Saleh MS, et al. (2017) Using artificial neural networks to estimate solar radiation in Kuwait. Renewable Sustainable Energy Rev 72: 434-438. |
[37] |
Teimourian A, Bahrami A, Teimourian H, et al. (2020) Assessment of wind energy potential in the southeastern province of Iran. Energy Sources, Part A: Recovery, Util, Environ Eff 42: 329-343. doi: 10.1080/15567036.2019.1587079
![]() |
[38] | Çakmakçı BA, Hüner E (2020) Evaluation of wind energy potential: a case study. Energy Sources, Part A: Recovery, Util, Environ Eff, 1-19. |
[39] |
Azizkhani M, Vakili A, Noorollahi Y, et al. (2017) Potential survey of photovoltaic power plants using Analytical Hierarchy Process (AHP) method in Iran. Renewable Sustainable Energy Rev 75: 1198-1206. doi: 10.1016/j.rser.2016.11.103
![]() |
[40] |
Díaz-Cuevas P, Domínguez-Bravo J, Prieto-Campos A (2019) Integrating MCDM and GIS for renewable energy spatial models: assessing the individual and combined potential for wind, solar and biomass energy in Southern Spain. Clean Technol Environ Policy 21: 1855-1869. doi: 10.1007/s10098-019-01754-5
![]() |
[41] |
Yun TS, Lee JS, Lee SC, et al. (2011) Geotechnical issues related to renewable energy. KSCE J Civ Eng 15: 635-642. doi: 10.1007/s12205-011-0004-8
![]() |
[42] |
Liu X, Xu Y, Herrera F (2019). Consensus model for large-scale group decision making based on fuzzy preference relation with self-confidence: Detecting and managing overconfidence behaviors. Inf Fusion 52: 245-256. doi: 10.1016/j.inffus.2019.03.001
![]() |
[43] |
Heravi G, Fathi M, Faeghi S (2017) Multi-criteria group decision-making method for optimal selection of sustainable industrial building options focused on petrochemical projects. J Cleaner Prod 142: 2999-3013. doi: 10.1016/j.jclepro.2016.10.168
![]() |
[44] |
Dranka GG, Cunha J, de Lima JD, et al. (2020) Economic evaluation methodologies for renewable energy projects. AIMS Energy 8: 339-364. doi: 10.3934/energy.2020.2.339
![]() |
[45] |
Boran FE, Boran K, Dizdar E (2012) A fuzzy multi criteria decision making to evaluate energy policy based on an information axiom: a case study in Turkey. Energy Sources, Part B: Econ, Plann, Policy 7: 230-240. doi: 10.1080/15567240902839294
![]() |
[46] |
Celiktas MS, Kocar G (2010) From potential forecast to foresight of Turkey's renewable energy with Delphi approach. Energy 35: 1973-1980. doi: 10.1016/j.energy.2010.01.012
![]() |
[47] | Kul C, Zhang L, Solangi YA (2020) Assessing the Renewable Energy Investment Risk Factors for Sustainable Development in Turkey. J Cleaner Prod, 124164. |
[48] | Hsu CC, Sandford BA (2007) The Delphi technique: making sense of consensus. Pract Assess, Res, Eval 12: 10. |
[49] |
Kabak M, Dağdeviren M, Burmaoğlu S (2016) A hybrid SWOT-FANP model for energy policy making in Turkey. Energy Sources, Part B: Econ, Plann, Policy 11: 487-495. doi: 10.1080/15567249.2012.673692
![]() |
[50] |
Aslani A, Naaranoja M, Zakeri B (2012) The prime criteria for private sector participation in renewable energy investment in the Middle East (case study: Iran). Renewable Sustainable Energy Rev 16: 1977-1987. doi: 10.1016/j.rser.2011.12.015
![]() |
[51] | Aslani A, Feng B (2014) Investment prioritization in renewable energy resources with consideration to the investment criteria in Iran. Distrib Gener Altern Energy J 29: 7-26. |
[52] | Erfani A, Tavakolan M (2019) Challenges in renewable energy investment projects in Iran: A review of the main Criteria and Risks. 3rd International Conference on Applied Researches in Structural Engineering and Construction Management, 1-10. |
[53] |
Boran FE (2018) A new approach for evaluation of renewable energy resources: A case of Turkey. Energy Sources, Part B: Econ, Plann, Policy 13: 196-204. doi: 10.1080/15567249.2017.1423414
![]() |
[54] |
Ozorhon B, Batmaz A, Caglayan S (2018) Generating a framework to facilitate decision making in renewable energy investments. Renewable Sustainable Energy Rev 95: 217-226. doi: 10.1016/j.rser.2018.07.035
![]() |
[55] |
Cavallaro F, Ciraolo L (2005) A multicriteria approach to evaluate wind energy plants on an Italian island. Energy Policy 33: 235-244. doi: 10.1016/S0301-4215(03)00228-3
![]() |
[56] |
Wang JJ, Jing YY, Zhang CF, et al. (2009) Review on multi-criteria decision analysis aid in sustainable energy decision-making. Renewable Sustainable Energy Rev 13: 2263-2278. doi: 10.1016/j.rser.2009.06.021
![]() |
[57] |
Kaya T, Kahraman C (2010) Multicriteria renewable energy planning using an integrated fuzzy VIKOR & AHP methodology: The case of Istanbul. Energy 35: 2517-2527. doi: 10.1016/j.energy.2010.02.051
![]() |
[58] |
Şengül Ü, Eren M, Shiraz SE, et al. (2015) Fuzzy TOPSIS method for ranking renewable energy supply systems in Turkey. Renewable Energy 75: 617-625. doi: 10.1016/j.renene.2014.10.045
![]() |
[59] |
Strantzali E, Aravossis K (2016) Decision making in renewable energy investments: A review. Renewable Sustainable Energy Rev 55: 885-898. doi: 10.1016/j.rser.2015.11.021
![]() |
[60] |
Al Garni H, Kassem A, Awasthi A, et al. (2016) A multicriteria decision making approach for evaluating renewable power generation sources in Saudi Arabia. Sustainable Energy Technol Assess 16: 137-150. doi: 10.1016/j.seta.2016.05.006
![]() |
[61] |
Balin A, Baraçli H (2017) A fuzzy multi-criteria decision-making methodology based upon the interval type-2 fuzzy sets for evaluating renewable energy alternatives in Turkey. Technol Econ Dev Econ 23: 742-763. doi: 10.3846/20294913.2015.1056276
![]() |
[62] |
Çolak M, Kaya İ (2017) Prioritization of renewable energy alternatives by using an integrated fuzzy MCDM model: A real case application for Turkey. Renewable Sustainable Energy Rev 80: 840-853. doi: 10.1016/j.rser.2017.05.194
![]() |
[63] |
Wu Y, Li L, Song Z, et al. (2019) Risk assessment on offshore photovoltaic power generation projects in China based on a fuzzy analysis framework. J Cleaner Prod 215: 46-62. doi: 10.1016/j.jclepro.2019.01.024
![]() |
[64] |
Solangi YA, Tan Q, Mirjat NH, et al. (2019) An integrated Delphi-AHP and fuzzy TOPSIS approach toward ranking and selection of renewable energy resources in Pakistan. Processes 7: 118. doi: 10.3390/pr7020118
![]() |
[65] | Karakaş E (2019) Evaluation of renewable energy alternatives for turkey via modified fuzzy ahp. Available from: http://zbw.eu/econis-archiv/bitstream/11159/3155/1/1667513583.pdf. |
[66] |
Peng HG, Shen KW, He SS, et al. (2019) Investment risk evaluation for new energy resources: An integrated decision support model based on regret theory and ELECTRE III. Energy Convers Manage 183: 332-348. doi: 10.1016/j.enconman.2019.01.015
![]() |
[67] | Saaty TL (2008) Decision making with the analytic hierarchy process. Int J Serv Sci 1: 83-98. |
[68] |
Kamaruzzaman SN, Lou ECW, Wong PF, et al. (2018) Developing weighting system for refurbishment building assessment scheme in Malaysia through analytic hierarchy process (AHP) approach. Energy Policy 112: 280-290. doi: 10.1016/j.enpol.2017.10.023
![]() |
[69] | Saaty TL (1977) A scaling method for priorities in hierarchical structures. J Math Psychol 15: 234-281. |
[70] | Saaty TL (1990) Multicriteria decision making: the analytic hierarchy process: planning, priority setting resource allocation. |
[71] |
Katal F, Fazelpour F (2018) Multi-criteria evaluation and priority analysis of different types of existing power plants in Iran: An optimized energy planning system. Renewable Energy 120: 163-177. doi: 10.1016/j.renene.2017.12.061
![]() |
[72] |
Erdogan SA, Šaparauskas J, Turskis Z (2019) A multi-criteria decision-making model to choose the best option for sustainable construction management. Sustainability 11: 2239. doi: 10.3390/su11082239
![]() |
[73] | Feili H, Ahmadian P, Rabiei E, et al. (2018) Ranking of suitable renewable energy location using AHP method and scoring systems with sustainable development perspective. 6th International Conference on Economics, Management, Engineering Science and Art, 1-8. |
[74] | OPEC, Share of world crude oil reserves (2018) Available from: https://www.opec.org/opec_web/en/data_graphs/330.htm. |
[75] |
Mollahosseini A, Hosseini SA, Jabbari M, et al. (2017) Renewable energy management and market in Iran: A holistic review on current state and future demands. Renewable Sustainable Energy Rev 80: 774-788. doi: 10.1016/j.rser.2017.05.236
![]() |
Cancer tumor | Number of samples | Training (≈90%) | Testing (≈10%) |
BLCA | 434 | 390 | 44 |
COAD | 337 | 303 | 34 |
LIHC | 429 | 386 | 43 |
LUAD | 492 | 443 | 49 |
PRAD | 549 | 494 | 55 |
Total | 2241 | 2016 | 225 |
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
RFE | 98.36 ± 0.010 | 97.95 ± 0.013 | 97.74 ± 0.014 | 98.01 ± 0.012 | 98.01 ± 0.012 | 99.03 ± 0.006 |
ENET | 97.97 ± 0.009 | 97.07 ± 0.010 | 96.18 ± 0.016 | 96.81 ± 0.011 | 96.79 ± 0.012 | 98.40 ± 0.006 |
Ours | 99.22 ± 0.004 | 99.24 ± 0.004 | 99.23 ± 0.004 | 99.07 ± 0.005 | 99.60 ± 0.002 | 99.07 ± 0.005 |
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
MLP | 97.52 ± 0.010 | 96.85 ± 0.014 | 97.01 ± 0.013 | 96.99 ± 0.013 | 98.54 ± 0.006 | 96.98 ± 0.013 |
Bagging | 97.62 ± 0.015 | 96.81 ± 0.020 | 97.48 ± 0.017 | 97.13 ± 0.018 | 98.63 ± 0.009 | 97.10 ± 0.018 |
LR | 98.01 ± 0.011 | 97.37 ± 0.015 | 97.78 ± 0.019 | 97.60 ± 0.013 | 98.85 ± 0.006 | 97.59 ± 0.013 |
SVM | 97.87 ± 0.011 | 97.17 ± 0.016 | 97.67 ± 0.012 | 97.42 ± 0.014 | 98.76 ± 0.006 | 97.41 ± 0.014 |
RF | 97.77 ± 0.012 | 97.03 ± 0.016 | 97.28 ± 0.014 | 97.30 ± 0.014 | 98.70 ± 0.007 | 97.28 ± 0.014 |
SGD | 97.42 ± 0.013 | 96.87 ± 0.014 | 96.93 ± 0.021 | 96.91 ± 0.015 | 98.49 ± 0.008 | 96.86 ± 0.016 |
LGBM | 98.11 ± 0.013 | 97.56 ± 0.017 | 97.62 ± 0.015 | 97.71 ± 0.016 | 98.89 ± 0.008 | 97.70 ± 0.016 |
Stacking | 99.22 ± 0.004 | 99.24 ± 0.004 | 99.23 ± 0.004 | 99.07 ± 0.005 | 99.60 ± 0.002 | 99.07 ± 0.005 |
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
MLP | 98.22 ± 0.003 | 97.20 ± 0.009 | 97.25 ± 0.006 | 97.82 ± 0.004 | 98.94 ± 0.002 | 97.82 ± 0.004 |
Bagging | 98.58 ± 0.009 | 96.89 ± 0.004 | 97.62 ± 0.001 | 98.26 ± 0.002 | 99.20 ± 0.001 | 98.25 ± 0.002 |
LR | 98.22 ± 0.000 | 96.73 ± 0.000 | 97.32 ± 0.000 | 97.83 ± 0.000 | 98.97 ± 0.000 | 97.82 ± 0.000 |
SVM | 98.67 ± 0.000 | 97.09 ± 0.000 | 97.69 ± 0.000 | 98.37 ± 0.000 | 99.25 ± 0.000 | 98.36 ± 0.000 |
RF | 98.36 ± 0.005 | 96.98 ± 0.011 | 97.35 ± 0.004 | 97.99 ± 0.005 | 99.05 ± 0.002 | 97.98 ± 0.006 |
SGD | 97.73 ± 0.008 | 97.10 ± 0.016 | 96.22 ± 0.012 | 97.24 ± 0.009 | 98.63 ± 0.005 | 97.22 ± 0.010 |
LGBM | 99.11 ± 0.000 | 98.26 ± 0.000 | 98.07 ± 0.000 | 98.91 ± 0.000 | 99.48 ± 0.000 | 98.91 ± 0.000 |
Stacking | 99.29 ± 0.004 | 99.21 ± 0.007 | 98.20 ± 0.007 | 99.13 ± 0.005 | 99.56 ± 0.002 | 99.13 ± 0.005 |
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
ICancer-Pred | 83.56 ± 0.199 | 78.49 ± 0.229 | 81.81 ± 0.179 | 82.50 ± 0.194 | 80.16 ± 0.236 | 89.43 ± 0.135 |
Stacking | 99.29 ± 0.004 | 99.29 ± 0.007 | 98.20 ± 0.007 | 99.13 ± 0.005 | 99.56 ± 0.002 | 99.13 ± 0.005 |
Method | ACC (%) | PRE (%) | RECALL (%) | MCC (%) | Gmean (%) | KAPPA (%) |
ICancer-Pred | 97.27 ± 0.006 | 97.37 ± 0.005 | 96.99 ± 0.007 | 96.82 ± 0.007 | ━ | 96.81 ± 0.007 |
Stacking | 98.22 ± 0.006 | 98.18 ± 0.007 | 97.96 ± 0.009 | 97.93 ± 0.007 | 98.35 ± 0.002 | 97.92 ± 0.007 |