Supervised machine learning models applied to disease diagnosis and prognosis

1 Department of Mathematical Sciences, University of Texas, El Paso, United States
2 Computational Science Program, University of Texas, El Paso, United States

This work analyses the diagnosis and prognosis of cancer and heart disease data using five Machine Learning (ML) algorithms. We compare the predictive ability of the five ML algorithms on breast cancer and heart disease data. The important variables that cause cancer and heart disease are also studied. We predict the test data based on the important variables and compute the prediction accuracy using the Receiver Operating Characteristic (ROC) curve. Random Forest (RF) and Principal Component Regression (PCR) provide the best performance in analyzing the breast cancer and heart disease data, respectively.
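To make the ROC-based evaluation concrete, the following is a minimal sketch of how an ROC curve and its area under the curve (AUC) can be computed from predicted scores on a test set. It is not the authors' code: the function names `roc_curve_points` and `auc`, and the six labels and scores, are hypothetical and chosen only for illustration. In practice a library routine would normally be used instead.

```python
def roc_curve_points(labels, scores):
    """Return the (false positive rate, true positive rate) points of the
    ROC curve, obtained by lowering the decision threshold step by step."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank test cases from the highest predicted score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical test-set labels (1 = disease, 0 = no disease) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_curve_points(labels, scores)
print(round(auc(points), 3))  # prints 0.889
```

An AUC of 1.0 corresponds to a model that ranks every positive case above every negative case, while 0.5 corresponds to random ranking. Comparing AUC values across models on the same test set is one common way to compare predictive ability.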



© 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
