Research article

Supervised machine learning models applied to disease diagnosis and prognosis

  • Received: 18 August 2019; Accepted: 08 October 2019; Published: 17 October 2019
  • This work analyses the diagnosis and prognosis of cancer and heart disease data using five Machine Learning (ML) algorithms. We compare the predictive ability of all five ML algorithms on breast cancer and heart disease data. The important variables that cause cancer and heart disease are also studied. We predict the test data based on the important variables and compute the prediction accuracy using the Receiver Operating Characteristic (ROC) curve. Random Forest (RF) and Principal Component Regression (PCR) provide the best performance in analyzing the breast cancer and heart disease data, respectively.
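
    As a rough illustration of the ROC-based evaluation mentioned in the abstract, the following Python sketch computes the area under the ROC curve (AUC) from binary labels and predicted scores using the rank-sum (Mann-Whitney) identity. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are assumptions made up for this example, and the abstract does not state which software or summary statistic the authors used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the positive class (e.g. disease present), 0 otherwise.
    scores: predicted scores; a larger score means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort indices by score, then give tied scores their average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of
    # positive/negative pairs, equals the AUC.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data, not taken from the article's datasets.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.2f}")
```

    An AUC of 1.0 means every positive case is scored above every negative case, while 0.5 corresponds to a ranking no better than chance; comparing this value across models is one common way to compare classifiers on held-out test data.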

    Citation: Maria C Mariani, Osei K Tweneboah, Md Al Masum Bhuiyan. Supervised machine learning models applied to disease diagnosis and prognosis[J]. AIMS Public Health, 2019, 6(4): 405-423. doi: 10.3934/publichealth.2019.4.405

  • © 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)