This work used machine learning (ML) to predict heart disease from a
dataset sourced from the Centers for Disease Control and Prevention (CDC)
in the US. The dataset was preprocessed and used
to train five machine learning models: random forest, support vector machine,
logistic regression, extreme gradient boosting and light gradient boosting machine. The
goal was to use the best-performing model to develop a web application
capable of reliably predicting heart disease from user-provided data. The
extreme gradient boosting classifier provided the most reliable results,
with precision, recall and F1-score of 97%, 72% and 83%, respectively, for
Class 0 (no heart disease) and 21%, 81% and 34%, respectively, for Class 1
(heart disease). This model was then deployed as a web application.
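The F1-scores quoted above follow from the precision and recall figures, since F1 is their harmonic mean. The sketch below is illustrative only and is not the authors' evaluation code: the function name `f1_score` and the dictionary layout are assumptions, and the inputs are the rounded percentages from the text rather than the underlying confusion-matrix counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Rounded (precision, recall) pairs quoted in the text for the
# extreme gradient boosting classifier.
reported = {
    "Class 0 (no heart disease)": (0.97, 0.72),
    "Class 1 (heart disease)": (0.21, 0.81),
}

f1_by_class = {label: f1_score(p, r) for label, (p, r) in reported.items()}

for label, f1 in f1_by_class.items():
    print(f"{label}: F1 = {f1:.3f}")
```

With these rounded inputs the result is roughly 0.83 for Class 0 and 0.33 for Class 1. The small gap to the reported 34% for Class 1 is consistent with the published precision and recall having been rounded before reporting. The low Class 1 precision alongside high recall shows that the model flags most true cases at the cost of many false positives.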