Machine learning (ML) algorithms can improve disease diagnostics, enabling
earlier detection and treatment. Lung cancer, a malignant tumor that originates
in the bronchial mucosal epithelium, has the highest mortality and morbidity
among cancer types and threatens the health and lives of affected patients. ML
algorithms such as Random Forest (RF), Support Vector Machine (SVM), K-Nearest
Neighbor (KNN) and Naïve Bayes (NB) have been used for lung cancer prediction.
However, they still face challenges
such as high dimensionality of the feature space, over-fitting, high
computational complexity, noise and missing data, low accuracy, low precision
and high error rates. Ensemble learning, which combines multiple classifiers,
may improve prediction on unseen data. However, current ensemble ML techniques
rarely use a comprehensive set of metrics to evaluate the performance of the
individual classifiers. The main purpose of this study was to develop an
ensemble classifier that improves lung cancer prediction. The proposed
ensemble ML algorithm is built from RF, SVM, NB and KNN.
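The abstract does not specify how the outputs of the four base classifiers are combined. As a minimal sketch, assuming hard (label) voting, the Python below merges per-sample predictions from RF, SVM, NB and KNN. The function name `weighted_vote`, the optional per-classifier weights (which could, hypothetically, be each classifier's validation F-measure) and the rule that ties go to the positive class (label 1, assumed to mean cancer) are illustrative choices, not the authors' method. With four voters, 2-2 ties are common, so some tie rule is needed.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Optional, Sequence


def weighted_vote(
    predictions: Mapping[str, Sequence[int]],
    weights: Optional[Mapping[str, float]] = None,
    tie_label: int = 1,
) -> List[int]:
    """Combine per-sample class labels from several base classifiers by voting.

    predictions: classifier name -> predicted labels, one per sample.
    weights: classifier name -> vote weight; all default to 1.0, which is a
        plain majority vote (hypothetical weighting scheme).
    tie_label: label returned when it is tied for the top score
        (assumption: 1 = cancer, so ties favour the positive class).
    """
    names = list(predictions)
    if not names:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[names[0]])
    if any(len(predictions[name]) != n_samples for name in names):
        raise ValueError("all classifiers must predict the same number of samples")
    if weights is None:
        weights = {name: 1.0 for name in names}

    combined: List[int] = []
    for i in range(n_samples):
        # Sum the weights of the classifiers voting for each label.
        scores: Dict[int, float] = defaultdict(float)
        for name in names:
            scores[predictions[name][i]] += weights[name]
        best = max(scores.values())
        winners = [label for label, score in scores.items() if score == best]
        if len(winners) == 1:
            combined.append(winners[0])
        else:
            # Tie: prefer tie_label if it is among the winners, else the smallest label.
            combined.append(tie_label if tie_label in winners else min(winners))
    return combined


# Hypothetical predictions for four patients (1 = cancer, 0 = no cancer).
preds = {
    "RF": [1, 0, 1, 0],
    "SVM": [1, 0, 0, 0],
    "NB": [0, 1, 0, 1],
    "KNN": [1, 0, 1, 1],
}
print(weighted_vote(preds))  # → [1, 0, 1, 1]
print(weighted_vote(preds, {"RF": 1.0, "SVM": 2.0, "NB": 0.5, "KNN": 1.0}))  # → [1, 0, 0, 0]
```

With equal weights, the last two patients are 2-2 ties that the tie rule resolves to 1; giving SVM a larger weight flips both to 0, which shows how strongly the weighting choice can change the ensemble's output.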
Feature selection is performed using Principal Component Analysis (PCA) and
Analysis of Variance (ANOVA). The algorithm is then run on lung cancer data
and evaluated using execution time, true positives (TP), true negatives (TN),
false positives (FP), false negatives (FN), false positive rate (FPR), recall
(R), precision (P) and F-measure (FM). Experimental results show that the
proposed ensemble classifier achieves the best classification rate, 0.9825,
with the lowest error rate, 0.0193. SVM ranks second, with a classification
rate of 0.9652 at an error rate of 0.0206. NB performs worst, with a
classification rate of 0.8475 at an error rate of 0.0738.
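To illustrate the ANOVA part of the feature-selection step, the Python below is a minimal sketch that scores each feature with the one-way ANOVA F-statistic (between-class variance divided by within-class variance) and keeps the k highest-scoring features. The function names, the value of k and the toy data are hypothetical. The abstract does not say how ANOVA and PCA are combined, and PCA, which projects the data onto new axes rather than selecting original features, is not shown here.

```python
from typing import Dict, List, Sequence


def anova_f_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """One-way ANOVA F-statistic of each feature (column of X) across the classes in y.

    F = [SS_between / (G - 1)] / [SS_within / (N - G)], where G is the number
    of classes and N the number of samples.
    """
    n = len(X)
    classes = sorted(set(y))
    g = len(classes)
    if g < 2 or n <= g:
        raise ValueError("need at least two classes and more samples than classes")
    scores: List[float] = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        grand_mean = sum(column) / n
        # Group this feature's values by class label.
        groups: Dict[int, List[float]] = {c: [] for c in classes}
        for value, label in zip(column, y):
            groups[label].append(value)
        ss_between = 0.0
        ss_within = 0.0
        for values in groups.values():
            mean = sum(values) / len(values)
            ss_between += len(values) * (mean - grand_mean) ** 2
            ss_within += sum((v - mean) ** 2 for v in values)
        ms_between = ss_between / (g - 1)
        ms_within = ss_within / (n - g)
        if ms_within == 0.0:
            # No within-class spread: avoid dividing by zero.
            scores.append(float("inf") if ms_between > 0.0 else 0.0)
        else:
            scores.append(ms_between / ms_within)
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.0], [3.2, 5.0], [2.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]
scores = anova_f_scores(X, y)
print(scores)
print(select_top_k(scores, k=1))  # → [0]
```

Feature 0 scores about 150 because its class means (1.0 and 3.0) are far apart relative to the small spread inside each class, while feature 1 has identical class means and scores 0.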
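The count-based metrics listed above all follow from the confusion matrix of a binary classifier. The Python below is a minimal sketch using the standard definitions, FPR = FP/(FP+TN), recall = TP/(TP+FN), precision = TP/(TP+FP) and F-measure as the harmonic mean of precision and recall, and assuming label 1 marks a cancer case. The names `BinaryMetrics` and `confusion_counts` and the example labels are hypothetical. The abstract does not define its error rate, so it is not computed here; execution time would be measured separately.

```python
from dataclasses import dataclass
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when the denominator is zero.
    return num / den if den else 0.0


@dataclass(frozen=True)
class BinaryMetrics:
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def fpr(self) -> float:
        return _ratio(self.fp, self.fp + self.tn)

    @property
    def recall(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def f_measure(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> BinaryMetrics:
    """Count TP, TN, FP and FN, treating `positive` as the cancer class (assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return BinaryMetrics(tp=tp, tn=tn, fp=fp, fn=fn)


# Hypothetical ground truth and predictions for ten patients.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
metrics = confusion_counts(y_true, y_pred)
print(metrics)  # → BinaryMetrics(tp=4, tn=3, fp=2, fn=1)
print(round(metrics.fpr, 4), round(metrics.recall, 4),
      round(metrics.precision, 4), round(metrics.f_measure, 4))  # → 0.4 0.8 0.6667 0.7273
```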