%0 Journal Article
%T Lung Cancer Prediction from Elvira Biomedical Dataset Using Ensemble Classifier with Principal Component Analysis
%A Teresa Kwamboka Abuya
%J Journal of Data Analysis and Information Processing
%P 175-199
%@ 2327-7203
%D 2023
%I Scientific Research Publishing
%R 10.4236/jdaip.2023.112010
%X Machine learning (ML) algorithms can potentially improve disease
diagnostics, leading to earlier detection and treatment. As a malignant tumor
whose primary focus is located in the bronchial mucosal epithelium, lung cancer
has the highest mortality and morbidity among cancer types, threatening the
health and lives of patients. ML algorithms such as Random Forest (RF), Support
Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) have been
used for lung cancer prediction. However, they still face challenges such as
high dimensionality of the feature space, over-fitting, high computational
complexity, noise and missing data, low accuracy, low precision and high error
rates. Ensemble learning, which combines classifiers, may help to improve
prediction on new data. However, current ensemble ML
techniques rarely consider comprehensive evaluation metrics to evaluate the
performance of individual classifiers. The main purpose of this study was to
develop an ensemble classifier that improves lung cancer prediction. An
ensemble machine learning algorithm is developed based on RF, SVM, NB, and KNN.
Feature selection is done based on Principal Component Analysis (PCA) and
Analysis of Variance (ANOVA). This algorithm is then executed on lung cancer data
and evaluated using execution time, true positives (TP), true negatives (TN),
false positives (FP), false negatives (FN), false positive rate (FPR), recall
(R), precision (P) and F-measure (FM). Experimental results show that the
proposed ensemble classifier achieves the best classification score, 0.9825,
with the lowest error rate, 0.0193. SVM follows with a classification score of
0.9652 at an error rate of 0.0206. NB had the worst performance, with a
classification score of 0.8475 at an error rate of 0.0738.
%K Accuracy
%K False Positive Rate
%K Naïve Bayes
%K Random Forest
%K Lung Cancer Prediction
%K Principal Component Analysis
%K Support Vector Machine
%K K-Nearest Neighbor
%U http://www.scirp.org/journal/PaperInformation.aspx?PaperID=124879