This paper proposes an adaptive, diversity-based hybrid ensemble method to improve the performance of binary classification. The proposed method combines base models non-linearly and adaptively selects the most suitable model for each data instance. Ensemble learning, an important machine learning technique, uses multiple single models to construct a hybrid model, which generally performs better than any individual model. For a given dataset, diverse single models trained with different machine learning algorithms have different capabilities in recognizing patterns in the training sample. The proposed approach has been validated on the Repeat Buyers Prediction dataset and the Census Income Prediction dataset. The experimental results indicate up to an 18.5% improvement in F1 score on the Repeat Buyers dataset over the best individual model, which also suggests that the proposed ensemble method is effective at handling imbalanced datasets. In addition, the proposed method outperforms two other commonly used ensemble methods (averaging and stacking) in terms of F1 score. Finally, our method achieved an AUC of 0.718, slightly above the previous best of 0.712 in the Repeat Buyers competition; this roughly 1% increase in AUC is significant given the very large size of the Repeat Buyers dataset.
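The per-instance adaptive selection described above can be illustrated with a minimal sketch. The sketch below implements a common form of dynamic classifier selection (picking, for each test instance, the base model with the best accuracy on its nearest validation neighbours); the synthetic data, the choice of base learners, and the k-nearest-neighbour selection rule are all illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of per-instance adaptive model selection (dynamic classifier
# selection by local validation accuracy). Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import f1_score

# Imbalanced synthetic binary-classification data (stand-in for the real datasets).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4,
                                                    random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                random_state=0)

# Diverse base models trained with different learning algorithms.
models = [LogisticRegression(max_iter=1000).fit(X_train, y_train),
          DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)]

# Per-model correctness on a held-out validation set: shape (n_models, n_val).
val_correct = np.stack([m.predict(X_val) == y_val for m in models])

# For each test instance, choose the model that is most accurate on its
# k nearest validation neighbours.
nn = NearestNeighbors(n_neighbors=15).fit(X_val)
_, idx = nn.kneighbors(X_test)                 # (n_test, k) neighbour indices
local_acc = val_correct[:, idx].mean(axis=2)   # (n_models, n_test)
best = local_acc.argmax(axis=0)                # chosen model per test instance

preds = np.array([models[m].predict(x.reshape(1, -1))[0]
                  for m, x in zip(best, X_test)])
print("F1:", round(f1_score(y_test, preds), 3))
```

Because the selector consults each instance's local neighbourhood, different regions of the feature space can be served by different base models, which is the intuition behind adapting the ensemble to each data instance rather than using one fixed combination rule.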