We introduce multiscale wavelet kernels to kernel principal component analysis (KPCA) to narrow down the search for the parameters required in the calculation of a kernel matrix. This new methodology incorporates multiscale methods into KPCA for transforming multiscale data. To illustrate the application of our proposed method and to investigate the robustness of the wavelet kernel in KPCA under different levels of the signal-to-noise ratio and different types of wavelet kernel, we study a set of two-class clustered simulation data. We show that wavelet kernel PCA (WKPCA) is an effective feature extraction method for transforming a variety of multidimensional clustered data into data with a higher level of linearity among the data attributes, which in turn improves the accuracy of simple linear classifiers. Based on the analysis of the simulation data sets, we observe that multiscale translation-invariant wavelet kernels for KPCA give enhanced performance in feature extraction. The application of the proposed method to real data is also addressed.

1. Introduction

The majority of the techniques developed in computational mathematics and statistics for modeling multivariate data have focused on detecting or explaining linear relationships among the variables, as in principal component analysis (PCA) [1]. However, in real-world applications linearity is a rather special case, and most of the behaviors captured in data are nonlinear. In data classification, one way to handle nonlinearly separable problems is to use a nonlinear classifier [2, 3]. In this approach, the classifier constructs an underlying objective function from selected components of the original input data. An alternative approach, presented in this paper, is to map the data from the original input space into a feature space through kernel-based methods [4, 5]. PCA is often used for feature extraction in high-dimensional data classification problems. The objective of PCA is to map the data attributes into a new feature space that contains better, that is, more linearly separable, features than those in the original input space. As standard PCA is linear in nature, the projections onto the principal component space do not always yield meaningful results for classification purposes. To address this problem, various kernel-based methods have been applied successfully in machine learning and data analysis (e.g., [6–10]). The introduction of a kernel allows one to work implicitly in some extended feature space while doing all computations in the original input space.
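To make this concrete, the sketch below builds a kernel matrix from a translation-invariant wavelet kernel of the product form K(x, y) = ∏_i h((x_i − y_i)/a), with the Morlet mother wavelet h(u) = cos(1.75u) exp(−u²/2) as used in wavelet kernel constructions such as [11], and then applies KPCA to it. This is a minimal NumPy sketch of the general technique under those assumptions, not the implementation used in this paper; the function names, the dilation parameter a, and the toy two-cluster data are illustrative only.

```python
import numpy as np

def morlet_wavelet_kernel(X, Y, a=1.0):
    """Translation-invariant wavelet kernel with the Morlet mother wavelet
    h(u) = cos(1.75 u) * exp(-u^2 / 2), in the product form
    K(x, y) = prod_i h((x_i - y_i) / a)."""
    diff = (X[:, None, :] - Y[None, :, :]) / a           # pairwise coordinate differences
    h = np.cos(1.75 * diff) * np.exp(-0.5 * diff ** 2)   # mother wavelet, coordinate-wise
    return h.prod(axis=2)                                 # product over the d dimensions

def kernel_pca(K, n_components=2):
    """Project the training sample onto the leading kernel principal components.
    K is the (n x n) kernel (Gram) matrix computed on the training points."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n    # centre in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)                  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]         # pick the largest ones
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return Kc @ alphas                                      # component scores of the sample

# Toy usage: two noisy clusters; the dilation a is the only kernel parameter to tune.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
K = morlet_wavelet_kernel(X, X, a=2.0)
scores = kernel_pca(K, n_components=2)
print(scores.shape)   # (100, 2)
```

Although the feature map induced by the wavelet kernel is never formed explicitly, every step above uses only kernel evaluations on the original inputs, which is the point of working implicitly in the extended feature space.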
References
[1] I. T. Jolliffe, Principal Component Analysis, Springer Science, New York, NY, USA, 2004.
[2] B. Schölkopf, A. J. Smola, and K. R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
[3] B. Schölkopf and A. J. Smola, Learning with Kernels—Support Vector Machines, Regularization, Optimization and Beyond, The MIT Press, Cambridge, Mass, USA, 2002.
[4] R. Rosipal, M. Girolami, L. J. Trejo, and A. Cichocki, “Kernel PCA for feature extraction and de-noising in nonlinear regression,” Neural Computing & Applications, vol. 10, no. 3, pp. 231–243, 2001.
[5] T. Hastie, R. Tibshirani, and A. Buja, “Flexible discriminant analysis by optimal scoring,” Journal of the American Statistical Association, vol. 89, pp. 1255–1270, 1994.
[6] K. R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An introduction to kernel-based learning algorithms,” IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181–201, 2001.
[7] M. Zhu, “Kernels and ensembles: perspectives on statistical learning,” The American Statistician, vol. 62, no. 2, pp. 97–109, 2008.
[8] A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis, “kernlab—an S4 package for kernel methods in R,” Journal of Statistical Software, vol. 11, no. 9, pp. 1–20, 2004.
[9] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[10] V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.
[11] L. Zhang, W. D. Zhou, and L. C. Jiao, “Wavelet support vector machine,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 34, no. 1, pp. 34–39, 2004.
[12] T. Takiguchi and Y. Ariki, “Robust feature extraction using kernel PCA,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. 509–512, Toulouse, France, May 2006.
[13] B. Schölkopf, A. Smola, and K. R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Technical Report 44, Max-Planck-Institut für biologische Kybernetik Arbeitsgruppe Bülthoff, Tübingen, Germany, 1996.
[14] W. S. Chen, P. C. Yuen, J. Huang, and J. H. Lai, “Wavelet kernel construction for kernel discriminant analysis on face recognition,” in Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops, p. 47, June 2006.
[15] W. F. Zhang, D. Q. Dai, and H. Yan, “Framelet kernels with applications to support vector regression and regularization networks,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 40, no. 4, pp. 1128–1144, 2009.
[16] R. Opfer, “Multiscale kernels,” Technical Report, Institut für Numerische und Angewandte Mathematik, Universität Göttingen, 2004.
[17] A. Rakotomamonjy and S. Canu, “Frames, reproducing kernels, regularization and learning,” Journal of Machine Learning Research, vol. 6, pp. 1485–1515, 2005.
[18] P. Erästö and L. Holmström, “Bayesian multiscale smoothing for making inferences about features in scatterplots,” Journal of Computational and Graphical Statistics, vol. 14, no. 3, pp. 569–589, 2005.
[19] T. Phienthrakul and B. Kijsirikul, “Evolutionary strategies for multi-scale radial basis function kernels in support vector machines,” in Proceedings of the Conference on Genetic and Evolutionary Computation (GECCO '05), pp. 905–911, Washington, DC, USA, June 2005.
[20] N. Kingsbury, D. B. H. Tay, and M. Palaniswami, “Multi-scale kernel methods for classification,” in Proceedings of the IEEE Workshop on Machine Learning for Signal Processing, pp. 43–48, September 2005.
[21] F. Wang, G. Tan, and Y. Fang, “Multiscale wavelet support vector regression for traffic flow prediction,” in Proceedings of the 3rd International Symposium on Intelligent Information Technology Application (IITA '09), vol. 3, pp. 319–322, November 2009.
[22] H. Cheng and J. Liu, “Super-resolution image reconstruction based on MWSVR estimation,” in Proceedings of the 7th World Congress on Intelligent Control and Automation (WCICA '08), pp. 5990–5994, June 2008.
[23] J. Wang and H. Peng, “Multi-scale wavelet support vector regression for soft sensor modeling,” in Proceedings of the International Conference on Neural Networks and Brain (ICNNB '05), vol. 1, pp. 284–287, October 2005.
[24] F. Han, D. Wang, C. Li, and X. Liao, “A multiresolution wavelet kernel for support vector regression,” in Proceedings of the 3rd International Conference on Advances in Neural Networks (ISNN '06), vol. 1, pp. 1022–1029, 2006.
[25] F. Wu and Y. Zhao, “Least square support vector machine on Gaussian wavelet kernel function set,” in Proceedings of the 3rd International Conference on Advances in Neural Networks (ISNN '06), vol. 3971 of Lecture Notes in Computer Science, pp. 936–941, 2006.
[26] S. Kadambe and P. Srinivasan, “Adaptive wavelets for signal classification and compression,” International Journal of Electronics and Communications, vol. 60, no. 1, pp. 45–55, 2006.
[27] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, Pa, USA, 1992.
[28] H. C. Shyu and Y. S. Sun, “Underwater acoustic signal analysis by multi-scaling and multi-translation wavelets,” in Wavelet Applications V, vol. 3391 of Proceedings of SPIE, pp. 628–636, Orlando, Fla, USA, April 1998.