Tensor subspace analysis (TSA) and discriminant TSA (DTSA) are two effective two-sided projection methods for dimensionality reduction and feature extraction of face image matrices. However, they have two serious drawbacks. Firstly, TSA and DTSA compute the left and right projection matrices iteratively, and at each iteration two generalized eigenvalue problems must be solved, which makes them impractical for high-dimensional image data. Secondly, the metric structure of the facial image space cannot be preserved, since the left and right projection matrices are usually not orthonormal. In this paper, we propose the orthogonal TSA (OTSA) and the orthogonal DTSA (ODTSA). In contrast to TSA and DTSA, these methods require solving two trace ratio optimization problems at each iteration instead of two generalized eigenvalue problems. Thus, OTSA and ODTSA have a much lower computational cost than their nonorthogonal counterparts, since the trace ratio optimization problem can be solved by the inexpensive Newton-Lanczos method. Experimental results show that the proposed methods achieve much higher recognition accuracy and have a much lower training cost.

1. Introduction

Many applications in the field of information processing, such as data mining, information retrieval, machine learning, and pattern recognition, require dealing with high-dimensional data. Dimensionality reduction has been a key technique for manipulating such high-dimensional data efficiently. In dimensionality reduction, the high-dimensional data are transformed into a low-dimensional subspace with limited loss of information. Principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] are two of the most well-known and widely used dimensionality reduction methods. PCA is an unsupervised method that finds the projection directions by maximizing the variance of the features in the low-dimensional subspace. It is also regarded as the best data representation method in the sense that the mean squared error between the original data and the data reconstructed from the PCA transform is minimal. LDA is a supervised method based on the following idea: the transformed data points of different classes should be as far from each other as possible, while the transformed data points of the same class should be as close to each other as possible. To achieve this goal, LDA seeks an optimal linear transformation by simultaneously minimizing the within-class distance and maximizing the between-class distance. The optimal transformation of LDA can be computed by solving a generalized eigenvalue problem.
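To make this formulation concrete, the following minimal Python sketch (not from this paper; the inputs X, y, and the target dimension n_components are illustrative assumptions) forms the within-class and between-class scatter matrices and obtains the LDA projection from the generalized eigenvalue problem S_b w = lambda S_w w.

import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, n_components):
    # X: (n_samples, n_features) data matrix; y: integer class labels.
    # Returns W of size (n_features, n_components) whose columns solve S_b w = lambda S_w w.
    n_features = X.shape[1]
    mean_total = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))   # within-class scatter
    S_b = np.zeros((n_features, n_features))   # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_total).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Generalized symmetric eigenvalue problem; S_w is assumed nonsingular here
    # (in the undersampled case a regularization term is usually added, cf. [15-18]).
    eigvals, eigvecs = eigh(S_b, S_w)
    # The directions with the largest eigenvalues are the most discriminative.
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]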
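By contrast, the trace ratio optimization problem mentioned in the abstract, maximizing trace(V^T A V)/trace(V^T B V) over orthonormal V, can be solved by the Newton-Lanczos method of Ngo et al. [36]. The sketch below is only an illustration of that iteration under assumed inputs (symmetric matrices A and B and a target dimension d); it is not the implementation used in this paper.

import numpy as np
from scipy.sparse.linalg import eigsh   # Lanczos-type solver for a few extreme eigenpairs

def trace_ratio(A, B, d, tol=1e-8, max_iter=100):
    # Maximize trace(V'AV) / trace(V'BV) over orthonormal V (n x d), with d < n.
    # Newton's method is applied to f(rho) = max_V trace(V'(A - rho*B)V), whose root
    # is the optimal ratio; each step needs only the d largest eigenpairs of A - rho*B.
    rho = 0.0
    V = None
    for _ in range(max_iter):
        _, V = eigsh(A - rho * B, k=d, which='LA')                # Lanczos step
        rho_new = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)   # Newton update
        if abs(rho_new - rho) < tol:
            rho = rho_new
            break
        rho = rho_new
    return V, rho

Since each Newton step needs only a few extreme eigenpairs of a symmetric matrix, such an iteration is typically much cheaper than solving a full generalized eigenvalue problem of the same size, which is the source of the cost advantage claimed for OTSA and ODTSA.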
References
[1] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, Computer Science and Scientific Computing, Academic Press, Boston, Mass, USA, 2nd edition, 1990.
[3] C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, New York, NY, USA, 2006.
[4] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.
[5] G. Kowalski, Information Retrieval Systems: Theory and Implementation, Kluwer Academic Publishers, Norwell, Mass, USA, 1997.
[6] Z. Jin, J.-Y. Yang, Z.-S. Hu, and Z. Lou, “Face recognition based on the uncorrelated discriminant transformation,” Pattern Recognition, vol. 34, no. 7, pp. 1405–1416, 2001.
[7] D. L. Swets and J. Weng, “Using discriminant eigenfeatures for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831–836, 1996.
[8] P. Baldi and G. W. Hatfield, DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling, Cambridge University Press, Cambridge, Mass, USA, 2002.
[9] S. Dudoit, J. Fridlyand, and T. P. Speed, “Comparison of discrimination methods for the classification of tumors using gene expression data,” Journal of the American Statistical Association, vol. 97, no. 457, pp. 77–87, 2002.
[10] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall Advanced Reference Series, Prentice Hall, Englewood Cliffs, NJ, USA, 1988.
[11] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
[12] P. Howland and H. Park, “Generalizing discriminant analysis using the generalized singular value decomposition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 8, pp. 995–1006, 2004.
[13] J. Ye, R. Janardan, C. H. Park, and H. Park, “An optimization criterion for generalized discriminant analysis on undersampled problems,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 8, pp. 982–994, 2004.
[14] J. Ye and Q. Li, “A two-stage linear discriminant analysis via QR-decomposition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 929–941, 2005.
[15] D.-Q. Dai and P. C. Yuen, “Regularized discriminant analysis and its application to face recognition,” Pattern Recognition, vol. 36, no. 3, pp. 845–847, 2003.
[16] J. H. Friedman, “Regularized discriminant analysis,” Journal of the American Statistical Association, vol. 84, no. 405, pp. 165–175, 1989.
[17] Y. Guo, T. Hastie, and R. Tibshirani, “Regularized linear discriminant analysis and its application in microarrays,” Biostatistics, vol. 8, no. 1, pp. 86–100, 2007.
[18] W.-K. Ching, D. Chu, L.-Z. Liao, and X. Wang, “Regularized orthogonal linear discriminant analysis,” Pattern Recognition, vol. 45, no. 7, pp. 2719–2732, 2012.
[19] D. Chu and S. T. Goh, “A new and fast orthogonal linear discriminant analysis on undersampled problems,” SIAM Journal on Scientific Computing, vol. 32, no. 4, pp. 2274–2297, 2010.
[20] J. Ye, “Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems,” Journal of Machine Learning Research, vol. 6, pp. 483–502, 2005.
[21] L.-F. Chen, H.-Y. M. Liao, M.-T. Ko, J.-C. Lin, and G.-J. Yu, “A new LDA-based face recognition system which can solve the small sample size problem,” Pattern Recognition, vol. 33, no. 10, pp. 1713–1726, 2000.
[22] D. Chu and G. S. Thye, “A new and fast implementation for null space based linear discriminant analysis,” Pattern Recognition, vol. 43, no. 4, pp. 1373–1379, 2010.
[23] D. Chu, S. T. Goh, and Y. S. Hung, “Characterization of all solutions for undersampled uncorrelated linear discriminant analysis problems,” SIAM Journal on Matrix Analysis and Applications, vol. 32, no. 3, pp. 820–844, 2011.
[24] X. He and P. Niyogi, “Locality preserving projections,” Advances in Neural Information Processing Systems, vol. 16, pp. 153–160, 2004.
[25] Z. Zheng, F. Yang, W. Tan, J. Jia, and J. Yang, “Gabor feature-based face recognition using supervised locality preserving projection,” Signal Processing, vol. 87, no. 10, pp. 2473–2483, 2007.
[26] W. Yu, X. Teng, and C. Liu, “Face recognition using discriminant locality preserving projections,” Image and Vision Computing, vol. 24, no. 3, pp. 239–248, 2006.
[27] L. Zhu and S. Zhu, “Face recognition based on orthogonal discriminant locality preserving projections,” Neurocomputing, vol. 70, no. 7–9, pp. 1543–1546, 2007.
[28] J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang, “Two-dimensional PCA: a new approach to appearance-based face representation and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
[29] M. Li and B. Yuan, “2D-LDA: a statistical linear discriminant analysis for image matrix,” Pattern Recognition Letters, vol. 26, no. 5, pp. 527–532, 2005.
[30] S. Chen, H. Zhao, M. Kong, and B. Luo, “2D-LPP: a two-dimensional extension of locality preserving projections,” Neurocomputing, vol. 70, no. 4–6, pp. 912–921, 2007.
[31] W. Yu, “Two-dimensional discriminant locality preserving projections for face recognition,” Pattern Recognition Letters, vol. 30, no. 15, pp. 1378–1383, 2009.
[32] J. Ye, “Generalized low rank approximations of matrices,” Machine Learning, vol. 61, no. 1–3, pp. 167–191, 2005.
[33] C.-X. Ren and D.-Q. Dai, “Bilinear Lanczos components for fast dimensionality reduction and feature extraction,” Pattern Recognition, vol. 43, no. 11, pp. 3742–3752, 2010.
[34] X. He and P. Niyogi, “Tensor subspace analysis,” in Advances in Neural Information Processing Systems, vol. 18, 2005.
[35] S.-J. Wang, C.-G. Zhou, N. Zhang, X.-J. Peng, Y.-H. Chen, and X. Liu, “Face recognition using second-order discriminant tensor subspace analysis,” Neurocomputing, vol. 74, no. 12–13, pp. 2142–2156, 2011.
[36] T. T. Ngo, M. Bellalij, and Y. Saad, “The trace ratio optimization problem for dimensionality reduction,” SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 5, pp. 2950–2971, 2010.
[37] C. T. Kelley, Solving Nonlinear Equations with Newton's Method, vol. 1 of Fundamentals of Algorithms, SIAM, Philadelphia, Pa, USA, 2003.
[38] B. N. Parlett, The Symmetric Eigenvalue Problem, vol. 20 of Classics in Applied Mathematics, SIAM, Philadelphia, Pa, USA, 1998.