Abstract
Generic L2-norm-based linear discriminant analysis (LDA) is sensitive to outliers and captures only the global structure of the sample points. In this paper, a new LDA-based feature extraction algorithm is proposed that integrates both global and local structure information in a unified L1-norm optimization framework. Unlike generic L2-norm-based LDA, the proposed algorithm explicitly incorporates the local structure of the sample points and is robust to outliers. It also overcomes the singularity problem of the within-class scatter matrix. Experiments on several popular datasets demonstrate the effectiveness of the proposed algorithm.
References
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
Martinez AM, Kak AC (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2):228–233
Ji S-W, Ye J-P (2008) Generalized linear discriminant analysis: a unified framework and efficient model selection. IEEE Trans Neural Netw 19(10):1768–1782
Lu J-W, Plataniotis K, Venetsanopoulos A (2005) Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn Lett 26:181–191
Ye J, Li Q (2005) A two-stage linear discriminant analysis via QR decomposition. IEEE Trans Pattern Anal Mach Intell 27(6):929–941
Howland P, Park H (2004) Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Trans Pattern Anal Mach Intell 26(8):995–1006
Ye J-P (2006) Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. J Mach Learn Res 7:1183–1204
Chen L-F, Mark Liao HY, Ko MT, Lin J-C, Yu G-J (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recogn 33:1713–1726
Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data with application to face recognition. Pattern Recogn 34:2067–2070
Cai D, He X, Han J (2008) SRDA: an efficient algorithm for large-scale discriminant analysis. IEEE Trans Knowl Data Eng 20(4):1–12
Yang J, Yang J-Y (2003) Why can LDA be performed in PCA transformed space? Pattern Recogn 36:563–566
Hou C, Zhang C, Wu Y, Jiao Y (2009) Stable local dimensionality reduction approaches. Pattern Recogn 42(9):2054–2066
Gao Q, Xu H, Li Y, Xie D (2010) Two-dimensional supervised local similarity and diversity projection. Pattern Recogn 43(10):3359–3363
Gao Q, Liu J, Zhang H, Hou J, Yang X (2012) Enhanced Fisher discriminant criterion for image recognition. Pattern Recogn 45(10):3717–3724
Zhang D, He J, Zhao Y, Luo Z, Minghui D (2014) Global plus local: a complete framework for feature extraction and recognition. Pattern Recogn 47(3):1433–1442
Black MJ, Rangarajan A (1996) On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. Int J Comput Vis 19(1):57–91
Pang Y, Li X, Yuan Y (2010) Robust tensor analysis with L1-norm. IEEE Trans Circuits Syst Video Technol 20(2):172–178
Wang H, Tang Q, Zheng W (2012) L1-norm-based common spatial patterns. IEEE Trans Biomed Eng 59(3):653–662
Ke Q, Kanade T (2003) Robust subspace computation using L1 norm. Carnegie Mellon University, Pittsburgh, Technical Report, CMU-CS-03-172
Ke Q, Kanade T (2005) Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 739–746
Ding C, Zhou D, He X, Zha H (2006) R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In: Proceedings of the international conference on machine learning, pp 281–288
Kwak N (2008) Principal component analysis based on L1-norm maximization. IEEE Trans Pattern Anal Mach Intell 30(9):1672–1680
Li XL, Pang YW, Yuan Y (2009) L1-norm-based 2DPCA. IEEE Trans Syst Man Cybern B Cybern 40(4):1170–1175
Li X, Hua W, Wang H, Zhang Z (2010) Linear discriminant analysis using rotational invariant L1 norm. Neurocomputing 73(13–15):2571–2579
Zhou F, Zhang J (2013) Linear discriminant analysis based on L1-norm maximization. IEEE Trans Image Process 22(8):3018–3027
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Belkin M, Niyogi P (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 1:585–592
Yan S, Xu D, Zhang B, Zhang H, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
Cai H-P, Mikolajczyk K, Matas J (2011) Learning linear discriminant projections for dimensionality reduction of image descriptors. IEEE Trans Pattern Anal Mach Intell 33(2):338–352
Cui Y, Fan L (2012) A novel supervised dimensionality reduction algorithm: graph-based Fisher analysis. Pattern Recogn 45(4):1471–1481
He X, Niyogi P (2004) Locality preserving projections. Adv Neural Inf Process Syst 16:153–160
Shu X, Gao Y, Lu H (2012) Efficient linear discriminant analysis with locality preserving for face recognition. Pattern Recogn 45(5):1892–1898
Sugiyama M (2007) Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. J Mach Learn Res 8:1027–1061
Wang N, Tao D, Gao X, Liu W, Li J (2013) Transductive face sketch-photo synthesis. IEEE Trans Neural Netw Learn Syst 24(9):1364–1376
Yang J, Zhang D, Yang J-Y, Niu B (2007) Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics. IEEE Trans Pattern Anal Mach Intell 29(4):650–664
Loog M, Duin RPW (2004) Heteroscedastic extension of LDA: the Chernoff criterion. IEEE Trans Pattern Anal Mach Intell 26(6):732–739
Li X, Hao Z, Huang H (2010) An evolution algorithm with sorted race mechanism for global optimization. In: Proceedings of international conference on machine learning and computing (ICMLC 2010), vol 3, pp 1550–1555
Frank A, Asuncion A (2010) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Yang J, Frangi AF, Yang J-Y, Zhang D, Jin Z (2005) KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Trans Pattern Anal Mach Intell 27(2):230–244
Lu J, Plataniotis KN, Venetsanopoulos AN (2003) Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans Neural Netw 14(1):117–126
The ORL Face Database. AT&T (Olivetti) Research Laboratories, Cambridge, UK. http://www.uk.research.att.com/facedatabasetml
The Yale Face Database. http://vision.ucsd.edu/~leekc/ExtYaleDatabase/Yale%20Face%20Database.htm
Tsai JT, Liu TK, Chou JH (2004) Hybrid Taguchi-genetic algorithm for global numerical optimization. IEEE Trans Evol Comput 8(4):365–377
Leung YW, Wang YP (2001) An orthogonal genetic algorithm with quantization for global numerical optimization. IEEE Trans Evol Comput 5(1):41–53
Jenatton R, Obozinski G, Bach F (2010) Structured sparse principal component analysis. In: Proceedings of the international conference on artificial intelligence and statistics, pp 1–8
Zhong F, Zhang J, Li D (2014) Discriminant locality preserving projections based on L1-norm maximization. IEEE Trans Neural Netw Learn Syst 25(11):2065–2074
Appendices
Appendix 1
Here we demonstrate that the objective function \(f({\mathbf{v}}(t)),\) defined by Eq. (23), is a nondecreasing function of t under Algorithm 1.
From the definition of the similarity matrix S [Eq. (9) or Eq. (10)] and the definition of the matrix W [Eq. (13)], it is easy to verify that each entry of the total similarity matrix \({\mathbf{Q}}\) [defined by Eq. (21)] is nonnegative. Thus, Eq. (23) can be rewritten as
We first rewrite the denominator of Eq. (33) as
where
On the other hand, with Eq. (24), the numerator of Eq. (33) can be rewritten as
where
By substituting Eqs. (34) and (37) into Eq. (33), we get
Since the derivative of \(f({\mathbf{v}}(t))\) with respect to t is difficult to calculate directly, a surrogate function is introduced as follows
We can calculate the derivative of \(F({\varvec{\upxi}})\) with respect to \({\varvec{\upxi}}\) as follows
By substituting \({\varvec{\upxi}} = {\mathbf{v}}(t)\) back into Eq. (41), we get
By substituting Eqs. (34), (37) and (38) into Eq. (42), we get
By comparing Eq. (43) with Eq. (27), it is easy to find that \({\mathbf{e}}(t)\) is proportional to \({\mathbf{d}}({\mathbf{v}}(t)),\) which means \({\mathbf{e}}(t)\) is exactly the local ascending direction of \(F({\varvec{\upxi}})\) at point \({\mathbf{v}}(t).\) Based on this conclusion, if we update \({\mathbf{v}}(t)\) via Eq. (26) and substitute \({\mathbf{v}}(t) = {\varvec{\upxi}}\) back into Eq. (40), we have \(F({\mathbf{v}}(t + 1)) \ge F({\mathbf{v}}(t)),\) i.e.,
Since the right side of Eq. (44) is exactly \(f({\mathbf{v}}(t))\) [see Eq. (39)], to complete the justification of \(f({\mathbf{v}}(t + 1)) \ge f({\mathbf{v}}(t)),\) we need to demonstrate \(f({\mathbf{v}}(t + 1)) \ge F({\mathbf{v}}(t + 1)),\) i.e.,
To this end, we need the following lemma.
Lemma 1
For any vector \({\mathbf{y}} = (y_{1} , \ldots ,y_{N} )^{T} \in \Re^{N},\) the following equality holds [46]:
$$\sum\limits_{i = 1}^{N} {\left| {y_{i} } \right|} = \mathop {\min }\limits_{{\theta_{1} , \ldots ,\theta_{N} > 0}} \frac{1}{2}\sum\limits_{i = 1}^{N} {\left( {\frac{{y_{i}^{2} }}{{\theta_{i} }} + \theta_{i} } \right)} ,$$
and the minimum is uniquely reached at \(\theta_{i} = \left| {y_{i} } \right|\) for i = 1, …, N.
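The identity in Lemma 1 is the classical variational form of the absolute value and follows from the inequality of arithmetic and geometric means. A minimal numerical check in Python (illustrative only, not part of the original proof):

```python
import numpy as np

# Check Lemma 1 coordinate-wise: min over theta > 0 of (y^2/theta + theta)/2
# equals |y|, attained at theta = |y| (a direct AM-GM consequence).
y = np.array([3.0, -1.5, 0.2])
theta = np.linspace(1e-4, 10.0, 200000)
for yi in y:
    surrogate = 0.5 * (yi**2 / theta + theta)
    i = surrogate.argmin()
    print(f"|y_i|={abs(yi):.4f}  min={surrogate[i]:.4f}  argmin theta={theta[i]:.4f}")
```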
With Lemma 1, we get
For the denominator of the right side of Eq. (45), we have
Combining Eqs. (47) and (48), we reach
Moreover, by the definition of \(p_{k} (t)\) [see Eq. (24)], we know that \(p_{k} (t)\) is the polarity of \({\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})\); thus, \(p_{k} (t + 1){\mathbf{v}}^{T} (t + 1)({\varvec{\upmu}}^{k} - {\varvec{\upmu}}) \ge 0\) and \(|p_{k} (t)| = 1\) hold for any k and t. For the numerator of the right side of Eq. (45), we have
Combining Eqs. (49) and (50), we reach
Since
By combining Eqs. (44), (51) and (52), we reach the conclusion \(f({\mathbf{v}}(t + 1)) \ge f({\mathbf{v}}(t)),\) which means that the objective function \(f({\mathbf{v}}(t)),\) defined by Eq. (23), is a nondecreasing function of t under Algorithm 1.
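Although Eqs. (23)–(27) are not reproduced above, the structure of the argument can be illustrated with a simplified objective \(f({\mathbf{v}}) = \sum\nolimits_{k} {|{\mathbf{v}}^{T} {\varvec{\updelta}}_{k} |},\) for which a polarity-based update reduces to the sign-flipping fixed-point iteration of PCA-L1 [22]. The Python sketch below (function and variable names are illustrative, not the paper's Algorithm 1) exhibits the nondecreasing behavior:

```python
import numpy as np

def l1_fixed_point(deltas, n_iter=100, seed=0):
    """Sign-flipping fixed-point iteration in the style of PCA-L1 [22] for the
    simplified objective f(v) = sum_k |v^T delta_k| over unit vectors v.
    Each step sets v(t+1) proportional to sum_k p_k(t) * delta_k, where
    p_k(t) = sign(v(t)^T delta_k) is the polarity, so f never decreases."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(deltas.shape[1])
    v /= np.linalg.norm(v)
    history = []
    for _ in range(n_iter):
        history.append(np.abs(deltas @ v).sum())   # f(v(t))
        p = np.sign(deltas @ v)
        p[p == 0] = 1.0                            # tie-breaking, as in PCA-L1
        d = deltas.T @ p                           # ascent direction d(v(t))
        v_new = d / np.linalg.norm(d)
        if np.allclose(v_new, v):
            break                                  # fixed point reached
        v = v_new
    return v, history

deltas = np.random.default_rng(1).standard_normal((20, 5))
v, hist = l1_fixed_point(deltas)
print(all(b >= a - 1e-12 for a, b in zip(hist, hist[1:])))  # True: nondecreasing
```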
Appendix 2
We now demonstrate that the projection vectors \({\mathbf{v}}_{l} \, (l = 1,2, \ldots, m)\) learned via Algorithm 3 are orthonormal, as follows:
1. Because \({\mathbf{v}}_{l},\) learned via either Algorithm 1 or Algorithm 2, is a linear combination of the vectors \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l},\) it lies in the subspace spanned by \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l}.\)
2. By multiplying both sides of Eqs. (29) and (30) by \({\mathbf{v}}_{l - 1}^{T},\) we get
$$\begin{aligned} & {\mathbf{v}}_{l - 1}^{T} \, ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l} = {\mathbf{v}}_{l - 1}^{T} ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l - 1} - {\mathbf{v}}_{l - 1}^{T} {\mathbf{v}}_{l - 1} {\mathbf{v}}_{l - 1}^{T} ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l - 1} = 0 \\ & {\mathbf{v}}_{l - 1}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l} = {\mathbf{v}}_{l - 1}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l - 1} - {\mathbf{v}}_{l - 1}^{T} {\mathbf{v}}_{l - 1} {\mathbf{v}}_{l - 1}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l - 1} = 0 \\ \end{aligned}$$
3. By step 2, \({\mathbf{v}}_{l - 1}\) is orthogonal to \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l};\) combined with step 1, this shows that \({\mathbf{v}}_{l - 1}\) is also orthogonal to \({\mathbf{v}}_{l}.\)
4. By recursively applying steps 2 and 3 for each \({\mathbf{v}}_{l} \, (l = m,m - 1, \ldots, 2),\) we get \({\mathbf{v}}_{l}^{T} {\mathbf{v}}_{k} = 0\) for \(l,k = 1,2, \ldots, m\) and \(l \ne k.\)
5. By Algorithms 1 and 2, each output \({\mathbf{v}}_{l} \, (l = 1,2, \ldots, m)\) is a unit vector because of the normalization operation inside these algorithms.
Summarizing the above five points, we have \({\mathbf{v}}_{l}^{T} {\mathbf{v}}_{k} = \delta_{lk}\) for \(l,k = 1,2, \ldots, m,\) i.e., the learned projection vectors are orthonormal.
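A minimal Python sketch of this deflation argument follows; the randomly generated "learned" directions stand in for the outputs of Algorithms 1 and 2, and all names are illustrative:

```python
import numpy as np

def deflate(rows, v):
    """Eq. (29)/(30)-style deflation: replace each row u by u - (v^T u) v,
    removing its component along the unit vector v."""
    return rows - np.outer(rows @ v, v)

rng = np.random.default_rng(0)
diffs = rng.standard_normal((30, 8))  # stand-ins for (mu^k - mu) and (xbar_i - xbar_j)
V = []
for _ in range(4):
    # Stand-in for Algorithm 1/2: any unit vector in the span of the
    # (deflated) difference vectors, per point 1 of the argument above.
    v = diffs.T @ rng.standard_normal(diffs.shape[0])
    v /= np.linalg.norm(v)
    V.append(v)
    diffs = deflate(diffs, v)  # points 2-3: later vectors are orthogonal to this v
V = np.array(V)
print(np.round(V @ V.T, 10))   # identity matrix: the v_l are orthonormal
```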
Appendix 3
Here we prove that Eq. (32) is in fact an extension of the generic LDA algorithm in which the within-class scatter matrix is modified to incorporate the influence of local similarity.
From Eq. (6), the numerator of the right side of Eq. (32) can be rewritten as
By Eq. (16), we know that
On the other hand, following some simple algebraic steps, we see that
where \({\mathbf{D}} = {\text{diag}}(D_{11} , \ldots ,D_{nn} ),\) \(D_{ii} = \sum\nolimits_{j = 1}^{n} {S_{ij} } \, (i = 1, \ldots ,n),\) and \({\mathbf{S}}_{{\rm LW}} = {\mathbf{X}}({\mathbf{D}} - {\mathbf{S}}){\mathbf{X}}^{T}\) can be called the local within-class scatter matrix, to avoid confusion with the scatter matrices \({\mathbf{S}}_{B}\) and \({\mathbf{S}}_{W}\) of LDA.
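In code, \({\mathbf{S}}_{{\rm LW}}\) is simply a graph-Laplacian quadratic form. A minimal Python sketch follows; the heat-kernel similarity used here is illustrative only, since the paper's S [Eq. (9) or (10)] also encodes class labels and neighborhood relations:

```python
import numpy as np

def local_within_scatter(X, S):
    """S_LW = X (D - S) X^T with X of shape d x n (columns are samples),
    S an n x n nonnegative similarity matrix, and D = diag(row sums of S);
    D - S is the graph Laplacian of S."""
    D = np.diag(S.sum(axis=1))
    return X @ (D - S) @ X.T

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 40))                       # d = 5 features, n = 40 samples
dist2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
S = np.exp(-dist2 / dist2.mean())                      # illustrative heat-kernel S
S_LW = local_within_scatter(X, S)
print(S_LW.shape, np.allclose(S_LW, S_LW.T))           # (5, 5) True
```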
By substituting Eqs. (53), (54) and (55) into Eq. (32), we get
where \(\varepsilon \in (0,1)\) is a tuning parameter that controls the trade-off between global geometrical information and local geometrical information. If we define a total within-class scatter matrix \({\mathbf{S}}_{{\rm TW}}\) as:
Equation (56) can be further simplified as a generic LDA formulation:
This completes the proof.
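As a concrete illustration of Eq. (57), the problem reduces to a standard generalized symmetric eigenproblem. The Python sketch below assumes the convex combination \({\mathbf{S}}_{{\rm TW}} = (1 - \varepsilon ){\mathbf{S}}_{W} + \varepsilon {\mathbf{S}}_{{\rm LW}},\) which matches the stated trade-off role of \(\varepsilon\) but is only one plausible instance of the definition in Eq. (56):

```python
import numpy as np
from scipy.linalg import eigh

def lda_with_local_scatter(S_B, S_W, S_LW, eps=0.5, m=2, ridge=1e-8):
    """Solve the generic LDA form of Eq. (57), max_v (v^T S_B v)/(v^T S_TW v),
    as the generalized eigenproblem S_B v = lambda * S_TW v. The blend below
    is an assumed instance of Eq. (56); a small ridge keeps S_TW positive
    definite so that eigh accepts it."""
    S_TW = (1 - eps) * S_W + eps * S_LW
    S_TW = S_TW + ridge * np.eye(S_TW.shape[0])
    w, V = eigh(S_B, S_TW)            # eigenvalues in ascending order
    return V[:, ::-1][:, :m]          # top-m discriminant directions

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); S_B = A @ A.T
B = rng.standard_normal((5, 5)); S_W = B @ B.T
C = rng.standard_normal((5, 5)); S_LW = C @ C.T
print(lda_with_local_scatter(S_B, S_W, S_LW).shape)    # (5, 2)
```

The ridge term mirrors the singularity issue of the within-class scatter matrix discussed in the main text: adding the Laplacian-based \({\mathbf{S}}_{{\rm LW}}\) already tends to regularize \({\mathbf{S}}_{{\rm TW}}\), and the small diagonal offset guards the remaining degenerate cases.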