Abstract
Generic L2-norm-based linear discriminant analysis (LDA) is sensitive to outliers and captures only the global structure of the sample points. In this paper, a new LDA-based feature extraction algorithm is proposed that integrates both global and local structure information in a unified L1-norm optimization framework. Unlike generic L2-norm-based LDA, the proposed algorithm explicitly incorporates the local structure of the sample points and is robust to outliers. It also overcomes the singularity problem of the within-class scatter matrix. Experiments on several popular datasets demonstrate the effectiveness of the proposed algorithm.
References
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
Martinez AM, Kak AC (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2):228–233
Ji S-W, Ye J-P (2008) Generalized linear discriminant analysis: a unified framework and efficient model selection. IEEE Trans Neural Netw 19(10):1768–1782
Lu J-W, Plataniotis K, Venetsanopoulos A (2005) Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn Lett 26:181–191
Ye J, Li Q (2005) A two-stage linear discriminant analysis via QR decomposition. IEEE Trans Pattern Anal Mach Intell 27(6):929–941
Howland P, Park H (2004) Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Trans Pattern Anal Mach Intell 26(8):995–1006
Ye J-P (2006) Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. J Mach Learn Res 7:1183–1204
Chen L-F, Mark Liao HY, Ko MT, Lin J-C, Yu G-J (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recogn 33:1713–1726
Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data with application to face recognition. Pattern Recogn 34:2067–2070
Cai D, He X, Han J (2008) SRDA: an efficient algorithm for large-scale discriminant analysis. IEEE Trans Knowl Data Eng 20(4):1–12
Yang J, Yang J-Y (2003) Why can LDA be performed in PCA transformed space? Pattern Recogn 36:563–566
Hou C, Zhang C, Wu Y, Jiao Y (2009) Stable local dimensionality reduction approaches. Pattern Recogn 42(9):2054–2066
Gao Q, Xu H, Li Y, Xie D (2010) Two-dimensional supervised local similarity and diversity projection. Pattern Recogn 43(10):3359–3363
Gao Q, Liu J, Zhang H, Hou J, Yang X (2012) Enhanced Fisher discriminant criterion for image recognition. Pattern Recogn 45(10):3717–3724
Zhang D, He J, Zhao Y, Luo Z, Minghui D (2014) Global plus local: a complete framework for feature extraction and recognition. Pattern Recogn 47(3):1433–1442
Black MJ, Rangarajan A (1996) On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. Int J Comput Vis 19(1):57–91
Pang Y, Li X, Yuan Y (2010) Robust tensor analysis with L1-norm. IEEE Trans Circuits Syst Video Technol 20(2):172–178
Wang H, Tang Q, Zheng W (2012) L1-norm-based common spatial patterns. IEEE Trans Biomed Eng 59(3):653–662
Ke Q, Kanade T (2003) Robust subspace computation using L1 norm. Carnegie Mellon University, Pittsburgh, Technical Report, CMU-CS-03-172
Ke Q, Kanade T (2005) Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 739–746
Ding C, Zhou D, He X, Zha H (2006) R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In: Proceedings of the international conference on machine learning, pp 281–288
Kwak N (2008) Principal component analysis based on L1-norm maximization. IEEE Trans Pattern Anal Mach Intell 30(9):1672–1680
Li XL, Pang YW, Yuan Y (2009) L1-norm-based 2DPCA. IEEE Trans Syst Man Cybern B Cybern 40(4):1170–1175
Li X, Hua W, Wang H, Zhang Z (2010) Linear discriminant analysis using rotational invariant L1 norm. Neurocomputing 73(13–15):2571–2579
Zhou F, Zhang J (2013) Linear discriminant analysis based on L1-norm maximization. IEEE Trans Image Process 22(8):3018–3027
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Belkin M, Niyogi P (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 1:585–592
Yan S, Xu D, Zhang B, Zhang H, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
Cai H-P, Mikolajczyk K, Matas J (2011) Learning linear discriminant projections for dimensionality reduction of image descriptors. IEEE Trans Pattern Anal Mach Intell 33(2):338–352
Cui Y, Fan L (2012) A novel supervised dimensionality reduction algorithm: graph-based Fisher analysis. Pattern Recogn 45(4):1471–1481
He X, Niyogi P (2004) Locality preserving projections. Adv Neural Inf Process Syst 16:153–160
Shu X, Gao Y, Lu H (2012) Efficient linear discriminant analysis with locality preserving for face recognition. Pattern Recogn 45(5):1892–1898
Sugiyama M (2007) Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. J Mach Learn Res 8:1027–1061
Wang N, Tao D, Gao X, Liu W, Li J (2013) Transductive face sketch-photo synthesis. IEEE Trans Neural Netw Learn Syst 24(9):1364–1376
Yang J, Zhang D, Yang J-Y, Niu B (2007) Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics. IEEE Trans Pattern Anal Mach Intell 29(4):650–664
Loog M, Duin RPW (2004) Heteroscedastic extension of LDA: the Chernoff criterion. IEEE Trans Pattern Anal Mach Intell 26(6):732–739
Li X, Hao Z, Huang H (2010) An evolution algorithm with sorted race mechanism for global optimization. In: Proceedings of international conference on machine learning and computing (ICMLC 2010), vol 3, pp 1550–1555
Frank A, Asuncion A (2010) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Yang J, Frangi AF, Yang J-Y, Zhang D, Jin Z (2005) KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Trans Pattern Anal Mach Intell 27(2):230–244
Lu J, Plataniotis KN, Venetsanopoulos AN (2003) Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans Neural Netw 14(1):117–126
The ORL Face Database. AT&T (Olivetti) Research Laboratories, Cambridge, UK. http://www.uk.research.att.com/facedatabasetml
The Yale Face Database. http://vision.ucsd.edu/~leekc/ExtYaleDatabase/Yale%20Face%20Database.htm
Tsai JT, Liu TK, Chou JH (2004) Hybrid Taguchi-genetic algorithm for global numerical optimization. IEEE Trans Evol Comput 8(4):365–377
Leung YW, Wang YP (2001) An orthogonal genetic algorithm with quantization for global numerical optimization. IEEE Trans Evol Comput 5(1):41–53
Jenatton R, Obozinski G, Bach F (2010) Structured sparse principal component analysis. In: Proceedings of the international conference on artificial intelligence and statistics, pp 1–8
Zhong F, Zhang J, Li D (2014) Discriminant locality preserving projections based on L1-norm maximization. IEEE Trans Neural Netw Learn Syst 25(11):2065–2074
Appendices
Appendix 1
Here we demonstrate that the objective function \(f({\mathbf{v}}(t)),\) defined by Eq. (23), is a nondecreasing function of t under Algorithm 1.
From the definition of the similarity matrix S [Eq. (9) or Eq. (10)] and the definition of the matrix W [Eq. (13)], it is easy to verify that each entry of the total similarity matrix \({\mathbf{Q}}\) [defined by Eq. (21)] is nonnegative. Thus, Eq. (23) can be rewritten as
We first rewrite the denominator of Eq. (33) as
where
On the other hand, with Eq. (24), the numerator of Eq. (33) can be rewritten as
where
By substituting Eqs. (34) and (37) into Eq. (33), we get
Since the derivative of \(f({\mathbf{v}}(t))\) with respect to t is difficult to calculate directly, a surrogate function is introduced as follows
We can calculate the derivative of \(F({\varvec{\upxi}})\) with respect to \({\varvec{\upxi}}\) as follows
By substituting \({\varvec{\upxi}} = {\mathbf{v}}(t)\) back into Eq. (41), we get
By substituting Eqs. (34), (37) and (38) into Eq. (42), we get
By comparing Eq. (43) with Eq. (27), it is easy to find that \({\mathbf{e}}(t)\) is proportional to \({\mathbf{d}}({\mathbf{v}}(t)),\) which means \({\mathbf{e}}(t)\) is exactly the local ascending direction of \(F({\varvec{\upxi}})\) at point \({\mathbf{v}}(t).\) Based on this conclusion, if we update \({\mathbf{v}}(t)\) via Eq. (26) and substitute \({\mathbf{v}}(t) = {\varvec{\upxi}}\) back into Eq. (40), we have \(F({\mathbf{v}}(t + 1)) \ge F({\mathbf{v}}(t)),\) i.e.,
Since the right side of Eq. (44) is exactly \(f({\mathbf{v}}(t))\) [see Eq. (39)], to complete the justification of \(f({\mathbf{v}}(t + 1)) \ge f({\mathbf{v}}(t)),\) we need to demonstrate \(f({\mathbf{v}}(t + 1)) \ge F({\mathbf{v}}(t + 1)),\) i.e.,
To this end, we need the following lemma.
Lemma 1
For any vector \({\mathbf{y}} = (y_{1} , \ldots ,y_{N} )^{T} \in \Re^{N},\) the following equality holds [46]:
$$\sum\limits_{i = 1}^{N} {\left| {y_{i} } \right|} = \mathop {\min }\limits_{{\theta_{1} , \ldots ,\theta_{N} > 0}} \frac{1}{2}\sum\limits_{i = 1}^{N} {\left( {\frac{{y_{i}^{2} }}{{\theta_{i} }} + \theta_{i} } \right)} ,$$
and the minimum is uniquely reached at \(\theta_{i} = \left| {y_{i} } \right|\) for i = 1, …, N.
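The identity in Lemma 1 is the classical variational form of the absolute value and follows from the inequality of arithmetic and geometric means. A minimal numerical check in Python (illustrative only, not part of the original proof):

```python
import numpy as np

# Check Lemma 1 coordinate-wise: min over theta > 0 of (y^2/theta + theta)/2
# equals |y|, attained at theta = |y| (a direct AM-GM consequence).
y = np.array([3.0, -1.5, 0.2])
theta = np.linspace(1e-4, 10.0, 200000)
for yi in y:
    surrogate = 0.5 * (yi**2 / theta + theta)
    i = surrogate.argmin()
    print(f"|y_i|={abs(yi):.4f}  min={surrogate[i]:.4f}  argmin theta={theta[i]:.4f}")
```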
With Lemma 1, we get
For the denominator of the right side of Eq. (45), we have
Combining Eqs. (47) and (48), we reach
Moreover, by the definition of \(p_{k} (t)\) [see Eq. (24)], we know that \(p_{k} (t)\) is the polarity of \({\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})\); thus, \(p_{k} (t + 1){\mathbf{v}}^{T} (t + 1)({\varvec{\upmu}}^{k} - {\varvec{\upmu}}) \ge 0\) and \(|p_{k} (t)| = 1\) hold for any k and t. For the numerator of the right side of Eq. (45), we have
Combining Eqs. (49) and (50), we reach
Since
By combining Eqs. (44), (51) and (52), we reach the conclusion \(f({\mathbf{v}}(t + 1)) \ge f({\mathbf{v}}(t)),\) which means that the objective function \(f({\mathbf{v}}(t)),\) defined by Eq. (23), is a nondecreasing function of t under Algorithm 1.
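Although Eqs. (23)–(27) are not reproduced above, the structure of the argument can be illustrated with a simplified objective \(f({\mathbf{v}}) = \sum\nolimits_{k} {|{\mathbf{v}}^{T} {\varvec{\updelta}}_{k} |},\) for which a polarity-based update reduces to the sign-flipping fixed-point iteration of PCA-L1 [22]. The Python sketch below (function and variable names are illustrative, not the paper's Algorithm 1) exhibits the nondecreasing behavior:

```python
import numpy as np

def l1_fixed_point(deltas, n_iter=100, seed=0):
    """Sign-flipping fixed-point iteration in the style of PCA-L1 [22] for the
    simplified objective f(v) = sum_k |v^T delta_k| over unit vectors v.
    Each step sets v(t+1) proportional to sum_k p_k(t) * delta_k, where
    p_k(t) = sign(v(t)^T delta_k) is the polarity, so f never decreases."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(deltas.shape[1])
    v /= np.linalg.norm(v)
    history = []
    for _ in range(n_iter):
        history.append(np.abs(deltas @ v).sum())   # f(v(t))
        p = np.sign(deltas @ v)
        p[p == 0] = 1.0                            # tie-breaking, as in PCA-L1
        d = deltas.T @ p                           # ascent direction d(v(t))
        v_new = d / np.linalg.norm(d)
        if np.allclose(v_new, v):
            break                                  # fixed point reached
        v = v_new
    return v, history

deltas = np.random.default_rng(1).standard_normal((20, 5))
v, hist = l1_fixed_point(deltas)
print(all(b >= a - 1e-12 for a, b in zip(hist, hist[1:])))  # True: nondecreasing
```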
Appendix 2
We now demonstrate that the projection vectors \({\mathbf{v}}_{l} \, (l = 1,2, \ldots, m)\) learned via Algorithm 3 are orthonormal, as follows:
1. Because \({\mathbf{v}}_{l},\) learned via either Algorithm 1 or Algorithm 2, is a linear combination of the vectors \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l},\) it lies in the subspace spanned by \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l}.\)
2. By multiplying both sides of Eqs. (29) and (30) by \({\mathbf{v}}_{l - 1}^{T},\) we get
$$\begin{aligned} & {\mathbf{v}}_{l - 1}^{T} \, ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l} = {\mathbf{v}}_{l - 1}^{T} ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l - 1} - {\mathbf{v}}_{l - 1}^{T} {\mathbf{v}}_{l - 1} {\mathbf{v}}_{l - 1}^{T} ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l - 1} = 0 \\ & {\mathbf{v}}_{l - 1}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l} = {\mathbf{v}}_{l - 1}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l - 1} - {\mathbf{v}}_{l - 1}^{T} {\mathbf{v}}_{l - 1} {\mathbf{v}}_{l - 1}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l - 1} = 0 \\ \end{aligned}$$
3. By step 2, \({\mathbf{v}}_{l - 1}\) is orthogonal to \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l};\) combined with step 1, this shows that \({\mathbf{v}}_{l - 1}\) is also orthogonal to \({\mathbf{v}}_{l}.\)
4. By recursively applying steps 2 and 3 for each \({\mathbf{v}}_{l} \, (l = m,m - 1, \ldots, 2),\) we get \({\mathbf{v}}_{l}^{T} {\mathbf{v}}_{k} = 0\) for \(l,k = 1,2, \ldots, m\) and \(l \ne k.\)
5. By Algorithms 1 and 2, each output \({\mathbf{v}}_{l} \, (l = 1,2, \ldots, m)\) is a unit vector because of the normalization operation inside these algorithms.
Summarizing the above five points, we have \({\mathbf{v}}_{l}^{T} {\mathbf{v}}_{k} = \delta_{lk}\) for \(l,k = 1,2, \ldots, m,\) i.e., the learned projection vectors are orthonormal.
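A minimal Python sketch of this deflation argument follows; the randomly generated "learned" directions stand in for the outputs of Algorithms 1 and 2, and all names are illustrative:

```python
import numpy as np

def deflate(rows, v):
    """Eq. (29)/(30)-style deflation: replace each row u by u - (v^T u) v,
    removing its component along the unit vector v."""
    return rows - np.outer(rows @ v, v)

rng = np.random.default_rng(0)
diffs = rng.standard_normal((30, 8))  # stand-ins for (mu^k - mu) and (xbar_i - xbar_j)
V = []
for _ in range(4):
    # Stand-in for Algorithm 1/2: any unit vector in the span of the
    # (deflated) difference vectors, per point 1 of the argument above.
    v = diffs.T @ rng.standard_normal(diffs.shape[0])
    v /= np.linalg.norm(v)
    V.append(v)
    diffs = deflate(diffs, v)  # points 2-3: later vectors are orthogonal to this v
V = np.array(V)
print(np.round(V @ V.T, 10))   # identity matrix: the v_l are orthonormal
```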
Appendix 3
Here we prove that Eq. (32) is in fact an extension of the generic LDA algorithm in which the within-class scatter matrix is modified to incorporate the influence of local similarity.
From Eq. (6), the numerator of the right side of Eq. (32) can be rewritten as
By Eq. (16), we know that
On the other hand, following some simple algebraic steps, we see that
where \({\mathbf{D}} = {\text{diag}}(D_{11} , \ldots ,D_{nn} ),\) \(D_{ii} = \sum\nolimits_{j = 1}^{n} {S_{ij} } \, (i = 1, \ldots ,n),\) and \({\mathbf{S}}_{{\rm LW}} = {\mathbf{X}}({\mathbf{D}} - {\mathbf{S}}){\mathbf{X}}^{T}\) can be called the local within-class scatter matrix, to avoid confusion with the scatter matrices \({\mathbf{S}}_{B}\) and \({\mathbf{S}}_{W}\) of LDA.
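In code, \({\mathbf{S}}_{{\rm LW}}\) is simply a graph-Laplacian quadratic form. A minimal Python sketch follows; the heat-kernel similarity used here is illustrative only, since the paper's S [Eq. (9) or (10)] also encodes class labels and neighborhood relations:

```python
import numpy as np

def local_within_scatter(X, S):
    """S_LW = X (D - S) X^T with X of shape d x n (columns are samples),
    S an n x n nonnegative similarity matrix, and D = diag(row sums of S);
    D - S is the graph Laplacian of S."""
    D = np.diag(S.sum(axis=1))
    return X @ (D - S) @ X.T

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 40))                       # d = 5 features, n = 40 samples
dist2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
S = np.exp(-dist2 / dist2.mean())                      # illustrative heat-kernel S
S_LW = local_within_scatter(X, S)
print(S_LW.shape, np.allclose(S_LW, S_LW.T))           # (5, 5) True
```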
By substituting Eqs. (53), (54) and (55) into Eq. (32), we get
where \(\varepsilon \in (0,1)\) is a tuning parameter that controls the trade-off between global geometrical information and local geometrical information. If we define a total within-class scatter matrix \({\mathbf{S}}_{{\rm TW}}\) as:
Equation (56) can be further simplified as a generic LDA formulation:
This completes the proof.
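As a concrete illustration of Eq. (57), the problem reduces to a standard generalized symmetric eigenproblem. The Python sketch below assumes the convex combination \({\mathbf{S}}_{{\rm TW}} = (1 - \varepsilon ){\mathbf{S}}_{W} + \varepsilon {\mathbf{S}}_{{\rm LW}},\) which matches the stated trade-off role of \(\varepsilon\) but is only one plausible instance of the definition in Eq. (56):

```python
import numpy as np
from scipy.linalg import eigh

def lda_with_local_scatter(S_B, S_W, S_LW, eps=0.5, m=2, ridge=1e-8):
    """Solve the generic LDA form of Eq. (57), max_v (v^T S_B v)/(v^T S_TW v),
    as the generalized eigenproblem S_B v = lambda * S_TW v. The blend below
    is an assumed instance of Eq. (56); a small ridge keeps S_TW positive
    definite so that eigh accepts it."""
    S_TW = (1 - eps) * S_W + eps * S_LW
    S_TW = S_TW + ridge * np.eye(S_TW.shape[0])
    w, V = eigh(S_B, S_TW)            # eigenvalues in ascending order
    return V[:, ::-1][:, :m]          # top-m discriminant directions

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); S_B = A @ A.T
B = rng.standard_normal((5, 5)); S_W = B @ B.T
C = rng.standard_normal((5, 5)); S_LW = C @ C.T
print(lda_with_local_scatter(S_B, S_W, S_LW).shape)    # (5, 2)
```

The ridge term mirrors the singularity issue of the within-class scatter matrix discussed in the main text: adding the Laplacian-based \({\mathbf{S}}_{{\rm LW}}\) already tends to regularize \({\mathbf{S}}_{{\rm TW}}\), and the small diagonal offset guards the remaining degenerate cases.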