
A new linear discriminant analysis algorithm based on L1-norm maximization and locality preserving projection

Theoretical Advances | Pattern Analysis and Applications

Abstract

Generic L2-norm-based linear discriminant analysis (LDA) is sensitive to outliers and captures only the global structure of the sample points. In this paper, a new LDA-based feature extraction algorithm is proposed that integrates both global and local structure information in a unified L1-norm optimization framework. Unlike generic L2-norm-based LDA, the proposed algorithm explicitly incorporates the local structure of the sample points and is robust to outliers. It also overcomes the singularity problem of the within-class scatter matrix. Experiments on several popular datasets demonstrate the effectiveness of the proposed algorithm.


References

  1. Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720

  2. Martinez AM, Kak AC (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2):228–233

  3. Ji S-W, Ye J-P (2008) Generalized linear discriminant analysis: a unified framework and efficient model selection. IEEE Trans Neural Netw 19(10):1768–1782

  4. Lu J-W, Plataniotis K, Venetsanopoulos A (2005) Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recogn Lett 26:181–191

  5. Ye J, Li Q (2005) A two-stage linear discriminant analysis via QR decomposition. IEEE Trans Pattern Anal Mach Intell 27(6):929–941

  6. Howland P, Park H (2004) Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Trans Pattern Anal Mach Intell 26(8):995–1006

  7. Ye J-P (2006) Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. J Mach Learn Res 7:1183–1204

  8. Chen L-F, Mark Liao HY, Ko MT, Lin J-C, Yu G-J (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recogn 33:1713–1726

  9. Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data with application to face recognition. Pattern Recogn 34:2067–2070

  10. Cai D, He X, Han J (2008) SRDA: an efficient algorithm for large-scale discriminant analysis. IEEE Trans Knowl Data Eng 20(4):1–12

  11. Yang J, Yang J-Y (2003) Why can LDA be performed in PCA transformed space? Pattern Recogn 36:563–566

  12. Hou C, Zhang C, Wu Y, Jiao Y (2009) Stable local dimensionality reduction approaches. Pattern Recogn 42(9):2054–2066

  13. Gao Q, Xu H, Li Y, Xie D (2010) Two-dimensional supervised local similarity and diversity projection. Pattern Recogn 43(10):3359–3363

  14. Gao Q, Liu J, Zhang H, Hou J, Yang X (2012) Enhanced Fisher discriminant criterion for image recognition. Pattern Recogn 45(10):3717–3724

  15. Zhang D, He J, Zhao Y, Luo Z, Du M (2014) Global plus local: a complete framework for feature extraction and recognition. Pattern Recogn 47(3):1433–1442

  16. Black MJ, Rangarajan A (1996) On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. Int J Comput Vis 19(1):57–91

  17. Pang Y, Li X, Yuan Y (2010) Robust tensor analysis with L1-norm. IEEE Trans Circuits Syst Video Technol 20(2):172–178

  18. Wang H, Tang Q, Zheng W (2012) L1-norm-based common spatial patterns. IEEE Trans Biomed Eng 59(3):653–662

  19. Ke Q, Kanade T (2003) Robust subspace computation using L1 norm. Technical Report CMU-CS-03-172, Carnegie Mellon University, Pittsburgh

  20. Ke Q, Kanade T (2005) Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 739–746

  21. Ding C, Zhou D, He X, Zha H (2006) R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In: Proceedings of the international conference on machine learning, pp 281–288

  22. Kwak N (2008) Principal component analysis based on L1-norm maximization. IEEE Trans Pattern Anal Mach Intell 30(9):1672–1680

  23. Li XL, Pang YW, Yuan Y (2009) L1-norm-based 2DPCA. IEEE Trans Syst Man Cybern B Cybern 40(4):1170–1175

  24. Li X, Hua W, Wang H, Zhang Z (2010) Linear discriminant analysis using rotational invariant L1 norm. Neurocomputing 73(13–15):2571–2579

  25. Zhong F, Zhang J (2013) Linear discriminant analysis based on L1-norm maximization. IEEE Trans Image Process 22(8):3018–3027

  26. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

  27. Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323

  28. Belkin M, Niyogi P (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 1:585–592

  29. Yan S, Xu D, Zhang B, Zhang H, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51

  30. Cai H-P, Mikolajczyk K, Matas J (2011) Learning linear discriminant projections for dimensionality reduction of image descriptors. IEEE Trans Pattern Anal Mach Intell 33(2):338–352

  31. Cui Y, Fan L (2012) A novel supervised dimensionality reduction algorithm: graph-based Fisher analysis. Pattern Recogn 45(4):1471–1481

  32. He X, Niyogi P (2004) Locality preserving projections. Adv Neural Inf Process Syst 16:153–160

  33. Shu X, Gao Y, Lu H (2012) Efficient linear discriminant analysis with locality preserving for face recognition. Pattern Recogn 45(5):1892–1898

  34. Sugiyama M (2007) Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. J Mach Learn Res 8:1027–1061

  35. Wang N, Tao D, Gao X, Liu W, Li J (2013) Transductive face sketch-photo synthesis. IEEE Trans Neural Netw Learn Syst 24(9):1364–1376

  36. Yang J, Zhang D, Yang J-Y, Niu B (2007) Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics. IEEE Trans Pattern Anal Mach Intell 29(4):650–664

  37. Loog M, Duin RPW (2004) Heteroscedastic extension of LDA: the Chernoff criterion. IEEE Trans Pattern Anal Mach Intell 26(6):732–739

  38. Li X, Hao Z, Huang H (2010) An evolution algorithm with sorted race mechanism for global optimization. In: Proceedings of the international conference on machine learning and computing (ICMLC 2010), vol 3, pp 1550–1555

  39. Frank A, Asuncion A (2010) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml

  40. Yang J, Frangi AF, Yang J, Zhang D, Jin Z (2005) KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Trans Pattern Anal Mach Intell 27(2):230–244

  41. Lu J, Plataniotis KN, Venetsanopoulos AN (2003) Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans Neural Netw 14(1):117–126

  42. The ORL Face Database. AT&T (Olivetti) Research Laboratories, Cambridge, UK. http://www.uk.research.att.com/facedatabasetml

  43. The Yale Face Database. http://vision.ucsd.edu/~leekc/ExtYaleDatabase/Yale%20Face%20Database.htm

  44. Tsai JT, Liu TK, Chou JH (2004) Hybrid Taguchi-genetic algorithm for global numerical optimization. IEEE Trans Evol Comput 8(4):365–377

  45. Leung YW, Wang YP (2001) An orthogonal genetic algorithm with quantization for global numerical optimization. IEEE Trans Evol Comput 5(1):41–53

  46. Jenatton R, Obozinski G, Bach F (2010) Structured sparse principal component analysis. In: Proceedings of the international conference on artificial intelligence and statistics, pp 1–8

  47. Zhong F, Zhang J, Li D (2014) Discriminant locality preserving projections based on L1-norm maximization. IEEE Trans Neural Netw Learn Syst 25(11):2065–2074


Author information

Correspondence to Di Zhang.

Appendices

Appendix 1

Here we demonstrate that the objective function \(f({\mathbf{v}}(t)),\) defined by Eq. (23), is a nondecreasing function of the iteration index t under Algorithm 1.

From the definition of the similarity matrix S [Eq. (9) or Eq. (10)] and the definition of the matrix W [Eq. (13)], it is easy to verify that each entry of the total similarity matrix \({\mathbf{Q}}\) [defined by Eq. (21)] is nonnegative. Thus, Eq. (23) can be rewritten as

$$f({\mathbf{v}}(t)) = \frac{{\sum\nolimits_{k = 1}^{C} {m_{k} } \left| {{\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \right|}}{{\frac{1}{2}\sum\nolimits_{i,j} {\left| {{\mathbf{v}}^{T} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right|} }}$$
(33)

We first rewrite the denominator of Eq. (33) as

$$\begin{aligned} & \frac{1}{2}\sum\limits_{i,j} {\left| {{\mathbf{v}}^{T} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right|} \\ & \quad = \frac{1}{4}{\mathbf{v}}^{T} (t)\left\{ {\sum\limits_{i,j} {\frac{{(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{T} Q_{ij}^{2} }}{{\left| {{\mathbf{v}}^{T} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right|}}} } \right\}{\mathbf{v}}(t) \, + \, \frac{1}{4}\sum\limits_{i,j} {\left| {{\mathbf{v}}^{T} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right|} \\ & \quad = \, \frac{1}{4}{\mathbf{v}}^{T} (t){\mathbf{W}}(t){\mathbf{v}}(t) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} \\ \end{aligned}$$
(34)

where

$$A_{ij} (t) \, = \, {\mathbf{v}}^{T} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij}$$
(35)
$${\mathbf{W}}(t) \, = \, \sum\limits_{i,j} {\frac{{(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{T} Q_{ij}^{2} }}{{\left| {A_{ij} (t)} \right|}}}$$
(36)

On the other hand, with Eq. (24), the numerator of Eq. (33) can be rewritten as

$$\begin{aligned} \sum\limits_{k = 1}^{C} {m_{k} } \left| {{\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \right| & = \sum\limits_{k = 1}^{C} {m_{k} p_{k} (t){\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \, \\ & = \, {\mathbf{v}}^{T} (t)\sum\limits_{k = 1}^{C} {m_{k} p_{k} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \\ & = {\mathbf{v}}^{T} (t){\mathbf{B}}(t) \\ \end{aligned}$$
(37)

where

$${\mathbf{B}}(t) = \sum\limits_{k = 1}^{C} {m_{k} p_{k} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})}$$
(38)

By substituting Eqs. (34) and (37) into Eq. (33), we get

$$f({\mathbf{v}}(t)) = \frac{{{\mathbf{v}}^{T} (t){\mathbf{B}}(t)}}{{\frac{1}{4}{\mathbf{v}}^{T} (t){\mathbf{W}}(t){\mathbf{v}}(t) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} }}$$
(39)
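
As a quick numerical check of this rewriting, the sketch below (a minimal Python illustration, not the authors' code) draws a small synthetic data set, uses a random nonnegative symmetric matrix as a stand-in for the total similarity matrix \({\mathbf{Q}}\) of Eq. (21), and verifies that the ratio of Eq. (33) coincides with the right side of Eq. (39) once \({\mathbf{A}}(t),\) \({\mathbf{W}}(t)\) and \({\mathbf{B}}(t)\) are formed as in Eqs. (35), (36) and (38); the polarity \(p_{k} (t)\) of Eq. (24) is taken to be the sign of \({\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}}),\) as stated later in this appendix. All variable names are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, C = 12, 5, 3                              # samples, dimension, classes (toy sizes)
    labels = rng.integers(0, C, size=n)
    xbar = rng.standard_normal((n, d))              # stand-ins for the vectors xbar_i of Eq. (33)
    Q = np.abs(rng.standard_normal((n, n)))
    Q = (Q + Q.T) / 2                               # nonnegative symmetric stand-in for Q of Eq. (21)

    mu = xbar.mean(axis=0)
    mu_k = np.array([xbar[labels == k].mean(axis=0) for k in range(C)])
    mk = np.array([(labels == k).sum() for k in range(C)])
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)

    # Left side: the ratio of Eq. (33).
    num = sum(mk[k] * abs(v @ (mu_k[k] - mu)) for k in range(C))
    den = 0.5 * sum(abs(v @ (xbar[i] - xbar[j]) * Q[i, j]) for i in range(n) for j in range(n))
    f_33 = num / den

    # Right side: Eq. (39), with A, W, B built as in Eqs. (35), (36), (38).
    A = np.array([[v @ (xbar[i] - xbar[j]) * Q[i, j] for j in range(n)] for i in range(n)])
    guard = 1e-12                                   # avoids 0/0 for identical pairs (i = j)
    W = sum(np.outer(xbar[i] - xbar[j], xbar[i] - xbar[j]) * Q[i, j] ** 2 / (abs(A[i, j]) + guard)
            for i in range(n) for j in range(n))
    p = np.sign(mu_k @ v - mu @ v)                  # polarities p_k(t), cf. Eq. (24)
    B = (mk * p) @ (mu_k - mu)
    f_39 = (v @ B) / (0.25 * v @ W @ v + 0.25 * np.abs(A).sum())

    print(f_33, f_39)                               # the two values agree up to round-off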

Since directly differentiating \(f({\mathbf{v}}(t))\) with respect to t is difficult, we introduce the following surrogate function

$$F({\varvec{\upxi}}) = \frac{{{\varvec{\upxi}}^{T} {\mathbf{B}}(t)}}{{\frac{1}{4}{\varvec{\upxi}}^{T} {\mathbf{W}}(t){\varvec{\upxi}} \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} }}$$
(40)

We can calculate the derivative of \(F({\varvec{\upxi}})\) with respect to \({\varvec{\upxi}}\) as follows

$${\mathbf{d}}({\varvec{\upxi}}) = \, \frac{{\partial F({\varvec{\upxi}})}}{{\partial {\varvec{\upxi}}}} \, = \, \frac{{\left( {\frac{1}{4}{\varvec{\upxi}}^{T} {\mathbf{W}}(t){\varvec{\upxi}} \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} } \right){\mathbf{B}}(t) - \, \frac{1}{2}{\varvec{\upxi}}^{T} {\mathbf{B}}(t){\mathbf{W}}(t){\varvec{\upxi}} \, }}{{\left( {\frac{1}{4}{\varvec{\upxi}}^{T} {\mathbf{W}}(t){\varvec{\upxi}} \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} } \right)^{2} }}$$
(41)

By substituting \({\varvec{\upxi}} = {\mathbf{v}}(t)\) back into Eq. (41), we get

$${\mathbf{d}}({\mathbf{v}}(t)) = \, \frac{{\left( {\frac{1}{4}{\mathbf{v}}^{T} (t){\mathbf{W}}(t){\mathbf{v}}(t) + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} } \right){\mathbf{B}}(t) }}{{\left( {\frac{1}{4}{\mathbf{v}}^{T} (t){\mathbf{W}}(t){\mathbf{v}}(t) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} } \right)^{2} }} - \frac{{ \, \frac{1}{2}{\mathbf{v}}^{T} (t){\mathbf{B}}(t){\mathbf{W}}(t){\mathbf{v}}(t) \, }}{{\left( {\frac{1}{4}{\mathbf{v}}^{T} (t){\mathbf{W}}(t){\mathbf{v}}(t) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} } \right)^{2} }} \,$$
(42)

By substituting Eqs. (34), (37) and (38) into Eq. (42), we get

$$\begin{aligned} {\mathbf{d}}({\mathbf{v}}(t)) & = \, \frac{{\sum\nolimits_{k = 1}^{C} {m_{k} p_{k} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} }}{{\frac{1}{2}\sum\nolimits_{i,j} {\left| {{\mathbf{v}}^{T} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right|} }} - \frac{{ \, \frac{1}{2}\left( {\sum\nolimits_{k = 1}^{C} {m_{k} } \left| {{\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \right|} \right) \, \left( {\sum\nolimits_{i,j} {q_{ij} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } } \right) \, }}{{\left( {\frac{1}{2}\sum\nolimits_{i,j} {\left| {{\mathbf{v}}^{T} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right|} } \right)^{2} }} \\ & \quad \propto \, \frac{{\sum\nolimits_{k = 1}^{C} {m_{k} } p_{k} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})}}{{\sum\nolimits_{k = 1}^{C} {m_{k} } \left| {{\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \right|}} - \frac{{\sum\nolimits_{i,j} {q_{ij} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } }}{{\sum\nolimits_{i,j} {\left| {{\mathbf{v}}^{T} (t)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )} \right|Q_{ij} } }} \, \\ \end{aligned}$$
(43)

By comparing Eq. (43) with Eq. (27), it is easy to see that \({\mathbf{e}}(t)\) is proportional to \({\mathbf{d}}({\mathbf{v}}(t)),\) which means that \({\mathbf{e}}(t)\) is exactly the local ascent direction of \(F({\varvec{\upxi}})\) at the point \({\mathbf{v}}(t).\) Based on this conclusion, if we update \({\mathbf{v}}(t)\) via Eq. (26) and substitute \({\varvec{\upxi}} = {\mathbf{v}}(t + 1)\) back into Eq. (40), we have \(F({\mathbf{v}}(t + 1)) \ge F({\mathbf{v}}(t)),\) i.e.,

$$\frac{{{\mathbf{v}}^{T} (t + 1){\mathbf{B}}(t)}}{{\frac{1}{4}{\mathbf{v}}^{T} (t + 1){\mathbf{W}}(t){\mathbf{v}}(t + 1) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} }} \ge \frac{{{\mathbf{v}}^{T} (t){\mathbf{B}}(t)}}{{\frac{1}{4}{\mathbf{v}}^{T} (t){\mathbf{W}}(t){\mathbf{v}}(t) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} }}$$
(44)

Since the right side of Eq. (44) is exactly \(f({\mathbf{v}}(t))\) [see Eq. (39)], to complete the justification of \(f({\mathbf{v}}(t + 1)) \ge f({\mathbf{v}}(t)),\) we need to demonstrate \(f({\mathbf{v}}(t + 1)) \ge F({\mathbf{v}}(t + 1)),\) i.e.,

$$f({\mathbf{v}}(t + 1)) \ge \frac{{{\mathbf{v}}^{T} (t + 1){\mathbf{B}}(t)}}{{\frac{1}{4}{\mathbf{v}}^{T} (t + 1){\mathbf{W}}(t){\mathbf{v}}(t + 1) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} }}$$
(45)

To this end, we need the following lemma.

Lemma 1

For any vector \({\mathbf{y}} = (y_{1} , \ldots ,y_{N} )^{T} \in \Re^{N} ,\) the following equality holds [46]

$$\left\| {\mathbf{y}} \right\|_{1} = \sum\limits_{i = 1}^{N} {\left| {y_{i} } \right|} = \mathop {\hbox{min} }\limits_{{{\varvec{\uptheta}} \in \Re_{ + }^{N} }} \frac{1}{2}\sum\limits_{i = 1}^{N} {\frac{{y_{i}^{2} }}{{\theta_{i} }}} + \frac{1}{2}\left\| {\varvec{\uptheta}} \right\|_{1}$$
(46)

and the minimum is uniquely reached at \(\theta_{i} = \left| {y_{i} } \right|\) for i = 1, …, N.
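
Lemma 1 is easy to check numerically. The short Python sketch below (illustrative only) evaluates both sides of Eq. (46) for a random vector at the minimizer \(\theta_{i} = \left| {y_{i} } \right|\) and shows that any other positive choice of \({\varvec{\uptheta}}\) yields a larger value.

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.standard_normal(8)

    lhs = np.abs(y).sum()                           # ||y||_1

    theta = np.abs(y)                               # the minimizer theta_i = |y_i|
    rhs = 0.5 * np.sum(y ** 2 / theta) + 0.5 * theta.sum()
    print(lhs, rhs)                                 # identical

    theta2 = np.abs(y) + 0.3                        # any other positive theta ...
    print(0.5 * np.sum(y ** 2 / theta2) + 0.5 * theta2.sum())   # ... gives a larger value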

With Lemma 1, we get

$$\frac{1}{2}\sum\limits_{i,j} {\left| {{\mathbf{v}}^{T} (t + 1)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right|} \, = \, \mathop {\hbox{min} }\limits_{{\theta_{ij} \in \Re_{ + } }} \frac{1}{4}\sum\limits_{i,j} {\frac{{({\mathbf{v}}^{T} (t + 1)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} )^{2} }}{{\theta_{ij} }}} + \frac{1}{4}\left\| {\varvec{\uptheta}} \right\|_{1} \,$$
(47)

For the denominator of the right side of Eq. (45), we have

$$\begin{aligned} \frac{1}{4}{\mathbf{v}}^{T} (t + 1){\mathbf{W}}(t){\mathbf{v}}(t + 1) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} & = \, \frac{1}{4}\sum\limits_{i,j} {\frac{{\left( {{\mathbf{v}}^{T} (t + 1)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right)^{2} }}{{\left| {A_{ij} (t)} \right|}}} + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} \\ & \quad \ge \, \mathop {\hbox{min} }\limits_{{\theta_{ij} \in \Re_{ + } }} \frac{1}{4}\sum\limits_{i,j} {\frac{{({\mathbf{v}}^{T} (t + 1)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} )^{2} }}{{\theta_{ij} }}} + \frac{1}{4}\left\| {\varvec{\uptheta}} \right\|_{1} \, \\ \end{aligned}$$
(48)

Combining Eqs. (47) and (48), we reach

$$\frac{1}{2}\sum\limits_{i,j} {\left| {{\mathbf{v}}^{T} (t + 1)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )Q_{ij} } \right|} \, \le \, \frac{1}{4}{\mathbf{v}}^{T} (t + 1){\mathbf{W}}(t){\mathbf{v}}(t + 1) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1}$$
(49)

Moreover, by the definition of \(p_{k} (t)\) [see Eq. (24)], we know that \(p_{k} (t)\) is the polarity of \({\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})\); thus, \(p_{k} (t + 1){\mathbf{v}}^{T} (t + 1)({\varvec{\upmu}}^{k} - {\varvec{\upmu}}) \ge 0\) and \(|p_{k} (t)| = 1\) hold for any k and t. For the numerator of the right side of Eq. (45), we have

$$\begin{aligned} {\mathbf{v}}^{T} (t + 1){\mathbf{B}}(t) \, & = {\mathbf{v}}^{T} (t + 1)\sum\limits_{k = 1}^{C} {m_{k} p_{k} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \, \\ & \le \, {\mathbf{v}}^{T} (t + 1)\sum\limits_{k = 1}^{C} {m_{k} p_{k} (t + 1)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \\ & = \, \sum\limits_{k = 1}^{C} {m_{k} } \left| {{\mathbf{v}}^{T} (t + 1)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \right| \\ \end{aligned}$$
(50)

Combining Eqs. (49) and (50), we reach

$$\frac{{\sum\nolimits_{k = 1}^{C} {m_{k} } \left| {{\mathbf{v}}^{T} (t + 1)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \right|}}{{\frac{1}{2}\sum\nolimits_{i,j} {\left| {{\mathbf{v}}^{T} (t + 1)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )} \right|Q_{ij} } }} \ge \frac{{{\mathbf{v}}^{T} (t + 1){\mathbf{B}}(t)}}{{\frac{1}{4}{\mathbf{v}}^{T} (t + 1){\mathbf{W}}(t){\mathbf{v}}(t + 1) \, + \, \frac{1}{4}\left\| {{\mathbf{A}}(t)} \right\|_{1} }}$$
(51)

Finally, by definition,

$$f({\mathbf{v}}(t + 1)) = \frac{{\sum\nolimits_{k = 1}^{C} {m_{k} } \left| {{\mathbf{v}}^{T} (t + 1)({\varvec{\upmu}}^{k} - {\varvec{\upmu}})} \right|}}{{\frac{1}{2}\sum\nolimits_{i,j} {\left| {{\mathbf{v}}^{T} (t + 1)(\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )} \right|Q_{ij} } }}$$
(52)

By combining Eqs. (44), (51) and (52), we reach the conclusion \(f({\mathbf{v}}(t + 1)) \ge f({\mathbf{v}}(t)),\) which shows that the objective function \(f({\mathbf{v}}(t)),\) defined by Eq. (23), is a nondecreasing function of t under Algorithm 1.
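
The central step of the argument, namely that the direction in the last line of Eq. (43) is, up to a positive factor, the gradient of the surrogate \(F({\varvec{\upxi}})\) at \({\varvec{\upxi}} = {\mathbf{v}}(t),\) can be verified numerically. The Python sketch below (an illustration only, not the authors' implementation) builds that direction on synthetic data, taking \(q_{ij} (t)\) as the sign of \(A_{ij} (t)\) and \(p_{k} (t)\) as the sign of \({\mathbf{v}}^{T} (t)({\varvec{\upmu}}^{k} - {\varvec{\upmu}}),\) and compares it with a finite-difference gradient of Eq. (40).

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, C = 12, 5, 3
    labels = rng.integers(0, C, size=n)
    xbar = rng.standard_normal((n, d))
    Q = np.abs(rng.standard_normal((n, n)))
    Q = (Q + Q.T) / 2
    mu = xbar.mean(axis=0)
    mu_k = np.array([xbar[labels == k].mean(axis=0) for k in range(C)])
    mk = np.array([(labels == k).sum() for k in range(C)])
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)

    diffs = xbar[:, None, :] - xbar[None, :, :]     # all pairwise differences xbar_i - xbar_j
    A = (diffs @ v) * Q                             # A_ij(t), Eq. (35)
    absA = np.abs(A) + 1e-12                        # guard against division by zero
    W = np.einsum('ijk,ijl,ij->kl', diffs, diffs, Q ** 2 / absA)   # W(t), Eq. (36)
    p = np.sign(mu_k @ v - mu @ v)                  # p_k(t), the polarity of v'(mu_k - mu)
    B = (mk * p) @ (mu_k - mu)                      # B(t), Eq. (38)

    def F(xi):                                      # the surrogate of Eq. (40)
        return (xi @ B) / (0.25 * xi @ W @ xi + 0.25 * absA.sum())

    # Direction from the last line of Eq. (43), with q_ij(t) = sign(A_ij(t)).
    q = np.sign(A)
    e = ((mk * p) @ (mu_k - mu) / np.sum(mk * np.abs(mu_k @ v - mu @ v))
         - np.einsum('ij,ijk->k', q * Q, diffs) / np.sum(np.abs(diffs @ v) * Q))

    # Finite-difference gradient of F at xi = v.
    h = 1e-6
    g = np.array([(F(v + h * np.eye(d)[i]) - F(v - h * np.eye(d)[i])) / (2 * h) for i in range(d)])

    print(e @ g / (np.linalg.norm(e) * np.linalg.norm(g)))   # cosine similarity, close to 1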

Appendix 2

We now demonstrate that the projection vectors \({\mathbf{v}}_{l} { (}l = 1,2, \ldots, m)\) learned via Algorithm 3 are orthonormal:

  1. Because the vector \({\mathbf{v}}_{l}\) learned via either Algorithm 1 or Algorithm 2 is a linear combination of \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l} ,\) it lies in the subspace spanned by \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l} .\)

  2. By multiplying both sides of Eqs. (29) and (30) by \({\mathbf{v}}_{l - 1}^{T}\) (see the numerical sketch at the end of this appendix), we get

    $$\begin{aligned} & {\mathbf{v}}_{l - 1}^{T} \, ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l} = {\mathbf{v}}_{l - 1}^{T} ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l - 1} - {\mathbf{v}}_{l - 1}^{T} {\mathbf{v}}_{l - 1} {\mathbf{v}}_{l - 1}^{\text{T}} ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l - 1} = 0 \\ & {\mathbf{v}}_{l - 1}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l} = {\mathbf{v}}_{l - 1}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l - 1} - {\mathbf{v}}_{l - 1}^{T} {\mathbf{v}}_{l - 1} {\mathbf{v}}_{l - 1}^{\text{T}} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l - 1} = 0 \\ \end{aligned}$$
  3. From step 2, \({\mathbf{v}}_{l - 1}\) is orthogonal to \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l} ,\) which, together with step 1, shows that \({\mathbf{v}}_{l - 1}\) is also orthogonal to \({\mathbf{v}}_{l} .\)

  4. By recursively performing steps 2 and 3 for each \({\mathbf{v}}_{l} { (}l = m,m - 1, \ldots, 2),\) we get \({\mathbf{v}}_{l}^{T} {\mathbf{v}}_{k} = 0,{\text{ for }}l,k = 1,2, \ldots, m{\text{ and }}l \ne k .\)

  5. By Algorithms 1 and 2, each output \({\mathbf{v}}_{l} { (}l = 1,2, \ldots, m)\) is a unit vector because of the normalization operation inside these algorithms.

Summarizing the above five points, we have

$${\mathbf{v}}_{l}^{T} {\mathbf{v}}_{k} = \delta_{lk} ,\quad \, l,k = 1,2, \ldots, m$$
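
The deflation of Eqs. (29) and (30) can be illustrated with the following Python sketch. The working vectors are random stand-ins for \(({\varvec{\upmu}}^{k} - {\varvec{\upmu}})^{l}\) and \((\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )^{l} ,\) and the inner optimization of Algorithms 1 and 2 is replaced by an arbitrary unit-norm linear combination of them; only the deflation step matters for the orthogonality argument above.

    import numpy as np

    rng = np.random.default_rng(2)
    d, m = 8, 4                                   # ambient dimension, number of projections
    U = rng.standard_normal((20, d))              # stand-ins for (mu^k - mu)^l and (xbar_i - xbar_j)^l

    V = []
    for l in range(m):
        # Stand-in for Algorithm 1/2: some unit-norm linear combination of the current vectors.
        w = rng.standard_normal(len(U)) @ U
        v = w / np.linalg.norm(w)
        V.append(v)
        # Deflation, Eqs. (29)-(30): remove the component along v from every working vector.
        U = U - np.outer(U @ v, v)

    V = np.array(V)
    print(np.round(V @ V.T, 10))                  # close to the identity: the v_l are orthonormal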

Appendix 3

Here we prove that Eq. (32) is actually an extension of the generic LDA algorithm in which the within-class scatter matrix is modified to incorporate the influence of local similarity.

From Eq. (6), the numerator of the right side of Eq. (32) can be rewritten as

$$\sum\limits_{k = 1}^{C} {m_{k} } ||{\mathbf{v}}^{T} ({\varvec{\upmu}}^{k} - {\varvec{\upmu}})||_{2}^{2} = {\mathbf{v}}^{T} {\mathbf{S}}_{B} {\mathbf{v}}$$
(53)

By Eq. (16), we know that

$$\frac{(1 - \varepsilon )}{2}\sum\limits_{i,j} {\left\| {{\mathbf{v}}^{T} (\overline{{\mathbf{x}}}_{i} - \overline{{\mathbf{x}}}_{j} )} \right\|_{2}^{2} } W_{ij} = (1 - \varepsilon ){\mathbf{v}}^{T} {\mathbf{S}}_{{\rm W}} {\mathbf{v}}$$
(54)

On the other hand, following some simple algebraic steps, we see that

$$\begin{aligned} & \frac{\varepsilon }{2}\sum\limits_{i,j} {\left\| {{\mathbf{v}}^{T} {\mathbf{x}}_{i} - {\mathbf{v}}^{T} {\mathbf{x}}_{j} } \right\|_{2}^{2} S_{ij} } \\ & \quad = \frac{\varepsilon }{2}\sum\limits_{i,j} { ({\mathbf{v}}^{T} {\mathbf{x}}_{i} - {\mathbf{v}}^{T} {\mathbf{x}}_{j} )({\mathbf{v}}^{T} {\mathbf{x}}_{i} - {\mathbf{v}}^{T} {\mathbf{x}}_{j} )^{T} S_{ij} } \\ & \quad = \frac{\varepsilon }{2}\left\{ {{\mathbf{v}}^{T} \left( {\sum\limits_{i,j} {({\mathbf{x}}_{i} - {\mathbf{x}}_{j} )S_{ij} ({\mathbf{x}}_{i} - {\mathbf{x}}_{j} )^{T} } } \right){\mathbf{v}}} \right\} \\ & \quad = \frac{\varepsilon }{2}\left\{ {{\mathbf{v}}^{T} \left( {2\sum\limits_{i,j} {{\mathbf{x}}_{i} S_{ij} {\mathbf{x}}_{i}^{T} } - 2\sum\limits_{i,j} {{\mathbf{x}}_{i} S_{ij} {\mathbf{x}}_{j}^{T} } } \right){\mathbf{v}}} \right\} \\ & \quad = \varepsilon \, {\mathbf{v}}^{T} \left( {{\mathbf{XDX}}^{T} - {\mathbf{XSX}}^{T} } \right) \, {\mathbf{v}} = \varepsilon \, {\mathbf{v}}^{T} {\mathbf{S}}_{{\rm LW}} {\mathbf{v}} \\ \end{aligned}$$
(55)

where \({\mathbf{D}} = {\text{diag}}(D_{11} , \ldots ,D_{nn} ),\,D_{ii} = \sum\nolimits_{j = 1}^{n} {S_{ij} } \, (i = 1, \ldots ,n),\) and \({\mathbf{S}}_{{\rm LW}} = {\mathbf{X}}({\mathbf{D}} - {\mathbf{S}}){\mathbf{X}}^{T}\) is referred to as the local within-class scatter matrix, to avoid confusion with the scatter matrices \({\mathbf{S}}_{B}\) and \({\mathbf{S}}_{{\rm W}}\) in LDA.

By substituting Eqs. (53), (54) and (55) into Eq. (32), we get

$$\mathop {\arg \hbox{max} }\limits_{{\mathbf{v}}} \frac{{{\mathbf{v}}^{T} {\mathbf{S}}_{B} {\mathbf{v}}}}{{{\mathbf{v}}^{T} [(1 - \varepsilon ){\mathbf{S}}_{{\rm W}} + \varepsilon {\mathbf{S}}_{{\rm LW}} ]{\mathbf{v}}}}$$
(56)

where \(\varepsilon \, \in ( 0 , 1 )\) is a tuning parameter that controls the trade-off between global geometrical information and local geometrical information. If we define a total within-class scatter matrix \({\mathbf{S}}_{{\rm TW}}\) as:

$${\mathbf{S}}_{{\rm TW}} = (1 - \varepsilon ){\mathbf{S}}_{{\rm W}} + \varepsilon {\mathbf{S}}_{{\rm LW}}$$
(57)

Equation (56) can then be written in the form of a generic LDA formulation:

$$\mathop {\arg\,\hbox{max} }\limits_{{\mathbf{v}}} \frac{{{\mathbf{v}}^{T} {\mathbf{S}}_{{\rm B}} {\mathbf{v}}}}{{{\mathbf{v}}^{T} {\mathbf{S}}_{{\rm TW}} {\mathbf{v}}}}$$
(58)

This completes the proof.
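
To make the reduction concrete, the following Python sketch (a minimal illustration, not the authors' code) forms the standard LDA scatter matrices \({\mathbf{S}}_{B}\) and \({\mathbf{S}}_{{\rm W}},\) uses a random nonnegative symmetric matrix as a stand-in for the similarity matrix S of Eq. (9)/(10), builds \({\mathbf{S}}_{{\rm LW}}\) and the total within-class scatter matrix \({\mathbf{S}}_{{\rm TW}}\) of Eq. (57), and solves Eq. (58) as a generalized eigenvalue problem. The variable names (eps_mix and so on) are illustrative.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(3)
    n, d, C, eps_mix = 60, 6, 3, 0.5              # eps_mix plays the role of the tuning parameter epsilon
    labels = rng.integers(0, C, size=n)
    X = rng.standard_normal((n, d))               # samples as rows here (the paper uses columns)

    mu = X.mean(axis=0)
    S_B = np.zeros((d, d)); S_W = np.zeros((d, d))
    for k in range(C):
        Xk = X[labels == k]
        mu_k = Xk.mean(axis=0)
        S_B += len(Xk) * np.outer(mu_k - mu, mu_k - mu)       # between-class scatter
        S_W += (Xk - mu_k).T @ (Xk - mu_k)                    # within-class scatter

    # Local within-class scatter: the paper's X(D - S)X^T becomes X^T (D - S) X with samples as rows.
    S = np.abs(rng.standard_normal((n, n))); S = (S + S.T) / 2  # stand-in similarity matrix
    D = np.diag(S.sum(axis=1))
    S_LW = X.T @ (D - S) @ X

    # Total within-class scatter, Eq. (57), and the generic LDA problem, Eq. (58).
    S_TW = (1 - eps_mix) * S_W + eps_mix * S_LW
    evals, evecs = eigh(S_B, S_TW)                # generalized eigenproblem S_B v = lambda S_TW v
    V = evecs[:, np.argsort(evals)[::-1][:C - 1]] # leading directions maximize Eq. (58)
    print(V.shape)

Setting eps_mix to 0 recovers standard LDA, while larger values shift the weight toward the locality-preserving term, in line with Eq. (57).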

About this article

Cite this article

Zhang, D., Li, X., He, J. et al. A new linear discriminant analysis algorithm based on L1-norm maximization and locality preserving projection. Pattern Anal Applic 21, 685–701 (2018). https://doi.org/10.1007/s10044-017-0594-y
