
Unique sharp local minimum in \(\ell_1\)-minimization complete dictionary learning. (English) Zbl 1498.68270

Summary: We study the problem of globally recovering a dictionary from a set of signals via \(\ell_1\)-minimization. We assume that the signals are generated as i.i.d. random linear combinations of the \(K\) atoms of a complete reference dictionary \(D^*\in \mathbb{R}^{K\times K}\), where the combination coefficients follow either a Bernoulli-type model or an exact sparse model. First, we obtain a necessary and sufficient norm condition for the reference dictionary \(D^*\) to be a sharp local minimum of the expected \(\ell_1\) objective function. This result substantially extends that of the last two authors [J. Mach. Learn. Res. 18, Paper No. 168, 56 p. (2018; Zbl 1467.68155)] and allows the combination coefficients to be non-negative. Second, we obtain an explicit bound on the region within which the objective value of the reference dictionary is minimal. Third, we show that the reference dictionary is the unique sharp local minimum, thus establishing the first known global property of \(\ell_1\)-minimization dictionary learning. Motivated by these theoretical results, we introduce a perturbation-based test to determine whether a dictionary is a sharp local minimum of the objective function. We also propose a new dictionary learning algorithm based on block coordinate descent, called DL-BCD, which is guaranteed to decrease the objective function monotonically. Simulation studies show that DL-BCD achieves competitive recovery rates compared to other state-of-the-art dictionary learning algorithms when the reference dictionary is generated from random Gaussian matrices.
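The summary does not spell out the objective or the perturbation test in detail. For a complete (square, invertible) dictionary, the \(\ell_1\)-minimization objective reduces to the average \(\ell_1\) norm of the exact-fit coefficients \(D^{-1}x\), and sharpness of a local minimum can be probed numerically by checking that the objective grows at a linear rate along perturbation directions. The Python sketch below illustrates this idea only; the function names and the random-direction probing are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def l1_objective(D, X):
    """Empirical l1 objective for a complete (square, invertible)
    dictionary D: the average l1 norm of the exact-fit coefficients
    A = D^{-1} X.  This mirrors the kind of objective the summary
    describes; the paper's precise formulation may differ."""
    A = np.linalg.solve(D, X)          # coefficients satisfying D @ A == X
    return float(np.abs(A).sum() / X.shape[1])

def perturbation_test(D, X, eps=1e-3, n_dirs=200, seed=0):
    """Crude sharp-local-minimum probe (hypothetical illustration,
    not the paper's test): sample random unit-norm perturbation
    directions V and measure the finite-difference slope of the
    objective.  A strictly positive minimum slope suggests the
    objective rises linearly in every probed direction."""
    rng = np.random.default_rng(seed)
    f0 = l1_objective(D, X)
    slopes = []
    for _ in range(n_dirs):
        V = rng.standard_normal(D.shape)
        V /= np.linalg.norm(V)         # unit Frobenius norm direction
        slopes.append((l1_objective(D + eps * V, X) - f0) / eps)
    return min(slopes)
```

For signals drawn from a sufficiently sparse coefficient model, the paper's theory predicts that the reference dictionary yields a strictly positive minimal slope, whereas a non-sharp point would show slopes near zero or negative.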

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H12 Estimation in multivariate analysis
62J05 Linear regression; mixed models
90C26 Nonconvex programming, global optimization

Citations:

Zbl 1467.68155

References:

[1] Alekh Agarwal, Animashree Anandkumar, and Praneeth Netrapalli. A clustering approach to learn sparsely-used overcomplete dictionaries. arXiv preprint arXiv:1309.1952, 2013. doi: 10.1109/TIT.2016.2614684. · Zbl 1359.62229
[2] Alekh Agarwal, Animashree Anandkumar, Prateek Jain, and Praneeth Netrapalli. Learning sparsely used overcomplete dictionaries via alternating minimization. SIAM Journal on Optimization, 26(4):2775-2799, 2014. · Zbl 1358.90104
[3] Michal Aharon, Michael Elad, and Alfred M. Bruckstein. K-SVD and its non-negative variant for dictionary design. International Society for Optics and Photonics, 5914:591411, 2005. · Zbl 1096.68042
[4] Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311-4322, 2006. doi: 10.1109/TSP.2006.881199. · Zbl 1375.94040
[5] Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. More algorithms for provable dictionary learning. arXiv preprint arXiv:1401.0579, 2014a.
[6] Sanjeev Arora, Rong Ge, and Ankur Moitra. New algorithms for learning incoherent and overcomplete dictionaries. In Conference on Learning Theory, pages 779-806, 2014b.
[7] Sanjeev Arora, Rong Ge, Tengyu Ma, and Ankur Moitra. Simple, efficient, and neural algorithms for sparse coding. arXiv:1503.00778 [cs, stat], 2015.
[8] A. M. Bagirov, L. Jin, N. Karmitsa, A. Al Nuaimat, and N. Sultanova. Subgradient method for nonconvex nonsmooth optimization. Journal of Optimization Theory and Applications, 157(2):416-435, May 2013. doi: 10.1007/s10957-012-0167-6. URL https://doi.org/10.1007/s10957-012-0167-6. · Zbl 1282.90133
[9] Boaz Barak, Jonathan A. Kelner, and David Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 143-151, 2014. doi: 10.1145/2746539.2746605. · Zbl 1321.68396
[10] Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, and Jill P. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences, 101(12):4164-4169, 2004. doi: 10.1073/pnas.0308531101.
[11] Emmanuel J. Candes, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted L1 minimization. Journal of Fourier Analysis and Applications, 14(5):877-905, 2008. · Zbl 1176.94014
[12] Niladri S. Chatterji and Peter L. Bartlett. Alternating minimization for dictionary learning with random initialization. In Advances in Neural Information Processing Systems, pages 1994-2003, 2017.
[13] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43(1):129-159, 2001. · Zbl 0979.94010
[14] Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994. · Zbl 0791.62004
[15] Anulekha Dhara and Joydeep Dutta. Optimality Conditions in Convex Optimization: A Finite-Dimensional View. CRC Press, 2011. · Zbl 1251.90002
[16] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736-3745, 2006.
[17] J.-J. Fuchs. On sparse representations in arbitrary redundant bases. IEEE Transactions on Information Theory, 50(6):1341-1344, 2004. · Zbl 1284.94018
[18] Rong Ge, Jason D. Lee, and Tengyu Ma. Matrix completion has no spurious local minimum. Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 1-27, 2016.
[19] Quan Geng, John Wright, and Huan Wang. On the local correctness of L1-minimization for dictionary learning. In Information Theory (ISIT), 2014 IEEE International Symposium on, pages 3180-3184. IEEE, 2014.
[20] Rémi Gribonval and Karin Schnass. Dictionary identification - sparse matrix-factorisation via L1-minimisation. IEEE Transactions on Information Theory, 56(7):3523-3539, 2010. · Zbl 1366.62104
[21] Rémi Gribonval, Rodolphe Jenatton, and Francis Bach. Sparse and spurious: Dictionary learning with noise and outliers. IEEE Transactions on Information Theory, 2015. doi: 10.1109/TIT.2015.2472522. · Zbl 1359.94096
[22] Sepp Hochreiter and Jürgen Schmidhuber. Flat minima. Neural Computation, 9(1):1-42, 1997. · Zbl 0872.68150
[23] Patrik O. Hoyer. Non-negative sparse coding. Neural Networks for Signal Processing - Proceedings of the IEEE Workshop, pages 557-565, 2002. doi: 10.1109/NNSP.2002.1030067.
[24] Kenneth Kreutz-Delgado, Joseph F. Murray, Bhaskar D. Rao, Kjersti Engan, Te-Won Lee, and Terrence J. Sejnowski. Dictionary learning algorithms for sparse representation. Neural Computation, 15(2):349-396, 2003. doi: 10.1162/089976603762552951. · Zbl 1047.68101
[25] Daniel Lee and Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, 2001.
[26] Sylvain Lesage, Rémi Gribonval, Frédéric Bimbot, and Laurent Benaroya. Learning unions of orthonormal bases with thresholded singular value decomposition. In Acoustics, Speech, and Signal Processing, 2005. Proceedings (ICASSP '05). IEEE International Conference on, volume 5, pages v/293-v/296. IEEE, 2005. doi: 10.1109/ICASSP.2005.1416298.
[27] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online dictionary learning for sparse coding. Proceedings of the 26th Annual International Conference on Machine Learning, pages 689-696, 2009a. · Zbl 1242.62087
[28] Julien Mairal, Francis R. Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Supervised dictionary learning. In Advances in Neural Information Processing Systems, pages 1033-1040, 2009b.
[29] Julien Mairal, Rodolphe Jenatton, Francis Bach, Jean Ponce, Guillaume Obozinski, Bin Yu, Guillermo Sapiro, and Zaid Harchaoui. SPAMS: a SPArse Modeling Software, v2.5, 2014.
[30] Saralees Nadarajah and Samuel Kotz. On the linear combination of Laplace random variables. Probability in the Engineering and Informational Sciences, 19(4):463-470, October 2005. doi: 10.1017/S0269964805050291. · Zbl 1336.62052
[31] Bruno A. Olshausen and David J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607-609, 1996.
[32] Bruno A. Olshausen and David J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311-3325, 1997. doi: 10.1016/S0042-6989(97)00169-7.
[33] Jason T. Parker and Philip Schniter. Parametric bilinear generalized approximate message passing. IEEE Journal on Selected Topics in Signal Processing, 10:795-808, 2016. doi: 10.1109/JSTSP.2016.2539123.
[34] Jason T. Parker, Philip Schniter, and Volkan Cevher. Bilinear generalized approximate message passing - Part I: Derivation. IEEE Transactions on Signal Processing, 62(22):5839-5853, 2014a. · Zbl 1394.94447
[35] Jason T. Parker, Philip Schniter, and Volkan Cevher. Bilinear generalized approximate message passing - Part II: Applications. IEEE Transactions on Signal Processing, 62(22):5854-5867, 2014b. · Zbl 1394.94448
[36] Gabriel Peyré. Sparse modeling of textures. Journal of Mathematical Imaging and Vision, 34(1):17-31, 2009.
[37] Boris Teodorovic Polyak. Sharp minima. In Institute of Control Sciences Lecture Notes, Moscow. IIASA Workshop on Generalized Lagrangians and Their Applications, IIASA, Laxenburg, Austria, 1979.
[38] Qiang Qiu, Vishal M. Patel, and Rama Chellappa. Information-theoretic dictionary learning for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2173-2184, 2014.
[39] Joseph Lee Rodgers, W. Alan Nicewander, and Larry Toothaker. Linearly independent, orthogonal, and uncorrelated variables. American Statistician, 38(2):133-134, 1984. doi: 10.1080/00031305.1984.10483183.
[40] Ron Rubinstein, Alfred M. Bruckstein, and Michael Elad. Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6):1045-1057, 2010. doi: 10.1109/JPROC.2010.2040551.
[41] Karin Schnass. On the identifiability of overcomplete dictionaries via the minimisation principle underlying K-SVD. Applied and Computational Harmonic Analysis, 37(3):464-491, 2014. doi: 10.1016/j.acha.2014.01.005. · Zbl 1297.94018
[42] Karin Schnass. Local identification of overcomplete dictionaries. Journal of Machine Learning Research, 16:1211-1242, 2015. · Zbl 1351.68232
[43] Daniel A. Spielman, Huan Wang, and John Wright. Exact recovery of sparsely-used dictionaries. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pages 3087-3090. AAAI Press, 2013.
[44] Nathan Srebro and Tommi Jaakkola. Weighted low-rank approximations. In Proceedings of the Twentieth International Conference on Machine Learning, volume 3, pages 720-727, 2003.
[45] Ju Sun, Qing Qu, and John Wright. Complete dictionary recovery over the sphere I: Overview and the geometric picture. IEEE Transactions on Information Theory, 63(2):853-884, 2017a. doi: 10.1109/TIT.2016.2632162. · Zbl 1364.94164
[46] Ju Sun, Qing Qu, and John Wright. Complete dictionary recovery over the sphere II: Recovery by Riemannian trust-region method. IEEE Transactions on Information Theory, 63(2):885-914, 2017b. doi: 10.1109/TIT.2016.2632149. · Zbl 1364.94165
[47] Vladimir Vapnik. Statistical Learning Theory. 1998. · Zbl 0935.62007
[48] Christoph Witzgall and R. Fletcher. Practical Methods of Optimization. Mathematics of Computation, 1989. doi: 10.2307/2008742.
[49] Siqi Wu and Bin Yu. Local identifiability of l1-minimization dictionary learning: a sufficient and almost necessary condition. Journal of Machine Learning Research, 18:1-56, 2018. · Zbl 1467.68155
[50] Siqi Wu, Antony Joseph, Ann S. Hammonds, Susan E. Celniker, Bin Yu, and Erwin Frise. Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks. Proceedings of the National Academy of Sciences, 113(16):201521171, 2016. doi: 10.1073/pnas.1521171113.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.