×

Decentralized dictionary learning over time-varying digraphs. (English) Zbl 1434.68403

Summary: This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be inefficient or unfeasible, due to resource limitations, communication overheads or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which is proved to converge to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism aiming at locally estimating the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, bi-convex problems over (time-varying) (di)graphs.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H12 Estimation in multivariate analysis
68T10 Pattern recognition, speech recognition
90C26 Nonconvex programming, global optimization
90C35 Programming involving graphs or networks

References:

[1] University of Southern California, Signal and image processing institute. Volume 3: Miscellaneous image database, 1997. Available online:http://sipi.usc.edu/database/ database.php?volume=misc.
[2] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation.IEEE Transactions on Signal Processing, 54(11): 4311-4322, November 2006. · Zbl 1375.94040
[3] A. Aravkin, S. Becker, V. Cevher, and P. Olsen. A variational approach to stable principal component pursuit. InProceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, UAI’14, pages 32-41, Arlington, Virginia, United States, 2014. AUAI Press. ISBN 978-0-9749039-1-0.
[4] D. P. Bertsekas and J. N. Tsitsiklis.Parallel and distributed computation: numerical methods. Athena Scientific, 1997. · Zbl 1325.65001
[5] P. Bianchi and J. Jakubowicz. Convergence of a multi-agent projected stochastic gradient algorithm for non-convex optimization.IEEE Transactions on Automatic Control, 58(2): 391-405, February 2013. · Zbl 1369.90131
[6] T. Bouwmans, A. Sobral, S. Javed, S. K. Jung, and E. Zahzah. Decomposition into low-rank plus additive matrices for background/foreground separation: A review for a comparative evaluation with a large-scale dataset.Computer Science Review, 23:1-71, February 2017. · Zbl 1398.68572
[7] E. J. Cand‘es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis?Journal of the ACM, 58(3):1-37, June 2011. · Zbl 1327.62369
[8] P. Chainais and C. Richard. Learning a common dictionary over a sensor network. In Proceedings of the 2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 133-136, Saint Martin, French West Indies, France, December 2013.
[9] J. Chen, Z. J. Towfic, and A. H. Sayed. Dictionary learning over distributed models.IEEE Transactions on Signal Processing, 63(4):1001-1016, February 2015. · Zbl 1395.68229
[10] S. Chouvardas, Y. Kopsinis, and S. Theodoridis. An online algorithm for distributed dictionary learning. InProceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3292-3296, Brisbane, Queensland, Australia, April 2015.
[11] A. Daneshmand, F. Facchinei, V. Kungurtsev, and G. Scutari. Hybrid random/deterministic parallel algorithms for convex and nonconvex big data optimization.IEEE Transactions on Signal Processing, 63(13):3914-3929, August 2015. · Zbl 1394.94146
[12] A. Daneshmand, G. Scutari, and F. Facchinei. Distributed dictionary learning. InProceedings of the 50th Asilomar Conference on Signals, Systems and Computers, pages 1001-1005, November 2016.
[13] P. Di Lorenzo and G. Scutari. NEXT: In-network nonconvex optimization.IEEE Transactions on Signal and Information Processing over Networks, 2(2):120-136, June 2016.
[14] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries.IEEE Transactions on Image Processing, 15(12):3736-3745, December 2006.
[15] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge.International Journal of Computer Vision, 88(2): 303-338, June 2010.
[16] F. Facchinei, G. Scutari, and Simone Sagratella. Parallel selective algorithms for nonconvex big data optimization.IEEE Transactions on Signal Processing, 63(7):1874-1889, April 2015. · Zbl 1394.94174
[17] T. Hastie, R. Tibshirani, and M. Wainwright.Statistical Learning with Sparsity. CRC Press, Taylor & Francis Group, 2015. · Zbl 1319.68003
[18] M. Hong, D. Hajinezhad, and M. M. Zhao. Prox-PDA: The proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks. InProceedings of the 34th International Conference on Machine Learning (ICML), volume 70, pages 1529-1538, Sydney, Australia, August 2017.
[19] P. O. Hoyer. Non-negative matrix factorization with sparseness constraints.Journal of Machine Learning Research, 5:1457-1469, December 2004. · Zbl 1222.68218
[20] Z. Jiang, Z. Lin, and L. S. Davis. Learning a discriminative dictionary for sparse coding via label consistent k-svd. InProceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’11, pages 1697-1704, Colorado Springs, CO, USA, June 2011.
[21] D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. InProceedins of the 44th Annual IEEE Symposium on Foundations of Computer Science, pages 482-491, Cambridge, MA, USA, October 2003.
[22] J. Kim.High-radix Interconnection Networks. PhD thesis, Stanford, CA, USA, 2008.
[23] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization.Nature, 401(6755):788-791, October 1999. · Zbl 1369.68285
[24] M. Lee, H. Shen, J. Z. Huang, and J. S. Marron. Biclustering via sparse singular value decomposition.Biometrics, 66(4):1087-1095, December 2010. · Zbl 1233.62182
[25] J. Liang, M. Zhang, X. Zeng, and G. Yu. Distributed dictionary learning for sparse representation in sensor networks.IEEE Transactions on Image Processing, 23(6):2528-2541, June 2014. · Zbl 1374.94540
[26] S. C. Madeira and A. L. Oliveira. Biclustering algorithms for biological data analysis: a survey.IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1): 24-45, January 2004.
[27] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Supervised dictionary learning. InProceedings of Advances in Neural Information Processing Systems (NIPS), pages 1033-1040, Lake Tahoe, NV, December 2008.
[28] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding.Journal of Machine Learning Research, 11:19-60, January 2010. · Zbl 1242.62087
[29] A. Nedi´c, A. Ozdaglar, and P. A. Parrilo.Constrained consensus and optimization in multi-agent networks.IEEE Transactions on Automatic Control, 55(4):922-938, April 2010. · Zbl 1368.90143
[30] H. Raja and W. U. Bajwa. Cloud K-SVD: Computing data-adaptive representations in the cloud. InProceedings of 51st Annual Allerton Conference, pages 1474-1481, Allerton House, UIUC, Illinois, USA, October 2013.
[31] M. Razaviyayn, M. Hong, Z. Luo, and J. Pang. Parallel successive convex approximation for nonsmooth nonconvex optimization. InNIPS, 2014a.
[32] M. Razaviyayn, H. W. Tseng, and Z. Q. Luo. Dictionary learning for sparse representation: Complexity and algorithms. InProceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5247-5251, Florence, Italy, May 2014b.
[33] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization.SIAM Review, 52(3):471-501, August 2010. · Zbl 1198.90321
[34] R.T. Rockafellar and J.B. Wets.Variational Analysis. Springer, 1998. · Zbl 0888.49001
[35] G. Scutari and Y. Sun. Parallel and distributed successive convex approximation methods for big-data optimization. In F. Facchinei and J.-S. Pang, editors,Multi-agent Optimization, chapter 3, pages 141-308. Lecture Notes in Mathematics 2224, Springer, 2018. · Zbl 1461.90101
[36] G. Scutari and Y. Sun. Distributed nonconvex constrained optimization over time-varying digraphs.Mathematical Programming, to appear 2019. · Zbl 1415.90130
[37] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang. Decomposition by partial linearization: Parallel optimization of multiuser systems.IEEE Transaction on Signal Processing, 62:641-656, February 2014. · Zbl 1394.94507
[38] H. Shen and J. Z. Huang. Sparse principal component analysis via regularized low rank matrix approximation.Journal of Multivariate Analysis, 6(99):1015-1034, July 2008. · Zbl 1141.62049
[39] N. Srebro and A. Shraibman. Rank, trace-norm and max-norm. InProceedings of the Learning Theory: 18th Annual Conference on Learning Theory (COLT), pages 545-560, Bertinoro, Italy, June 2005. · Zbl 1137.68563
[40] Y. Sun, G. Scutari, and D. Palomar. Distributed nonconvex multiagent optimization over time-varying networks. InProceedings of the 2016 50th Asilomar Conference on Signals, Systems and Computers, pages 788-794, November 2016.
[41] K. K. Sung.Learning and Example Selection for Object and Pattern Recognition. PhD thesis, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, MIT, Cambridge, MA, 1996.
[42] T. Tatarenko and B. Touri. Non-convex distributed optimization.IEEE Trans. on Automatic Control, 62:3744-3757, August 2017. · Zbl 1373.90123
[43] I. Tosic and P. Frossard. Dictionary learning.IEEE Signal Processing Magazine, 28(2): 27-38, March 2011. · Zbl 1372.94246
[44] M. Udell, C. Horn, R. Zadeh, and S. Boyd. Generalized low rank models.Foundations and Trends in Machine Learning, 9(1):1-118, June 2016.
[45] H. T. Wai, T. H. Chang, and A. Scaglione. A consensus-based decentralized algorithm for non-convex optimization with application to dictionary learning.InProceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3546-3550, Brisbane, Queensland, Australia, April 2015.
[46] H. T. Wai, J. Lafond, A. Scaglione, and E. Moulines. Decentralized frankwolfe algorithm for convex and nonconvex problems.IEEE Transaction on Automatic Control, 62:5522-5537, November 2017. · Zbl 1390.90525
[47] L. Xiao, S. Boyd, and S. Lall. A scheme for robust distributed sensor fusion based on average consensus. InProceedings of the 4th international symposium on Information processing in sensor networks, pages 63-70, Los Angeles, CA, April 2005.
[48] L. Xiao, S. Boyd, and S. J. Kim. Distributed average consensus with least-mean-square deviation.Journal of Parallel and Distributed Computing, 67(1):33-46, January 2007. · Zbl 1109.68019
[49] M. Zhao, Q. Shi, and M. Hong. A distributed algorithm for dictionary learning over networks. InProceedings of 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 505-509, December 2016.
[50] H.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.