A nonlinear matrix decomposition for mining the zeros of sparse data. (English) Zbl 1490.62024

Summary: We describe a simple iterative solution to a widely recurring problem in multivariate data analysis: given a sparse nonnegative matrix \(\mathbf{X}\), how to estimate a low-rank matrix \(\Theta\) such that \(\mathbf{X}\approx f(\Theta)\), where \(f\) is an elementwise nonlinearity? We develop a latent variable model for this problem and consider those sparsifying nonlinearities, popular in neural networks, that map all negative values to zero. The model seeks to explain the variability of sparse high-dimensional data in terms of a smaller number of degrees of freedom. We show that exact inference in this model is tractable and derive an expectation-maximization (EM) algorithm to estimate the low-rank matrix \(\Theta\). Notably, we do not parameterize \(\Theta\) as a product of smaller matrices to be alternately optimized; instead, we estimate \(\Theta\) directly via the singular value decomposition of matrices that are repeatedly inferred (at each iteration of the EM algorithm) from the model’s posterior distribution. We use the model to analyze large sparse matrices that arise from data sets of binary, grayscale, and color images. In all of these cases, we find that the model discovers much lower-rank decompositions than purely linear approaches.
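
To make the shape of such an iteration concrete, here is a minimal sketch of one plausible instantiation: Gaussian latent variables censored by the rectifying nonlinearity \(f(z)=\max(0,z)\), with an E-step that infers the latent matrix from the posterior and an M-step that truncates its SVD to rank \(r\). The fixed noise scale, the initialization, and the name `relu_em` are illustrative assumptions, not the paper's exact updates.

```python
# Minimal EM-style sketch (an assumption-laden illustration, not the authors'
# exact algorithm) for X ~= f(Theta) with f = ReLU. Latent model: each
# Z_ij ~ N(Theta_ij, sigma^2) with X_ij = max(0, Z_ij), so a zero in X only
# reveals that Z_ij <= 0.
import numpy as np
from scipy.stats import norm

def relu_em(X, rank, sigma=1.0, n_iters=50):
    Theta = np.where(X > 0, X, -sigma)  # crude initialization (assumption)
    for _ in range(n_iters):
        # E-step: posterior mean of Z. A positive entry pins Z_ij = X_ij; a
        # zero entry leaves a Gaussian truncated to (-inf, 0], whose mean is
        # Theta_ij - sigma * phi(Theta_ij/sigma) / Phi(-Theta_ij/sigma).
        a = Theta / sigma
        ez_zero = Theta - sigma * norm.pdf(a) / np.clip(norm.cdf(-a), 1e-12, None)
        EZ = np.where(X > 0, X, ez_zero)
        # M-step: estimate Theta directly as the best rank-r approximation of
        # E[Z] via a truncated SVD -- no factor matrices are alternated.
        U, s, Vt = np.linalg.svd(EZ, full_matrices=False)
        Theta = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return Theta

# Usage: reconstruct the sparse data as f(Theta) and compare with X, e.g.,
# X_hat = np.maximum(0.0, relu_em(X, rank=10))
```

Note how this mirrors the summary's description: the rank constraint enters only through the SVD of the repeatedly inferred matrix, rather than through an explicit factorization \(\Theta=\mathbf{U}\mathbf{V}^\top\) optimized by alternating updates.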

MSC:

62-08 Computational methods for problems pertaining to statistics
15A23 Factorization of matrices
15B48 Positive matrices and their generalizations; cones of matrices
62R07 Statistical aspects of big data and data science
65F50 Computational methods for sparse matrices
65F55 Numerical methods for low-rank matrix approximation; matrix compression
68T07 Artificial neural networks and deep learning
Full Text: DOI

References:

[1] C. Anderson-Bergman, T. G. Kolda, and K. Kincher-Winoto, XPCA: Extending PCA for a Combination of Discrete and Continuous Variables, preprint, arXiv:1808.07510, 2018.
[2] A. M. S. Ang and N. Gillis, Accelerating nonnegative matrix factorization algorithms using extrapolation, Neural Comput., 31 (2019), pp. 417-439. · Zbl 1470.65083
[3] D. J. Bartholomew, M. Knott, and I. Moustaki, Latent Variable Models and Factor Analysis: A Unified Approach, Wiley, Chichester, UK, 2011. · Zbl 1266.62040
[4] S. Bengio, K. Dembczynski, T. Joachims, M. Kloft, and M. Varma, Extreme Classification, Dagstuhl Rep., 8 (2019), pp. 62-80.
[5] S. A. Bhaskar, Probabilistic low-rank matrix completion from quantized measurements, J. Mach. Learn. Res. (JMLR), 17 (2016), pp. 1-34. · Zbl 1395.62120
[6] S. A. Bhaskar and A. Javanmard, 1-bit matrix completion under exact low-rank constraint, in Proceedings of the 49th Annual Conference on Information Sciences and Systems (CISS-15), IEEE, Piscataway, NJ, 2015, pp. 1-6.
[7] Y. Bi and J. Lavaei, On the absence of spurious local minima in nonlinear low-rank matrix recovery problems, Proc. Mach. Learn. Res. (PMLR), 130 (2021), pp. 379-387.
[8] E. Bingham, A. Kaban, and M. Fortelius, The aspect Bernoulli model: Multiple causes of presences and absences, PAA Pattern Anal. Appl., 12 (2009), pp. 55-78. · Zbl 1422.68204
[9] D. Blei, Build, compute, critique, repeat: Data analysis with latent variable models, Annu. Rev. Stat. Appl., 1 (2014), pp. 203-232.
[10] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. (JMLR), 3 (2003), pp. 993-1022. · Zbl 1112.68379
[11] J.-P. Brunet, P. Tamayo, T. R. Golub, and J. P. Mesirov, Metagenes and molecular pattern discovery using matrix factorization, Proc. Nat. Acad. Sci. USA, 101 (2004), pp. 4164-4169.
[12] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16 (1995), pp. 1190-1208. · Zbl 0836.65080
[13] T. Cai and W.-X. Zhou, A max-norm constrained minimization approach to 1-bit matrix completion, J. Mach. Learn. Res. (JMLR), 14 (2013), pp. 3619-3647. · Zbl 1318.62172
[14] E. Candès and B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717-772. · Zbl 1219.90124
[15] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell., 8 (1986), pp. 679-698.
[16] Y. Cao and Y. Xie, Categorical matrix completion, in Proceedings of the 6th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP-15), IEEE, Piscataway, NJ, 2015, pp. 369-372.
[17] A. Cemgil, Bayesian inference for non-negative matrix factorisation models, Comput. Intell. Neurosci., 2009 (2009), 785152.
[18] S. Chatterjee, Matrix estimation by universal singular value thresholding, Ann. Statist., 43 (2015), pp. 177-214. · Zbl 1308.62038
[19] Y. Chi, Y. M. Lu, and Y. Chen, Non-convex optimization meets low-rank matrix factorization: An overview, IEEE Trans. Signal Process., 67 (2019), pp. 5239-5269. · Zbl 1543.90234
[20] A. Cichocki, R. Zdunek, and S. Amari, Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization, in Independent Component Analysis and Signal Separation, M. E. Davies, C. J. James, S. A. Abdallah, and M. D. Plumbley, eds., Lecture Notes in Comput. Sci. 4666, Springer, Berlin, 2007, pp. 169-176. · Zbl 1172.94390
[21] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, Wiley, Hoboken, NJ, 2009.
[22] A. K. Cline and I. S. Dhillon, Computation of the singular value decomposition, in Handbook of Linear Algebra, CRC Press, Boca Raton, FL, 2006, Chapter 45.
[23] A. Coates, A. Ng, and H. Lee, An analysis of single-layer networks in unsupervised feature learning, Proc. Mach. Learn. Res. (PMLR), 15 (2011), pp. 215-223.
[24] M. Collins, S. Dasgupta, and R. E. Schapire, A generalization of principal components analysis to the exponential family, in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, eds., MIT Press, Cambridge, MA, 2002, pp. 617-624.
[25] M. A. Davenport, Y. Plan, E. van den Berg, and M. Wootters, 1-bit matrix completion, Inf. Inference, 3 (2014), pp. 189-223. · Zbl 1309.62124
[26] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by latent semantic analysis, J. Assoc. Inform. Sci. Tech., 41 (1990), pp. 391-407.
[27] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B, Stat. Methodol., 39 (1977), pp. 1-38. · Zbl 0364.62022
[28] E. M. Dodds and M. R. DeWeese, On the sparse structure of natural sounds and natural images: Similarities, differences, and implications for neural coding, Front. Comput. Neurosci., 13 (2019), pp. 1-19.
[29] D. Donoho, High-dimensional data analysis: The curses and blessings of dimensionality, AMS Math Challenges Lecture, AMS, Providence, RI, 2000, pp. 1-32.
[30] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211-218. · JFM 62.1075.02
[31] N. B. Erichson, A. Mendible, S. Wihlborn, and J. N. Kutz, Randomized nonnegative matrix factorization, Pattern Recognit. Lett., 104 (2018), pp. 1-7.
[32] J. Fan and J. Cheng, Matrix completion by deep matrix factorization, Neural Netw., 98 (2018), pp. 34-41.
[33] P. Földiák and M. Young, Sparse coding in the primate cortex, in The Handbook of Brain Theory and Neural Networks, MIT Press, Cambridge, MA, 1995, pp. 895-898.
[34] A. Frieze, R. Kannan, and S. Vempala, Fast Monte-Carlo algorithms for finding low-rank approximations, J. ACM, 51 (2004), pp. 1025-1041. · Zbl 1125.65005
[35] X. Fu, K. Huang, N. D. Sidiropoulos, and W.-K. Ma, Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications, IEEE Signal Process. Mag., 36 (2019), pp. 59-80.
[36] R. S. Ganti, L. Balzano, and R. Willett, Matrix completion under monotonic single index models, in Advances in Neural Information Processing Systems 28, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, eds., Curran Associates, Red Hook, NY, 2015, pp. 1864-1872.
[37] N. Gillis, Nonnegative Matrix Factorization, SIAM, Philadelphia, 2021. · Zbl 1470.68009
[38] G. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2 (1965), pp. 205-224. · Zbl 0194.18201
[39] P. Gopalan, L. Charlin, and D. Blei, Content-based recommendations with Poisson factorization, in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, eds., Curran Associates, Red Hook, NY, 2014, pp. 3176-3184.
[40] P. Gopalan, J. M. Hofman, and D. M. Blei, Scalable recommendation with hierarchical Poisson factorization, in Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI-15), Curran Associates, Red Hook, NY, 2015, pp. 326-335.
[41] G. J. Gordon, Generalized\({}^2\) linear\({}^2\) models, in Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, and K. Obermayer, eds., MIT Press, Cambridge, MA, 2003, pp. 593-600.
[42] S. Gunasekar, P. Ravikumar, and J. Ghosh, Exponential family matrix completion under structural constraints, in Proceedings of the 31st International Conference on Machine Learning (ICML-14), PMLR, 2014, pp. 1917-1925.
[43] J. Guo, E. Levina, G. Michailidis, and J. Zhu, Graphical models for ordinal data, J. Comput. Graph. Statist., 24 (2015), pp. 183-204.
[44] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217-288. · Zbl 1269.65043
[45] P. De Handschutter, N. Gillis, and X. Siebert, A survey on deep matrix factorizations, Comput. Sci. Rev., 42 (2021), 100423. · Zbl 1486.68147
[46] T. Hastie, R. Mazumder, J. D. Lee, and R. Zadeh, Matrix completion and low-rank SVD via fast alternating least squares, J. Mach. Learn. Res. (JMLR), 16 (2015), pp. 3367-3402. · Zbl 1352.65117
[47] N. C. Henderson and R. Varadhan, Damped Anderson acceleration with restarts and monotonicity control for accelerating EM and EM-like algorithms, J. Comput. Graph. Statist., 28 (2019), pp. 834-846. · Zbl 07499030
[48] J. M. Hernández-Lobato, N. Houlsby, and Z. Ghahramani, Stochastic inference for scalable probabilistic modeling of binary matrices, in Proceedings of the 31st International Conference on Machine Learning (ICML-14), PMLR, 2014, pp. 379-387.
[49] G. E. Hinton and Z. Ghahramani, Generative models for discovering sparse distributed representations, Philos. Trans. Roy. Soc. B, 352 (1997), pp. 1177-1190.
[50] G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 313 (2006), pp. 504-507. · Zbl 1226.68083
[51] P. D. Hoff, Bilinear mixed-effects models for dyadic data, J. Amer. Statist. Assoc., 100 (2005), pp. 286-295. · Zbl 1117.62353
[52] P. D. Hoff, Modeling homophily and stochastic equivalence in symmetric relational data, in Advances in Neural Information Processing Systems 20, J. Platt, D. Koller, Y. Singer, and S. Roweis, eds., Curran Associates, Red Hook, NY, 2008, pp. 657-665.
[53] M. Hoffman, D. Blei, J. Paisley, and C. Wang, Stochastic variational inference, J. Mach. Learn. Res. (JMLR), 14 (2013), pp. 1303-1347. · Zbl 1317.68163
[54] T. Hofmann, Probabilistic latent semantic analysis, in Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), Morgan Kaufmann, San Francisco, 1999, pp. 289-296.
[55] C.-J. Hsieh and I. S. Dhillon, Fast coordinate descent methods with variable selection for non-negative matrix factorization, in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, 2011, pp. 1064-1072.
[56] D. J. Hsu, S. M. Kakade, J. Langford, and T. Zhang, Multi-label prediction via compressed sensing, in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, eds., Curran Associates, Red Hook, NY, 2009, pp. 772-780.
[57] P. Jain and P. Kar, Non-convex optimization for machine learning, Found. Trends Mach. Learn., 10 (2017), pp. 142-336. · Zbl 1388.68251
[58] M. Jamshidian and R. I. Jennrich, Conjugate gradient acceleration of the EM algorithm, J. Amer. Statist. Assoc., 88 (1993), pp. 221-228. · Zbl 0775.65025
[59] M. Jamshidian and R. I. Jennrich, Acceleration of the EM algorithm by using quasi-Newton methods, J. R. Stat. Soc., Ser. B., 59 (1997), pp. 569-587. · Zbl 0889.62042
[60] C. C. Johnson, Logistic matrix factorization for implicit feedback data, in Neural Information Processing Systems (NIPS-14) Workshop on Distributed Matrix Computations, Curran Associates, Red Hook, NY, 2014.
[61] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An introduction to variational methods for graphical models, Mach. Learn., 37 (1999), pp. 183-233. · Zbl 0945.68164
[62] R. H. Keshavan, A. Montanari, and S. Oh, Matrix completion from a few entries, IEEE Trans. Inform. Theory, 56 (2010), pp. 2980-2998. · Zbl 1366.62111
[63] J. Kim, Y. He, and H. Park, Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework, J. Global Optim., 58 (2014), pp. 285-319. · Zbl 1321.90129
[64] T. G. Kolda and D. P. O’Leary, A semidiscrete matrix decomposition for latent semantic indexing in information retrieval, ACM Trans. Inform. Systems, 16 (1998), pp. 322-346.
[65] T. G. Kolda and D. P. O’Leary, Algorithm 805: Computation and uses of the semidiscrete matrix decomposition, ACM Trans. Math. Software, 26 (2000), pp. 415-435.
[66] Y. Koren, R. Bell, and C. Volinsky, Matrix factorization techniques for recommender systems, Computer, 42 (2009), pp. 30-37.
[67] J. Lafond, Low rank matrix completion with exponential family noise, in Proceedings of the 28th Conference on Learning Theory (COLT-15), PMLR, 2015, pp. 1224-1243.
[68] J. Lafond, O. Klopp, E. Moulines, and J. Salmon, Probabilistic low-rank matrix completion on finite alphabets, in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, eds., Curran Associates, Red Hook, NY, 2014, pp. 1727-1735.
[69] A. S. Lan, C. Studer, and R. Baraniuk, Matrix recovery from quantized and corrupted measurements, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-14), IEEE, Piscataway, NJ, 2014, pp. 4973-4977.
[70] J. S. Larsen and L. K. H. Clemmensen, Non-negative matrix factorization for binary data, in Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K-15), Springer, Cham, 2015, pp. 555-563.
[71] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278-2324.
[72] D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401 (1999), pp. 788-791. · Zbl 1369.68285
[73] D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems 13, T. Leen, T. Dietterich, and V. Tresp, eds., MIT Press, Cambridge, MA, 2001, pp. 535-541.
[74] D. D. Lee and H. Sompolinsky, Learning a continuous hidden variable model for binary data, in Advances in Neural Information Processing Systems 11, M. J. Kearns, S. A. Solla, and D. A. Cohn, eds., MIT Press, Cambridge, MA, 1999, pp. 515-521.
[75] T. Lesieur, F. Krzakala, and L. Zdeborová, MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel, in Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton-15), IEEE, Piscataway, NJ, 2015, pp. 680-687.
[76] C. Liu, D. B. Rubin, and Y. N. Wu, Parameter expansion to accelerate EM: The PX-EM algorithm, Biometrika, 85 (1998), pp. 755-770. · Zbl 0921.62071
[77] A. Lumbreras, L. Filstroff, and C. Févotte, Bayesian mean-parameterized nonnegative binary matrix factorization, Data Min. Knowl. Discov., 34 (2020), pp. 1898-1935. · Zbl 1460.62037
[78] Z. Ma, Z. Ma, and H. Yuan, Universal latent space model fitting for large networks with edge covariates, J. Mach. Learn. Res. (JMLR), 21 (2020), pp. 1-67. · Zbl 1497.68432
[79] J. B. MacQueen, Some methods for classification and analysis of multivariate observations, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, 1967, pp. 281-297. · Zbl 0214.46201
[80] M. W. Mahoney, Randomized algorithms for matrices and data, Found. Trends Mach. Learn., 3 (2010), pp. 123-224. · Zbl 1232.68173
[81] I. Markovsky, Low Rank Approximation: Algorithms, Implementation, Applications, Comm. Control. Engrg. Ser., Springer, London, 2012. · Zbl 1245.93005
[82] L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner, dSprites: Disentanglement Testing Sprites Dataset, https://github.com/deepmind/dsprites-dataset/, 2017.
[83] A. Mazumdar and A. S. Rawat, Learning and recovery in the ReLU model, in Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, IEEE, Piscataway, NJ, 2019, pp. 108-115.
[84] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall/CRC, Boca Raton, FL, 1989. · Zbl 0744.62098
[85] G. J. McLachlan and K. E. Basford, Mixture Models: Inference and Applications to Clustering, CRC Press, Boca Raton, FL, 1987. · Zbl 0697.62050
[86] E. Meeds, Z. Ghahramani, R. M. Neal, and S. T. Roweis, Modeling dyadic data with binary latent factors, in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hofmann, eds., MIT Press, Cambridge, MA, 2007, pp. 977-984.
[87] J. J. Meulman, A. J. V. der Kooij, and W. J. Heiser, Principal components analysis with nonlinear optimal scaling transformations for ordinal and nominal data, in The Sage Handbook of Quantitative Methodology for the Social Sciences, Sage, Thousand Oaks, CA, 2004, pp. 49-72.
[88] A. Mnih and R. R. Salakhutdinov, Probabilistic matrix factorization, in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, eds., Curran Associates, Red Hook, NY, 2008, pp. 1257-1264.
[89] R. M. Neal, Probabilistic inference using Markov chain Monte Carlo methods, Technical report CRG-TR-93-1, Department of Computer Science, University of Toronto, Toronto, 1993.
[90] D. P. O’Leary and S. Peleg, Digital image compression by outer product expansion, IEEE Trans. Commun., 31 (1983), pp. 441-444.
[91] B. A. Olshausen and D. J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, 381 (1996), pp. 607-609.
[92] P. Paatero and U. Tapper, Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, 5 (1994), pp. 111-126.
[93] J. Paisley, D. Blei, and M. Jordan, Bayesian nonnegative matrix factorization with stochastic variational inference, in Handbook of Mixed Membership Models and Their Applications, E. Airoldi, D. Blei, E. Erosheva, and S. Fienberg, eds., Chapman and Hall/CRC Handb. Mod. Stat. Methods, Chapman and Hall/CRC, Boca Raton, FL, 2014.
[94] B. Ren, L. Pueyo, G. Zhu, and B. Duchêne, Non-negative matrix factorization: Robust extraction of extended structures, Astrophys. J., 852 (2018), p. 104.
[95] L. Rencker, F. Bach, W. Wang, and M. D. Plumbley, Sparse recovery and dictionary learning from nonlinear compressive measurements, IEEE Trans. Signal Process., 67 (2019), pp. 5659-5670.
[96] J. D. M. Rennie and N. Srebro, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, ACM, New York, 2005, pp. 713-719.
[97] J. D. M. Rennie and N. Srebro, Loss functions for preference levels: Regression with discrete ordered labels, in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, Informs, Catonsville, MD, 2005, pp. 180-186.
[98] D. B. Rubin and D. T. Thayer, EM algorithms for ML factor analysis, Psychometrika, 47 (1982), pp. 69-76. · Zbl 0483.62046
[99] R. R. Salakhutdinov, S. T. Roweis, and Z. Ghahramani, Optimization with EM and expectation-conjugate-gradient, in Proceedings of the 20th International Conference on Machine Learning (ICML-03), ACM, New York, 2003, pp. 672-679.
[100] L. Saul and F. Pereira, Aggregate and mixed-order Markov models for statistical language processing, in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP-97), Association for Computational Linguistics, Somerset, NJ, 1997, pp. 81-89.
[101] A. I. Schein, L. K. Saul, and L. H. Ungar, A generalized linear model for principal component analysis of binary data, Proc. Mach. Learn. Res. (PMLR), R4 (2003), pp. 240-247.
[102] H. S. Seung and D. D. Lee, The manifold ways of perception, Science, 290 (2000), pp. 2268-2269.
[103] A. P. Singh and G. J. Gordon, A unified view of matrix factorization models, in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD-08), Springer, Berlin, 2008, pp. 358-373.
[104] G.-J. Song and M. K. Ng, Nonnegative low rank matrix approximation for nonnegative matrices, Appl. Math. Lett., 105 (2020), 106300. · Zbl 1436.65056
[105] H. A. Song and S. Lee, Hierarchical representation using NMF, in Proceedings of the International Conference on Neural Information Processing (ICONIP-13), Springer, Berlin, 2013, pp. 466-473.
[106] A. Soni, S. Jain, J. Haupt, and S. Gonella, Noisy matrix completion under sparse factor models, IEEE Trans. Inform. Theory, 62 (2016), pp. 3636-3661. · Zbl 1359.94173
[107] N. Srebro, Learning with Matrix Factorizations, PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 2004.
[108] N. Srebro and T. Jaakkola, Weighted low-rank approximations, in Proceedings of the 20th International Conference on Machine Learning (ICML-03), ACM, New York, 2003, pp. 720-727.
[109] N. Srebro, J. Rennie, and T. S. Jaakkola, Maximum-margin matrix factorization, in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, MA, 2005, pp. 1329-1336.
[110] J. Sun, S. Boyd, L. Xiao, and P. Diaconis, The fastest mixing Markov process on a graph, and a connection to a maximum variance unfolding problem, SIAM Rev., 48 (2006), pp. 681-699. · Zbl 1109.60324
[111] L. Taslaman and B. Nilsson, A framework for regularized non-negative matrix factorization, with application to the analysis of gene expression data, PLOS One, 7 (2012), e46331.
[112] M. E. Tipping, Probabilistic visualisation of high-dimensional binary data, in Advances in Neural Information Processing Systems 11, M. J. Kearns, S. A. Solla, and D. A. Cohn, eds., MIT Press, Cambridge, MA, 1999, pp. 592-598.
[113] M. E. Tipping and C. M. Bishop, Probabilistic principal component analysis, J. R. Stat. Soc. Ser. B Stat. Methodol., 61 (1999), pp. 611-622. · Zbl 0924.62068
[114] A. M. Tomé, R. Schachtner, V. Vigneron, C. G. Puntonet, and E. W. Lang, A logistic non-negative matrix factorization approach to binary data sets, Multidimens. Syst. Signal Process., 26 (2013), pp. 125-143.
[115] G. Trigeorgis, K. Bousmalis, S. Zafeiriou, and B. Schuller, A deep matrix factorization method for learning attribute representations, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2016), pp. 417-429.
[116] J. A. Tropp, A. Yurtsever, M. Udell, and V. Cevher, Practical sketching algorithms for low-rank matrix approximation, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 1454-1485. · Zbl 1379.65026
[117] J. A. Tropp, A. Yurtsever, M. Udell, and V. Cevher, Streaming low-rank matrix approximation with an application to scientific simulation, SIAM J. Sci. Comput., 41 (2019), pp. A2430-A2463. · Zbl 1420.65060
[118] M. Turk and A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci., 3 (1991), pp. 71-86.
[119] M. Udell, C. Horn, R. Zadeh, and S. Boyd, Generalized low rank models, Found. Trends Mach. Learn., 9 (2016), pp. 1-118. · Zbl 1350.68221
[120] K. Q. Weinberger and L. K. Saul, Unsupervised learning of image manifolds by semidefinite programming, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-04), IEEE Computer Society, Los Alamitos, CA, 2004, pp. 988-995.
[121] K. Q. Weinberger, F. Sha, and L. K. Saul, Learning a kernel matrix for nonlinear dimensionality reduction, in Proceedings of the 21st International Conference on Machine Learning (ICML-04), ACM, New York, 2004, pp. 839-846.
[122] D. M. Witten, R. Tibshirani, and T. Hastie, A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics, 10 (2009), pp. 515-534. · Zbl 1437.62658
[123] J. Wright and Y. Ma, High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications, Cambridge University Press, Cambridge, 2021. · Zbl 1478.68009
[124] Y.-J. Wu, E. Levina, and J. Zhu, Generalized Linear Models with Low Rank Effects for Network Data, preprint, arXiv:1705.06672, 2017.
[125] J. Xu, D. Hsu, and A. Maleki, Benefits of over-parameterization with EM, in Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds., Curran Associates, Red Hook, NY, 2018, pp. 10685-10695.
[126] H. J. Xue, X. Dai, J. Zhang, S. Huang, and J. Chen, Deep matrix factorization models for recommender systems, in Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), International Joint Conferences on Artificial Intelligence Organization, 2017, pp. 3203-3209.
[127] J. Yu, G. Zhou, A. Cichocki, and S. Xie, Learning the hierarchical parts of objects by deep non-smooth nonnegative matrix factorization, IEEE Access, 6 (2018), pp. 58096-58105.
[128] Y. Yu, Monotonically overrelaxed EM algorithms, J. Comput. Graph. Statist., 21 (2012), pp. 518-537.
[129] H. Zhao, Z. Ding, and Y. Fu, Multi-view clustering via deep matrix factorization, in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), AAAI Press, Palo Alto, CA, 2017, pp. 2921-2927.
[130] Y. Zhao and M. Udell, Matrix completion with quantified uncertainty through low rank Gaussian copula, in Advances in Neural Information Processing Systems 33, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, eds., Curran Associates, Red Hook, NY, 2020, pp. 20977-20988.
[131] Y. Zhao and M. Udell, Missing value imputation for mixed data via Gaussian copula, in Proceedings of the 26th International Conference on Knowledge Discovery and Data Mining (KDD-20), ACM, New York, 2020, pp. 636-646.