×

Descriptive matrix factorization for sustainability. Adopting the principle of opposites. (English) Zbl 1235.62002

Summary: Climate change, the global energy footprint, and strategies for sustainable development have become topics of considerable political and public interest. The public debate is informed by an exponentially growing amount of data and there are diverse partisan interest when it comes to interpretation. We therefore believe that data analysis methods are called for that provide results which are intuitively understandable even to non-experts. Moreover, such methods should be efficient so that non-expert users can perform their own analysis at low expense in order to understand the effects of different parameters and influential factors.
We discuss a new technique for factorizing data matrices that meets both these requirements. The basic idea is to represent a set of data by means of convex combinations of extreme data points. This often accommodates human cognition. In contrast to established factorization methods, the approach presented in this paper can also determine over-complete bases. At the same time, convex combinations allow for highly efficient matrix factorization. Based on techniques adopted from the field of distance geometry, we derive a linear time algorithm to determine suitable basis vectors for factorization. By means of the example of several environmental and developmental data sets we discuss the performance and characteristics of the proposed approach and validate that significant efficiency gains are obtainable without performance decreases compared to existing convexity constrained approaches.

MSC:

62-07 Data analysis (statistics) (MSC2010)
62P12 Applications of statistics to environmental and related topics
15A23 Factorization of matrices

Software:

MapReduce
Full Text: DOI

References:

[1] Achlioptas D, McSherry F (2007) Fast computation of low-rank matrix approximations. J ACM 54(9): 1–19 · Zbl 1311.94031
[2] Aguilar O, Huerta G, Prado R, West M (1998) Bayesian inference on latent structure in time series. In: Bernardo J, Bergen J, Dawid A, Smith A (eds) Bayesian statistics. Oxford University Press, Oxford · Zbl 0974.62066
[3] Blumenthal LM (1953) Theory and applications of distance geometry. Oxford University Press, Oxford · Zbl 0050.38502
[4] Chan B, Mitchell D, Cram L (2003) Archetypal analysis of galaxy spectra. Mon Not R Astron Soc 338(3): 790–795 · doi:10.1046/j.1365-8711.2003.06099.x
[5] Chang CI, Wu CC, Liu WM, Ouyang YC (2006) A new growing method for simplex-based endmember extraction algorithm. IEEE T Geosci Remote 44(10): 2804–2819 · doi:10.1109/TGRS.2006.881803
[6] Crippen G (1988) Distance geometry and molecular conformation. Wiley, New York · Zbl 1066.51500
[7] Cutler A, Breiman L (1994) Archetypal analysis. Technometrics 36(4): 338–347 · Zbl 0804.62002 · doi:10.1080/00401706.1994.10485840
[8] Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1): 107–113 · doi:10.1145/1327452.1327492
[9] Ding C, Li T, Jordan M (2010) Convex and semi-nonnegative matrix factorizations. IEEE T Pattern Anal 32(1): 45–55 · doi:10.1109/TPAMI.2008.277
[10] Drineas P, Kannan R, Mahoney M (2006) Fast Monte Carlo algorithms III: computing a compressed approixmate matrix decomposition. SIAM J Comput 36(1): 184–206 · Zbl 1111.68149 · doi:10.1137/S0097539704442702
[11] Faloutsos C, Lin KI (1995) FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Proceedings of the ACM SIGMOD international conference on management of data, San Diego
[12] Foster D, Nascimento S, Amano K (2004) Information limits on neural identification of coloured surfaces in natural scenes. Visual Neurosci 21: 331–336 · doi:10.1017/S0952523804213335
[13] Gomes C (2009) Computational sustainability. The Bridge, National Academy of Engineering 39(4): 6–11
[14] Goreinov SA, Tyrtyshnikov EE (2001) The maximum-volume concept in approximation by low-rank matrices. Contemp Math 280: 47–51 · Zbl 1003.15025 · doi:10.1090/conm/280/4620
[15] Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(7): 498–520 · JFM 59.1182.04 · doi:10.1037/h0070888
[16] Kersting K, Wahabzada M, Thurau C, Bauckhage C (2010) Hierarchical convex NMF for clustering massive data. In: Proceedings of the 2nd Asian Conference on Machine Learning (ACML-10) · Zbl 1235.62002
[17] Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755): 788–799 · Zbl 1369.68285 · doi:10.1038/44565
[18] Lucas A, Klaassen P, Spreij P, Straetmans S (2003) Tail behaviour of credit loss distributions for general latent factor models. Appl Math Finance 10(4): 337–357 · Zbl 1070.91032 · doi:10.1080/1350486032000160786
[19] MacKay D (2009) Sustainable energy–without the hot air. UIT Cambridge Ltd, Cambridge
[20] Miao L, Qi H (2007) Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization. IEEE T Geosci Remote 45(3): 765–777 · doi:10.1109/TGRS.2006.888466
[21] Nascimento JMP, Dias JMB (2005) Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE T Geosci Remote 43(4): 898–910 · doi:10.1109/TGRS.2005.844293
[22] Ostrouchov G, Samatova N (2005) On fastmap and the convex hull of multivariate data: toward fast and robust dimension reduction. IEEE T Pattern Anal 27(8): 1340–1434 · doi:10.1109/TPAMI.2005.164
[23] Sippl M, Sheraga H (1986) Cayley-Menger coordinates. Proc Natl Acad Sci 83(8): 2283–2287 · Zbl 0587.51013 · doi:10.1073/pnas.83.8.2283
[24] Spearman C (1904) General intelligence objectively determined and measured. Am J Psychol 15: 201–293 · doi:10.2307/1412107
[25] Thurau C, Kersting K, Bauckhage C (2009) Convex non-negative matrix factorization in the wild. In: Proceedings of the IEEE International Conference on Data Mining, Miami · Zbl 1235.62002
[26] Thurau C, Kersting K, Wahabzada M, Bauckhage C (2010) Convex non-negative matrix factorization for massive datasets. Knowl Inf Syst (KAIS). doi: 10.1007/s10115-010-0352-6 · Zbl 1235.62002
[27] Winter ME (1999) N-FINDR: an algorithm for fast and autonomous spectral endmember determination in hyperspectral data. In: Proceedings of the International Conference on Applied Geologic Remote Sensing, Vancouver
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.