Two-way incremental seriation in the temporal domain with three-dimensional visualization: making sense of evolving high-dimensional datasets. (English) Zbl 1471.62217

Summary: Two-way seriation is a popular technique to analyze groups of similar instances and their features, as well as the connections between the groups themselves. The two-way seriated data may be visualized as a two-dimensional heat map or as a three-dimensional landscape where colour codes or height correspond to the values in the matrix. To achieve a meaningful visualization of high-dimensional data, a compactly supported convolution kernel is introduced, which is similar to filter kernels used in image reconstruction and geostatistics. This filter populates the high-dimensional space with values that interpolate nearby elements and provides insight into the clustering structure. Ordinary two-way seriation is also extended to deal with updates of both the row and column spaces. Combined with the convolution kernel, a three-dimensional visualization of dynamics is demonstrated on two datasets, a news collection and a set of microarray measurements.
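The kernel-based interpolation mentioned in the summary can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a Wendland-type radial function as the compactly supported kernel and uses normalized averaging, so each cell of the seriated matrix becomes a weighted mean of the observed entries within the kernel radius. The names `compact_kernel` and `smooth_landscape`, the `radius` parameter, and the use of `None` for empty cells are all hypothetical choices made for this illustration.

```python
import math


def compact_kernel(r, radius):
    """Wendland-type weight (assumed choice): positive for r < radius, zero beyond."""
    if r >= radius:
        return 0.0
    t = r / radius
    return (1.0 - t) ** 4 * (4.0 * t + 1.0)


def smooth_landscape(matrix, radius=2.0):
    """Normalized averaging over a seriated matrix; None marks an empty cell.

    Returns a height field of the same shape. Cells with no observed
    entry inside the kernel support are set to 0.0.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    reach = int(math.ceil(radius))  # only neighbours inside the support matter
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            num = den = 0.0
            for a in range(max(0, i - reach), min(n_rows, i + reach + 1)):
                for b in range(max(0, j - reach), min(n_cols, j + reach + 1)):
                    value = matrix[a][b]
                    if value is None:
                        continue
                    w = compact_kernel(math.hypot(a - i, b - j), radius)
                    num += w * value
                    den += w
            out[i][j] = num / den if den > 0.0 else 0.0
    return out


if __name__ == "__main__":
    # Toy seriated matrix with two observed entries.
    toy = [
        [1.0, None, None, None],
        [None, None, None, None],
        [None, None, None, 4.0],
    ]
    for row in smooth_landscape(toy, radius=2.5):
        print(["%.2f" % v for v in row])
```

Because the kernel vanishes outside its radius, each output cell depends only on a small neighbourhood. This keeps the cost local, and distant groups in the seriated ordering do not bleed into one another. The resulting height field could then be rendered as a heat map or a three-dimensional surface.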

MSC:

62-08 Computational methods for problems pertaining to statistics
62H35 Image analysis in multivariate analysis

References:

[1] Anders, S., Visualization of genomic data with the Hilbert curve, Bioinformatics, 25, 1231-1235, (2009)
[2] Berry, M.; Hendrickson, B.; Raghavan, P., Sparse matrix reordering schemes for browsing hypertext, Lectures in Applied Mathematics, 32, 99-124, (1996) · Zbl 0857.68036
[3] Boucher, A.; Seto, K.; Journel, A., A novel method for mapping land cover changes: incorporating time and space with geostatistics, IEEE Transactions on Geoscience and Remote Sensing, 44, 3427-3435, (2006)
[4] Budanitsky, A.; Hirst, G., Evaluating WordNet-based measures of lexical semantic relatedness, Computational Linguistics, 32, 13-47, (2006) · Zbl 1234.68399
[5] Caraux, G.; Pinloche, S., PermutMatrix: a graphical environment to arrange gene expression profiles in optimal linear order, Bioinformatics, 21, 1280-1281, (2005)
[6] Chen, C., Generalized association plots: information visualization via iteratively generated correlation matrices, Statistica Sinica, 12, 7-30, (2002) · Zbl 1027.62047
[7] Chen, S., Rosenfeld, R., 1999. A Gaussian prior for smoothing maximum entropy models. Technical Report. Carnegie Mellon University.
[8] Climer, S., Zhang, W., 2004. Take a walk and cluster genes: a TSP-based approach to optimal rearrangement clustering. In: Proceedings of ICML-04, 21st International Conference on Machine Learning. Banff, Canada, p. 22.
[9] Davidson, G.; Hendrickson, B.; Johnson, D.; Meyers, C.; Wylie, B., Knowledge mining with VxInsight: discovery through interaction, Journal of Intelligent Information Systems, 11, 259-285, (1998)
[10] Deerwester, S.; Dumais, S.; Furnas, G.; Landauer, T.; Harshman, R., Indexing by latent semantic analysis, Journal of the American Society for Information Science, 41, 391-407, (1990)
[11] Eisen, M.; Spellman, P.; Brown, P.; Botstein, D., Cluster analysis and display of genome-wide expression patterns, Proceedings of the National Academy of Sciences, 95, 14863-14868, (1998)
[12] Ernst, J.; Nau, G.; Bar-Joseph, Z., Clustering short time series gene expression data, Bioinformatics, 21, i159-i168, (2005)
[13] Fellbaum, C., WordNet: an electronic lexical database, (1998), MIT Press, Cambridge, MA, USA · Zbl 0913.68054
[14] Foster, M., Evans, A., 2008. Performance evaluation of multivariate interpolation methods for scattered data in geoscience applications. In: Proceedings of IGARSS-08, International Geoscience and Remote Sensing Symposium. Boston, MA, USA.
[15] Gasch, A.; Spellman, P.; Kao, C.; Carmel-Harel, O.; Eisen, M.; Storz, G.; Botstein, D.; Brown, P., Genomic expression programs in the response of yeast cells to environmental changes, Molecular Biology of the Cell, 11, 4241-4257, (2000)
[16] Hahsler, M.; Hornik, K., TSP-infrastructure for the traveling salesperson problem, Journal of Statistical Software, 23, 1-21, (2007)
[17] Hahsler, M.; Hornik, K.; Buchta, C., Getting things in order: an introduction to the R package seriation, Technical Report, Department of Statistics and Mathematics, WU Vienna University of Economics and Business, (2007)
[18] Havre, S., Hetzler, B., Nowell, L., 2000. ThemeRiver: visualizing theme changes over time. In: Proceedings of Infovis-00, IEEE Symposium on Information Visualization. Salt Lake City, UT, USA, pp. 115-123.
[19] Hubert, L., Some applications of graph theory and related non-metric techniques to problems of approximate seriation: the case of symmetric proximity measures, British Journal of Mathematical and Statistical Psychology, 27, 133-153, (1974) · Zbl 0285.92029
[20] Kanerva, P., Kristofersson, J., Holst, A., 2000. Random indexing of text samples for latent semantic analysis. In: Proceedings of CogSci-00, 22nd Annual Conference of the Cognitive Science Society. Philadelphia, PA, USA.
[21] Kohonen, T.; Kaski, S.; Lagus, K.; Salojärvi, J.; Honkela, J.; Paatero, V.; Saarela, A., Self organization of a massive text document collection, IEEE Transactions on Neural Networks, 11, 574-585, (2000)
[22] Landauer, T., Laham, D., Rehder, B., Schreiner, M., 1997. How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans. In: Proceedings of CogSci-97, 19th Annual Conference of the Cognitive Science Society. Stanford, CA, USA, p. 412.
[23] Liiv, I., Seriation and matrix reordering methods: an historical overview, Statistical Analysis and Data Mining, 3, 70-91, (2010) · Zbl 07260234
[24] Liiv, I.; Opik, R.; Ubi, J.; Stasko, J., Visual matrix explorer for collaborative seriation, Wiley Interdisciplinary Reviews: Computational Statistics, (2011)
[25] McCormick, W.; Schweitzer, P.; White, T., Problem decomposition and data reorganization by a clustering technique, Operations Research, 20, 993-1009, (1972) · Zbl 0249.90046
[26] Pham, T., van Vliet, L., 2003. Normalized averaging using adaptive applicability functions with applications in image reconstruction from sparsely and randomly sampled data. In: Proceedings of SCIA-03, 13th Scandinavian Conference on Image Analysis. Halmstad, Sweden, pp. 485-492. · Zbl 1040.68732
[27] Rosenkrantz, D.; Stearns, R.; Lewis, P., An analysis of several heuristics for the traveling salesman problem, SIAM Journal on Computing, 6, 563-581, (1977) · Zbl 0364.90104
[28] Stuart, J.; Segal, E.; Koller, D.; Kim, S., A gene-coexpression network for global discovery of conserved genetic modules, Science, 302, 249-255, (2003)
[29] Wilkinson, L.; Friendly, M., The history of the cluster heat map, The American Statistician, 63, 179-184, (2009)
[30] Wilkinson, L.; Wills, G., The grammar of graphics, (2005), Springer Verlag · Zbl 1080.68107
[31] Wise, J., Thomas, J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., Crow, V., 1995. Visualizing the non-visual: spatial analysis and interaction with information from text documents. In: Proceedings of Infovis-95, IEEE Symposium on Information Visualization. Atlanta, GA, USA.
[32] Wittek, P.; Darányi, S.; Tan, C. L., An ordering of terms based on semantic relatedness, In: Bunt, H. (ed.), Proceedings of IWCS-09, 8th International Conference on Computational Semantics, Tilburg, The Netherlands, (2009)
[33] Wittek, P., Tan, C.L., 2009. A kernel-based feature weighting for text classification. In: Proceedings of IJCNN-09, IEEE International Joint Conference on Neural Networks. Atlanta, GA, USA, pp. 3373-3379.
[34] Wittek, P.; Tan, C. L., Compactly supported basis functions as support vector kernels for classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 2039-2050, (2011)
[35] Wu, H.; Tien, Y.; Chen, C., GAP: a graphical environment for matrix visualization and cluster analysis, Computational Statistics & Data Analysis, 54, 767-778, (2010) · Zbl 1464.62013
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.