Dimension reduction and visualization of multiple time series data: a symbolic data analysis approach

Abstract

Exploratory analysis and visualization of multiple time series data are essential for discovering the underlying dynamics of a series before attempting modeling and forecasting. This study extends two dimension reduction methods, principal component analysis (PCA) and sliced inverse regression (SIR), to multiple time series data. This is achieved through the path point approach, a new addition to the symbolic data analysis framework. By transforming multiple time series data into time-dependent intervals marked by starting and ending values, each series is geometrically represented as successive directed segments with unique path points. These path points serve as the foundation of our representation approach. PCA and SIR are then applied to the data table formed by the coordinates of these path points, enabling visualization of temporal trajectories of objects within a reduced-dimensional subspace. Empirical studies encompassing simulations, microarray time series data from a yeast cell cycle, and financial data confirm the effectiveness of our path point approach in revealing the structure and behavior of objects within a 2D factorial plane. Comparative analyses with existing methods, such as the applied vector approach for PCA and SIR on time-dependent interval data, further underscore the strength and versatility of our path point representation for time series data.
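The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not define path points precisely, so the code below assumes, purely for illustration, that they are evenly spaced points on each directed segment joining consecutive observations. The function names `path_points` and `pca_fit`, the parameter `n_points`, and the simulated random-walk data are all hypothetical and are not taken from the paper. Only the PCA step is shown; SIR would additionally require a response or slicing variable.

```python
import numpy as np

def path_points(series, n_points=5):
    """Sample points along the directed segments of one object's series.

    series: array of shape (T, p), one row per time point.
    Assumption: path points are evenly spaced on each segment
    (start value -> end value); the paper may define them differently.
    Returns an array of shape ((T - 1) * n_points, p).
    """
    starts, ends = series[:-1], series[1:]
    w = np.linspace(0.0, 1.0, n_points)[None, :, None]       # weights, shape (1, m, 1)
    pts = starts[:, None, :] + w * (ends - starts)[:, None, :]  # shape (T-1, m, p)
    return pts.reshape(-1, series.shape[1])

def pca_fit(table, n_components=2):
    """Classical PCA via SVD of the column-centered data table."""
    mean = table.mean(axis=0)
    _, _, vt = np.linalg.svd(table - mean, full_matrices=False)
    return mean, vt[:n_components].T   # loadings, shape (p, n_components)

# Simulated data (hypothetical): 4 objects, 10 time points, 6 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 10, 6)).cumsum(axis=1)

# One table of path-point coordinates per object, stacked for a joint PCA.
tables = [path_points(obj) for obj in data]
stacked = np.vstack(tables)
mean, axes = pca_fit(stacked)

# Each object's temporal trajectory in the 2D factorial plane.
trajectories = [(tab - mean) @ axes for tab in tables]
```

Plotting each array in `trajectories` as a connected line would show one path per object in the plane of the first two components. Consecutive segments share an endpoint in this sketch, so those points appear twice in the stacked table; whether the original method weights or deduplicates them is not stated in the abstract.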

Acknowledgements

We are grateful for the valuable comments provided by the Editor, Associate Editor, and anonymous referees. Their input has helped us improve the paper immensely.

Funding

This research was supported by the Ministry of Science and Technology of Taiwan, R.O.C., under grants MOST103-2118-M-032-006 and MOST111-2628-E-038-001-MY2.

Author information

Authors and Affiliations

Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei City, 11031, Taiwan, ROC
Emily Chia-Yu Su
Department of Statistics, National Chengchi University, Taipei City, 11605, Taiwan, ROC
Han-Ming Wu

Corresponding author

Correspondence to Han-Ming Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 184 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Su, E.CY., Wu, HM. Dimension reduction and visualization of multiple time series data: a symbolic data analysis approach. Comput Stat 39, 1937–1969 (2024). https://doi.org/10.1007/s00180-023-01440-7

Received: 02 December 2022
Accepted: 09 November 2023
Published: 06 December 2023
Issue Date: June 2024
DOI: https://doi.org/10.1007/s00180-023-01440-7
