
Two-stage data segmentation permitting multiscale change points, heavy tails and dependence. (English) Zbl 1497.62230

Summary: The segmentation of a time series into piecewise stationary segments is an important problem both in time series analysis and signal processing. In the presence of multiscale change points with both large jumps over short intervals and small jumps over long intervals, multiscale methods achieve good adaptivity but require a model selection step for removing false positives and duplicate estimators. We propose a localised application of the Schwarz criterion, which is applicable with any multiscale candidate generating procedure fulfilling mild assumptions, and establish its theoretical consistency in estimating the number and locations of multiple change points under general assumptions permitting heavy tails and dependence. In particular, combined with a MOSUM-based candidate generating procedure, it attains minimax rate optimality in both detection lower bound and localisation for i.i.d. sub-Gaussian errors. Overall competitiveness of the proposed methodology compared to existing methods is shown through its theoretical and numerical performance.
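As a rough illustration of the MOSUM-based candidate generation mentioned in the summary, the following is a minimal sketch of a moving sum detector for a change in mean. It is not the authors' procedure: the function name `mosum_statistic`, the bandwidth argument `G`, and the assumption of unit error variance (so no variance estimate is applied) are all illustrative choices, and the localised Schwarz criterion step is not shown.

```python
import numpy as np


def mosum_statistic(x, G):
    """Moving sum detector for a mean change, with bandwidth G.

    At each time point k, compares the sum of the G observations after k
    with the sum of the G observations before k. Assumes unit error
    variance (an assumption made for this sketch only).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 1 <= G <= n // 2:
        raise ValueError("bandwidth G must satisfy 1 <= G <= n // 2")

    # Cumulative sums with a leading zero so that csum[j] = sum(x[:j]).
    csum = np.concatenate(([0.0], np.cumsum(x)))

    # The statistic is left at zero where a full window is unavailable.
    stat = np.zeros(n + 1)
    k = np.arange(G, n - G + 1)
    right = csum[k + G] - csum[k]  # sum of x[k : k + G]
    left = csum[k] - csum[k - G]   # sum of x[k - G : k]
    stat[k] = np.abs(right - left) / np.sqrt(2 * G)
    return stat


# Usage: a noiseless step signal with one change after the 50th observation.
signal = np.concatenate((np.zeros(50), np.ones(50)))
detector = mosum_statistic(signal, G=10)
candidate = int(np.argmax(detector))
```

Here `candidate` is the time point at which the detector is largest. In a multiscale setting one would repeat this over several bandwidths `G`, which yields the duplicate and spurious candidates that a subsequent model selection step is meant to prune.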

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62G32 Statistics of extreme values; tail inference
