Multiple change-points detection in high dimension. (English) Zbl 1437.62202

Summary: Change-point detection is an integral component of statistical modeling and estimation. For high-dimensional data, classical methods based on the Mahalanobis distance are typically inapplicable. We propose a novel test statistic that combines a modified Euclidean distance with an extreme statistic; its null distribution is asymptotically normal. The new method naturally strikes a balance between the ability to detect dense changes and the ability to detect sparse changes, which gives it an edge that may allow it to outperform existing methods. Furthermore, the number of change-points is determined by a new Schwarz information criterion together with a pre-screening procedure, and the locations of the change-points can be estimated via a dynamic programming algorithm in conjunction with the intrinsic order structure of the objective function. Under some mild conditions, we show that the new method provides consistent estimation with an almost optimal rate. Simulation studies show that the proposed method performs satisfactorily in identifying multiple change-points, in terms of both power and estimation accuracy, and two real data examples are used for illustration.
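
The summary's combination of dynamic programming for locating change-points and a Schwarz-type criterion for choosing their number can be illustrated with a generic sketch. The code below is not the paper's procedure: it assumes a plain squared-Euclidean within-segment cost, omits the pre-screening step and the proposed test statistic, and uses a textbook-style penalty. The function names (`dp_changepoints`, `select_by_schwarz`, `toy_data`) and the penalty form are hypothetical choices made only for this illustration.

```python
import math

def make_cost(x):
    """Return cost(i, j): within-segment sum of squares of rows x[i:j]."""
    p = len(x[0])
    s1, s2 = [[0.0] * p], [0.0]          # prefix sums of rows and of squared norms
    for row in x:
        s1.append([a + b for a, b in zip(s1[-1], row)])
        s2.append(s2[-1] + sum(v * v for v in row))
    def cost(i, j):
        sq = sum((s1[j][k] - s1[i][k]) ** 2 for k in range(p))
        return (s2[j] - s2[i]) - sq / (j - i)
    return cost

def dp_changepoints(x, max_k):
    """For each k <= max_k, return (minimal cost, change-point locations)."""
    n, cost, inf = len(x), make_cost(x), float("inf")
    # best[k][j]: minimal cost of splitting x[:j] into k + 1 segments
    best = [[inf] * (n + 1) for _ in range(max_k + 1)]
    arg = [[0] * (n + 1) for _ in range(max_k + 1)]
    for j in range(1, n + 1):
        best[0][j] = cost(0, j)
    for k in range(1, max_k + 1):
        for j in range(k + 1, n + 1):
            for t in range(k, j):        # t = position of the last change-point
                c = best[k - 1][t] + cost(t, j)
                if c < best[k][j]:
                    best[k][j], arg[k][j] = c, t
    results = {}
    for k in range(max_k + 1):
        cps, j = [], n
        for kk in range(k, 0, -1):       # backtrack through stored arguments
            j = arg[kk][j]
            cps.append(j)
        results[k] = (best[k][n], sorted(cps))
    return results

def select_by_schwarz(x, max_k):
    """Pick k by a generic Schwarz-type criterion (assumed penalty form)."""
    n, p = len(x), len(x[0])
    res = dp_changepoints(x, max_k)
    def sic(k):
        rss = max(res[k][0], 1e-12)      # guard against log(0)
        return 0.5 * n * p * math.log(rss / (n * p)) + 0.5 * k * (p + 1) * math.log(n)
    k_hat = min(res, key=sic)
    return k_hat, res[k_hat][1]

def toy_data():
    """30 two-dimensional points whose mean shifts after index 10 and 20."""
    means = [(0.0, 0.0)] * 10 + [(3.0, 3.0)] * 10 + [(0.0, 3.0)] * 10
    return [[m + 0.1 * (-1) ** i for m in mu] for i, mu in enumerate(means)]

print(select_by_schwarz(toy_data(), max_k=4))
```

The triple loop costs on the order of `max_k * n^2` cost evaluations, which is why screening candidate locations first, as the summary describes, matters for long or high-dimensional series.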

MSC:

62H15 Hypothesis testing in multivariate analysis
62H12 Estimation in multivariate analysis
62G10 Nonparametric hypothesis testing
90C39 Dynamic programming

Software:

changepoint; wbs
