A setwise EWMA scheme for monitoring high-dimensional datastreams. (English) Zbl 1440.62208

Summary: Monitoring high-dimensional data streams has become increasingly important for real-time detection of abnormal activity in many statistical process control (SPC) applications. Although multivariate SPC has been studied extensively, the challenges of designing a practical monitoring scheme for high-dimensional processes with between-stream correlation have yet to be addressed well. Classical \(T^2\)-test-based schemes perform poorly because the contamination bias in estimating the covariance matrix grows rapidly as the dimension increases. We propose a test statistic based on a “divide-and-conquer” strategy and integrate it into the multivariate exponentially weighted moving average (MEWMA) charting scheme for Phase II process monitoring. The key idea is to compute \(T^2\) statistics on low-dimensional sub-vectors and then combine them. The proposed procedure is essentially distribution-free and computationally efficient. The control limit is obtained from the asymptotic distribution of the test statistic under mild conditions on the dependence structure of the stream observations. The asymptotic results also shed light on the required size of the reference sample. Both theoretical analysis and numerical results show that the proposed method controls the false alarm rate and delivers robust change detection.
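
The divide-and-conquer idea in the summary can be illustrated with a minimal sketch. It is not the authors' exact statistic: the function names `fit_blocks` and `setwise_ewma`, the parameters `block_size` and `lam`, the partition into consecutive blocks, and the standardization by \(p\) and \(\sqrt{2p}\) (which treats the blocks as independent and roughly Gaussian) are all assumptions made for illustration. The paper's control limit, which comes from its asymptotic theory, is not reproduced here.

```python
import numpy as np

def fit_blocks(reference, block_size):
    """Estimate the in-control mean and one inverse covariance per block.

    Assumption: streams are grouped into consecutive blocks of `block_size`.
    """
    n, p = reference.shape
    mean = reference.mean(axis=0)
    blocks = [np.arange(start, min(start + block_size, p))
              for start in range(0, p, block_size)]
    inv_covs = []
    for idx in blocks:
        # reshape keeps a 1x1 matrix when a block holds a single stream
        cov = np.cov(reference[:, idx], rowvar=False).reshape(len(idx), len(idx))
        inv_covs.append(np.linalg.inv(cov))
    return mean, blocks, inv_covs

def setwise_ewma(stream, mean, blocks, inv_covs, lam=0.1):
    """Return one standardized charting statistic per observation in `stream`."""
    p = mean.size
    scale = (2.0 - lam) / lam  # inverse of the steady-state EWMA variance factor
    w = np.zeros(p)            # EWMA of the centered observations
    stats = np.empty(len(stream))
    for t, x in enumerate(stream):
        w = (1.0 - lam) * w + lam * (x - mean)
        # T^2-type statistic on each low-dimensional sub-vector, then summed
        total = sum(scale * (w[idx] @ inv @ w[idx])
                    for idx, inv in zip(blocks, inv_covs))
        # Illustrative standardization: assumes independent blocks
        stats[t] = (total - p) / np.sqrt(2.0 * p)
    return stats

# Usage on simulated data: 20 streams, blocks of 4, a mean shift after time 100
rng = np.random.default_rng(0)
reference = rng.standard_normal((500, 20))
mean, blocks, inv_covs = fit_blocks(reference, block_size=4)
stream = rng.standard_normal((200, 20))
stream[100:] += 0.5
chart = setwise_ewma(stream, mean, blocks, inv_covs, lam=0.2)
```

Only small block-wise covariance matrices are inverted in this sketch, so the full \(p \times p\) covariance matrix is never estimated or inverted. An alarm would be raised when `chart` exceeds a control limit chosen for the desired false alarm rate.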

MSC:

62H12 Estimation in multivariate analysis
62R07 Statistical aspects of big data and data science
62L12 Sequential estimation
62G10 Nonparametric hypothesis testing
62P30 Applications of statistics in engineering and industry; control charts
