×

Changepoint detection in the presence of outliers. (English) Zbl 1478.62238

Summary: Many traditional methods for identifying changepoints can struggle in the presence of outliers, or when the noise is heavy-tailed. Often they will infer additional changepoints to fit the outliers. To overcome this problem, data often needs to be preprocessed to remove outliers, though this is difficult for applications where the data needs to be analyzed online. We present an approach to changepoint detection that is robust to the presence of outliers. The idea is to adapt existing penalized cost approaches for detecting changes so that they use loss functions that are less sensitive to outliers. We argue that loss functions that are bounded, such as the classical biweight loss, are particularly suitable – as we show that only bounded loss functions are robust to arbitrarily extreme outliers. We present an efficient dynamic programming algorithm that can find the optimal segmentation under our penalized cost criteria. Importantly, this algorithm can be used in settings where the data needs to be analyzed online. We show that we can consistently estimate the number of changepoints, and accurately estimate their locations, using the biweight loss function. We demonstrate the usefulness of our approach for applications such as analyzing well-log data, detecting copy number variation, and detecting tampering of wireless devices.

MSC:

62M07 Non-Markovian processes: hypothesis testing
62F35 Robustness and adaptive procedures (parametric inference)

Software:

changepoint; wbs

References:

[1] Adams, R. P.; Mackay, D. J., Bayesian Online Changepoint Detection,, arXiv: 0710.3742 (2007)
[2] Bagci, I. E., Novel Security Mechanisms for Wireless Sensor Networks, (2016)
[3] Bagci, I. E.; Roedig, U.; Martinovic, I.; Schulz, M.; Hollick, M., Using Channel State Information for Tamper Detection in the Internet of Things, (2015)
[4] Bai, J., Estimating Multiple Breaks One at a Time, Econometric Theory, 13, 315-352 (1997)
[5] Baranowski, R.; Chen, Y.; Fryzlewicz, P., Narrowest-Over-Threshold Detection of Multiple Change-Points and Change-Point-Like Features, (2016)
[6] Bengtsson, H., Neuvial, P., Seshan, V. E., Olshen, A. B., Spellman, P. T., and Olshen, R. A. (2016), “Package pscbs,” available at
[7] Cao, H.; Wu, W. B., Changepoint Estimation: Another Look at Multiple Testing Problems, Biometrika, 102, 974-980 (2015) · Zbl 1390.62138
[8] Fearnhead, P., Exact and Efficient Inference for Multiple Changepoint Problems, Statistics and Computing, 16, 203-213 (2006)
[9] Frick, K.; Munk, A.; Sieling, H., Multiscale Change-Point Inference, (2014) · Zbl 1411.62065
[10] Fryzlewicz, P., Wild Binary Segmentation for Multiple Change-Point Detection, Annals of Statistics, 42, 2243-2281 (2014) · Zbl 1302.62075
[11] Futschik, A.; Hotz, T.; Munk, A.; Sieling, H., Multiscale DNA Partitioning: Statistical Evidence for Segments, Bioinformatics, 30, 2255-2262 (2014)
[12] Haynes, K.; Eckley, I. A.; Fearnhead, P., Computationally Efficient Changepoint Detection for a Range of Penalties, Journal of Computational and Graphical Statistics, 26, 134-143 (2017)
[13] Haynes, K.; Fearnhead, P.; Eckley, I., A Computationally Efficient Nonparametric Approach for Changepoint Detection, Statistics and Computing, 27, 1293-1305 (2017) · Zbl 1505.62181
[14] Hinkley, D. V., Inference About the Change-Point From Cumulative Sum Tests, Biometrika, 58, 509-523 (1971) · Zbl 0254.62019
[15] Hotz, T.; Schütte, O. M.; Sieling, H.; Polupanow, T.; Diederichsen, U.; Steinem, C.; Munk, A., Idealizing Ion Channel Recordings by a Jump Segmentation Multiresolution Filter, IEEE Transactions on Nanobioscience, 12, 376-386 (2013)
[16] Huber, P. J.; Lovric, M., Robust Statistics. International Encyclopedia of Statistical Science (2011), Berlin · Zbl 1241.62001
[17] Hušková, M.; Hackl, P.; Westlund, A. H., Recursive M-Tests for the Change-Point Problem, Economic Structural Change, 13-33 (1991), Berlin: Springer, Berlin
[18] ———, Robust Change Point Analysis, Robustness and Complex Data Structures, 171-190 (2013), Springer
[19] Hušková, M.; Marušiaková, M., M-Procedures for Detection of Changes for Dependent Observations,, Communications in Statistics-Simulation and Computation, 41, 1032-1050 (2012) · Zbl 1347.62071
[20] Hušková, M.; Picek, J., Bootstrap in Detection of Changes in Linear Regression,, Sankhyā: The Indian Journal of Statistics, 67, 200-226 (2005) · Zbl 1193.62111
[21] Hušková, M.; Sen, P. K., Nonparametric Tests for Shift and Change in Regression at an Unknown Time Point,, Statistical Analysis and Forecasting of Economic Structural Change, 71-85 (1989), Springer
[22] Johnson, N. A., A Dynamic Programming Algorithm for the Fused Lasso and \(\textit{l_0\)
[23] Killick, R.; Eckley, I. A.; Ewans, K.; Jonathan, P., Detection of Changes in Variance of Oceanographic Time-Series Using Changepoint Analysis, Ocean Engineering, 37, 1120-1126 (2010)
[24] Killick, R.; Fearnhead, P.; Eckley, I. A., Optimal Detection of Changepoints With a Linear Computational Cost, Journal of the American Statistical Association, 107, 1590-1598 (2012) · Zbl 1258.62091
[25] Kim, C.-J.; Morley, J. C.; Nelson, C. R., The Structural Break in the Equity Premium, Journal of Business & Economic Statistics, 23, 181-191 (2005)
[26] Lavielle, M.; Moulines, E., Least-Squares Estimation of an Unknown Number of Shifts in a Time Series, Journal of Time Series Analysis, 21, 33-59 (2000) · Zbl 0974.62070
[27] Ma, T. F.; Yau, C. Y., A Pairwise Likelihood-Based Approach for Changepoint Detection in Multivariate Time Series Models, Biometrika, 103, 409-421 (2016) · Zbl 1499.62314
[28] Maidstone, R.; Hocking, T.; Rigaill, G.; Fearnhead, P., On Optimal Multiple Changepoint Algorithms for Large Data, Statistics and Computing, 27, 519-533 (2017) · Zbl 1505.62269
[29] National Research Council (2013), Frontiers in Massive Data Analysis, Washington, DC: The National Academies Press, .
[30] Olshen, A. B.; Venkatraman, E. S.; Lucito, R.; Wigler, M., Circular Binary Segmentation for the Analysis of Array-Based DNA Copy Number Data,, Biostatistics, 5, 557-72 (2004) · Zbl 1155.62478
[31] Óruanaidh, J. J. K.; Fitzgerald, W. J., Numerical Bayesion Methods Applied to Signal Processing (1996) · Zbl 0871.62025
[32] Page, E., Continuous Inspection Schemes, Biometrika, 41, 100-115 (1954) · Zbl 0056.38002
[33] Pierre-Jean, M.; Rigaill, G.; Neuvial, P., Performance Evaluation of DNA Copy Number Segmentation Methods, Briefings in Bioinformatics, 16, 600-615 (2015)
[34] Reeves, J.; Chen, J.; Wang, X. L.; Lund, R.; Lu, Q. Q., A Review and Comparison of Changepoint Detection Techniques for Climate Data, Journal of Applied Meteorology and Climatology, 46, 900-915 (2007)
[35] Rigaill, G., A Pruned Dynamic Programming Algorithm to Recover the Best Segmentations With 1 to \(K_{max \) · Zbl 1381.90094
[36] Rigaill, G.; Hocking, T. D.; Bach, F.; Vert, J.-P., Learning Sparse Penalties for Change-Point Detection Using Max Margin Interval Regression,, Proceedings of the 30th International Conference on Machine Learning, JMLR W&CP, 28, 172-180 (2013)
[37] Ruggieri, E.; Antonellis, M., An Exact Approach to Bayesian Sequential Change Point Detection, Computational Statistics & Data Analysis, 97, 71-86 (2016) · Zbl 1468.62168
[38] Vostrikova, L., Detection of the Disorder in Multidimensional Random-Processes, Doklady Akademii Nauk SSSR, 259, 270-274 (1981)
[39] Worsley, K., On the Likelihood Ratio Test for a Shift in Location of Normal Populations, Journal of the American Statistical Association, 74, 365-367 (1979) · Zbl 0413.62016
[40] Wyse, J.; Friel, N., Approximate Simulation-Free Bayesian Inference for Multiple Changepoint Models With Dependence Within Segments, Bayesian Analysis, 6, 501-528 (2011) · Zbl 1330.62160
[41] Yao, Y.-C., Estimation of a Noisy Discrete-Time Step Function: Bayes and Empirical Bayes Approaches,, The Annals of Statistics, 12, 1434-1447 (1984) · Zbl 0551.62069
[42] Yao, Y.-C.; Au, S., Least-Squares Estimation of a Step Function,, Sankhyā: The Indian Journal of Statistics, 51, 370-381 (1989) · Zbl 0711.62031
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.