
Segmentation and estimation of change-point models: false positive control and confidence regions. (English) Zbl 1451.62035

Segmentation and estimation methods for change-point models are known to detect multiple change-points with high statistical accuracy. The goal of segmentation is to find the intervals on which the sequence is approximately stationary.
In this paper, the authors consider the following problem. Let \(X_{1},X_{2},\dots,X_{m}\) be independent and normally distributed with variance equal to 1. Assume that there are \(M\ge 0\) and integers \(0=\tau_{0}<\tau_{1}<\cdots<\tau_{M}<\tau_{M+1}=m\) such that the mean \(\mu_{i}\) of \(X_{i}\), \(1\le i\le m\), is a step function that is constant on each of the intervals \(\left(\tau_{k-1},\tau_{k} \right]\), \(1\le k \le M+1 \), but takes different values on adjacent intervals. The purpose of segmentation is to determine the value of \(M\), the change-points \(\tau_{k}\), and the means \(\mu_{i}\).
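The model above can be illustrated with a small simulation. The following is a minimal sketch, not code from the paper: the function name `simulate_step_sequence`, its arguments, and the example change-points and means are assumptions chosen for illustration. It draws independent unit-variance normal observations whose mean is constant on each interval \(\left(\tau_{k-1},\tau_{k} \right]\).

```python
import random


def simulate_step_sequence(change_points, segment_means, m, seed=0):
    """Draw X_1..X_m ~ N(mu_i, 1), with mu_i constant on (tau_{k-1}, tau_k].

    change_points: the interior change-points tau_1 < ... < tau_M (may be empty).
    segment_means: the M + 1 mean values, one per interval.
    Returns (means, xs); index i - 1 of each list corresponds to X_i.
    """
    taus = [0] + list(change_points) + [m]  # tau_0 = 0 and tau_{M+1} = m
    if len(segment_means) != len(taus) - 1:
        raise ValueError("need exactly M + 1 segment means")
    if any(a >= b for a, b in zip(taus, taus[1:])):
        raise ValueError("change-points must be strictly increasing in (0, m)")
    if any(a == b for a, b in zip(segment_means, segment_means[1:])):
        raise ValueError("adjacent segments must have different means")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for k in range(1, len(taus)):
        # The interval (tau_{k-1}, tau_k] contains tau_k - tau_{k-1} indices.
        means.extend([segment_means[k - 1]] * (taus[k] - taus[k - 1]))
    xs = [rng.gauss(mu, 1.0) for mu in means]
    return means, xs


# Illustrative values only: M = 2 change-points at 30 and 70, m = 100.
means, xs = simulate_step_sequence([30, 70], [0.0, 2.0, -1.0], m=100)
```

Here `means[29]` is \(\mu_{30}\), the last mean of the first segment, and `means[30]` is \(\mu_{31}\), the first mean of the second segment, matching the half-open intervals in the text.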
Authors’ abstract: “To segment a sequence of independent random variables at an unknown number of change-points, we introduce new procedures that are based on thresholding the likelihood ratio statistic, and give approximations for the probability of a false positive error when there are no change-points. We also study confidence regions based on the likelihood ratio statistic for the change-points and joint confidence regions for the change-points and the parameter values. Applications to segment array CGH data are discussed.”
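To make the idea of thresholding a likelihood ratio statistic concrete, here is a minimal sketch for the simplest case of at most one change in the mean with unit variance. It is not the authors' procedure. It uses the standard statistic \(Z_{t}=(S_{t}-tS_{m}/m)/\sqrt{t(1-t/m)}\), where \(S_{t}=X_{1}+\dots+X_{t}\) and \(Z_{t}^{2}/2\) is the log likelihood ratio for a change at \(t\). The function name `max_cusum` and the threshold value `b` are assumptions for illustration; in practice the threshold would be calibrated from an approximation to the false positive probability.

```python
import math


def max_cusum(xs):
    """Return (max_t |Z_t|, argmax t) over 1 <= t < m for a unit-variance sequence.

    Z_t = (S_t - t * S_m / m) / sqrt(t * (1 - t / m)), with S_t = x_1 + ... + x_t.
    Returns (0.0, None) if no t gives a nonzero statistic.
    """
    m = len(xs)
    total = sum(xs)
    best_stat, best_t = 0.0, None
    partial = 0.0
    for t in range(1, m):
        partial += xs[t - 1]  # running sum S_t
        z = (partial - t * total / m) / math.sqrt(t * (1 - t / m))
        if abs(z) > best_stat:
            best_stat, best_t = abs(z), t
    return best_stat, best_t


# Noise-free toy input with a single jump after the fifth observation.
stat, t_hat = max_cusum([0.0] * 5 + [1.0] * 5)

b = 4.0  # hypothetical threshold, not a value taken from the paper
change_detected = stat > b
```

On this toy input the maximum is attained at the true change-point \(t=5\), but the statistic is well below the illustrative threshold, so no change would be declared for a jump this small relative to the noise level.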

MSC:

62G05 Nonparametric estimation
62G15 Nonparametric tolerance and confidence regions
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D20 Protein sequences, DNA sequences
62G10 Nonparametric hypothesis testing

Software:

wbs
