Signal detection via Phi-divergences for general mixtures. (English) Zbl 1428.62412

Summary: The family of goodness-of-fit tests based on \(\Phi\)-divergences is known to be optimal for detecting signals hidden in high-dimensional noise data under the heterogeneous normal mixture model. This test family includes Tukey’s popular higher criticism test and the famous Berk-Jones test. In this paper we address the open question of whether the tests’ optimality persists beyond the prime normal mixture model. On the one hand, we extend the known optimality of the higher criticism test in various models, for example the heteroscedastic normal, general Gaussian and exponential-\(\chi^2\)-mixture models, to the whole test family. On the other hand, we discuss the optimality for new model classes based on exponential families, including the scale exponential, the scale Fréchet and the location Gumbel models. For all these examples we apply a general machinery which might be used to show the tests’ optimality for further models and model classes in the future.
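
To make the test family concrete, the following is a minimal sketch of the \(\Phi\)-divergence statistics in the parametrization of Jager and Wellner (2007), under which \(s = 2\) gives a two-sided higher criticism statistic (up to the factor \(2n\)) and \(s = 1\) gives the Berk-Jones statistic. It is not taken from the paper under review: the function names are illustrative, the supremum is evaluated only at the order statistics of the p-values, only \(s > 0\) is handled, and all p-values are assumed to lie strictly between 0 and 1.

```python
import math


def phi(x, s):
    """phi_s(x) for s > 0 and x >= 0 (Jager-Wellner parametrization, assumed here)."""
    if s == 1:
        # Limit case s -> 1, with the convention 0 * log(0) = 0.
        return 1.0 if x == 0 else x * math.log(x) - x + 1.0
    return (1.0 - s + s * x - x ** s) / (s * (1.0 - s))


def k_div(u, v, s):
    """K_s(u, v) = v * phi_s(u / v) + (1 - v) * phi_s((1 - u) / (1 - v))."""
    return v * phi(u / v, s) + (1.0 - v) * phi((1.0 - u) / (1.0 - v), s)


def phi_divergence_statistic(p_values, s):
    """Simplified S_n(s): maximum of K_s(i / n, p_(i)) over the sorted p-values."""
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    return max(k_div(i / n, p, s) for i, p in enumerate(p_sorted, start=1))


def higher_criticism_two_sided(p_values):
    """max_i sqrt(n) * |i / n - p_(i)| / sqrt(p_(i) * (1 - p_(i)))."""
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    return max(
        math.sqrt(n) * abs(i / n - p) / math.sqrt(p * (1.0 - p))
        for i, p in enumerate(p_sorted, start=1)
    )


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.02, 0.3, 0.5, 0.8]
berk_jones = phi_divergence_statistic(example_p, s=1)
hc_family_member = phi_divergence_statistic(example_p, s=2)
```

Since \(K_2(u, v) = (u - v)^2 / (2v(1 - v))\), the value `2 * n * hc_family_member` equals the square of `higher_criticism_two_sided(example_p)`, which is how the higher criticism test sits inside the family.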

MSC:

62G10 Nonparametric hypothesis testing
62E20 Asymptotic distribution theory in statistics

References:

[1] Ali, S.M. and Silvey, S.D. (1966). A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. Ser. B 28 131-142. · Zbl 0203.19902 · doi:10.1111/j.2517-6161.1966.tb00626.x
[2] Arias-Castro, E., Candès, E.J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533-2556. · Zbl 1231.62136 · doi:10.1214/11-AOS910
[3] Arias-Castro, E. and Wang, M. (2015). The sparse Poisson means model. Electron. J. Stat. 9 2170-2201. · Zbl 1337.62088 · doi:10.1214/15-EJS1066
[4] Arias-Castro, E. and Wang, M. (2017). Distribution-free tests for sparse heterogeneous mixtures. TEST 26 71-94. · Zbl 1422.62159 · doi:10.1007/s11749-016-0499-x
[5] Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Chichester: Wiley. · Zbl 0387.62011
[6] Berk, R.H. and Jones, D.H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Z. Wahrsch. Verw. Gebiete 47 47-59. · Zbl 0379.62026 · doi:10.1007/BF00533250
[7] Bingham, N.H., Goldie, C.M. and Teugels, J.L. (1987). Regular Variation. Encyclopedia of Mathematics and Its Applications 27. Cambridge: Cambridge Univ. Press.
[8] Cai, T.T., Jeng, X.J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 629-662. · Zbl 1228.62020 · doi:10.1111/j.1467-9868.2011.00778.x
[9] Cai, T.T. and Wu, Y. (2014). Optimal detection of sparse mixtures against a given null distribution. IEEE Trans. Inform. Theory 60 2217-2232. · Zbl 1360.94108 · doi:10.1109/TIT.2014.2304295
[10] Cayon, L., Jin, J. and Treaster, A. (2004). Higher criticism statistic: Detecting and identifying non-Gaussianity in the WMAP first year data. Mon. Not. R. Astron. Soc. 362 826-832.
[11] Chang, L.-C. (1955). On the ratio of an empirical distribution function to the theoretical distribution function. Acta Math. Sinica 5 347-368. · Zbl 0068.33204
[12] Cressie, N. and Read, T.R.C. (1984). Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B 46 440-464. · Zbl 0571.62017 · doi:10.1111/j.2517-6161.1984.tb01318.x
[13] Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 2 299-318. · Zbl 0157.25802
[14] Dai, H., Charnigo, R., Srivastava, T., Talebizadeh, Z. and Qing, S. (2012). Integrating P-values for genetic and genomic data analysis. J. Biom. Biostat. 3-7.
[15] Ditzhaus, M. (2017). The power of tests for signal detection under high-dimensional data. Ph.D. thesis, Heinrich-Heine-Univ. Duesseldorf. Available at https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=42808.
[16] Ditzhaus, M. and Janssen, A. (2017). Detectability of nonparametric signals: Higher criticism versus likelihood ratio. Available at arXiv:1709.07264v2. · Zbl 1409.62096 · doi:10.1214/18-EJS1502
[17] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962-994. · Zbl 1092.62051 · doi:10.1214/009053604000000265
[18] Donoho, D. and Jin, J. (2009). Feature selection by higher criticism thresholding achieves the optimal phase diagram. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 4449-4470. · Zbl 1185.62113 · doi:10.1098/rsta.2009.0129
[19] Donoho, D. and Jin, J. (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statist. Sci. 30 1-25. · Zbl 1332.62019 · doi:10.1214/14-STS506
[20] Eicker, F. (1979). The asymptotic distribution of the suprema of the standardized empirical processes. Ann. Statist. 7 116-138. · Zbl 0398.62014 · doi:10.1214/aos/1176344559
[21] Feller, W. (1966). An Introduction to Probability Theory and Its Applications. Vol. II. New York: Wiley. · Zbl 0138.10207
[22] Goldstein, D.B. (2009). Common genetic variation and human traits. N. Engl. J. Med. 360 1696-1698.
[23] Gontscharuk, V., Landwehr, S. and Finner, H. (2015). The intermediates take it all: Asymptotics of higher criticism statistics and a powerful alternative based on equal local levels. Biom. J. 57 159-180. · Zbl 1309.62082 · doi:10.1002/bimj.201300255
[24] Gontscharuk, V., Landwehr, S. and Finner, H. (2016). Goodness of fit tests in terms of local levels with special emphasis on higher criticism tests. Bernoulli 22 1331-1363. · Zbl 1388.62129 · doi:10.3150/14-BEJ694
[25] Hall, P., Pittelkow, Y. and Ghosh, M. (2008). Theoretical measures of relative performance of classifiers for high dimensional data with small sample sizes. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 159-173. · Zbl 1400.62094 · doi:10.1111/j.1467-9868.2007.00631.x
[26] Ingster, Y.I., Tsybakov, A.B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476-1526. · Zbl 1329.62314 · doi:10.1214/10-EJS589
[27] Ingster, Yu.I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47-69. · Zbl 0878.62005
[28] Iyengar, S.K. and Elston, R.C. (2007). The genetic basis of complex traits: Rare variants or “common gene, common disease”? Methods Mol. Biol. 376 71-84.
[29] Jaeschke, D. (1979). The asymptotic distribution of the supremum of the standardized empirical distribution function on subintervals. Ann. Statist. 7 108-115. · Zbl 0398.62013 · doi:10.1214/aos/1176344558
[30] Jager, L. and Wellner, J.A. (2007). Goodness-of-fit tests via phi-divergences. Ann. Statist. 35 2018-2053. · Zbl 1126.62030 · doi:10.1214/0009053607000000244
[31] Jin, J. (2009). Impossibility of successful classification when useful features are rare and weak. Proc. Natl. Acad. Sci. USA 106 8859-8864. · Zbl 1203.68064 · doi:10.1073/pnas.0903931106
[32] Jin, J., Starck, J.-L., Donoho, D.L., Aghanim, N. and Forni, O. (2005). Cosmological non-Gaussian signature detection: Comparing performance of different statistical tests. EURASIP J. Appl. Signal Process. 15 2470-2485. · Zbl 1127.94335
[33] Khmaladze, E. and Shinjikashvili, E. (2001). Calculation of noncrossing probabilities for Poisson processes and its corollaries. Adv. in Appl. Probab. 33 702-716. · Zbl 1158.60365 · doi:10.1239/aap/1005091361
[34] Khmaladze, E.V. (1998). Goodness of fit tests for “chimeric” alternatives. Stat. Neerl. 52 90-111. · Zbl 0953.62042 · doi:10.1111/1467-9574.00070
[35] Kulldorff, M., Heffernan, R., Hartman, J., Assunção, R. and Mostashari, F. (2005). A space-time permutation scan statistic for disease outbreak detection. PLoS Med. 2 e59.
[36] Mukherjee, R., Pillai, N.S. and Lin, X. (2015). Hypothesis testing for high-dimensional sparse binary regression. Ann. Statist. 43 352-381. · Zbl 1308.62094 · doi:10.1214/14-AOS1279
[37] Neill, D. and Lingwall, J. (2007). A nonparametric scan statistic for multivariate disease surveillance. Advances in Disease Surveillance 4 106-116.
[38] Saligrama, V. and Zhao, M. (2012). Local anomaly detection. JMLR W&CP 22 969-983.
[39] Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to Statistics. New York: Wiley. · Zbl 1170.62365
[40] Strasser, H. (1985). Mathematical Theory of Statistics. De Gruyter Studies in Mathematics 7. Berlin: de Gruyter. · Zbl 0594.62017
[41] Tukey, J.W. (1976). T13 N: The Higher Criticism. Course Notes. Stat 411. Princeton: Princeton University Press.
[42] Tukey, J.W. (1994). The Collected Works of John W. Tukey. Vol. VIII. London: Chapman and Hall. · Zbl 0807.01035
[43] Wellner, J.A. (1978). Limit theorems for the ratio of the empirical distribution function to the true distribution function. Z. Wahrsch. Verw. Gebiete 45 73-88. · Zbl 0382.60031 · doi:10.1007/BF00635964
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.