
Optimal detection of weak positive latent dependence between two sequences of multiple tests. (English) Zbl 1397.62266

Summary: It is frequently of interest to jointly analyze two paired sequences of multiple tests. This paper studies the problem of detecting whether there are more pairs of tests that are significant in both sequences than would be expected by chance. The asymptotic detection boundary is derived in terms of parameters such as the sparsity of non-null cases in each sequence, the effect sizes of the signals, and the magnitude of the dependence between the two sequences. A new test for detecting weak dependence is also proposed, shown to be asymptotically adaptively optimal, studied in simulations, and applied to study genetic pleiotropy in 10 pediatric autoimmune diseases.

MSC:

62J15 Paired and multiple comparisons; multiple testing
62G10 Nonparametric hypothesis testing
62G20 Asymptotic properties of nonparametric inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
