
The role of the \(p\)-value in the multitesting problem. (English) Zbl 1521.62403

Summary: Modern science frequently involves the analysis of large amounts of quantitative information and the simultaneous testing of thousands, or even hundreds of thousands, of null hypotheses. In this context, naive deductions drawn from statistical reports sometimes substitute for rational thinking. The reproducibility crisis is a direct consequence of such misleading statistical conclusions. In this paper, the authors revisit some of the controversies surrounding the implications drawn from statistical hypothesis testing. They focus on the role of the \(p\)-value in the massive multitesting problem and on the loss of its standard probabilistic interpretation. The analogy between hypothesis tests and the usual diagnostic process (both involve decision-making) is used to point out some limitations of the probabilistic interpretation of the \(p\)-value and to introduce the receiver operating characteristic (ROC) curve as a useful tool in the large-scale multitesting context. An analysis of the well-known Hedenfalk data illustrates the problem.
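The summary does not describe how the ROC curve is built from \(p\)-values, so what follows is only a minimal sketch under an assumed two-group normal model, not the authors' construction; the function names (`simulate_p_values`, `empirical_roc`, `model_roc`) and all parameter values are hypothetical. Reading "reject \(H_0\) when \(p \le t\)" as a positive diagnostic result, the fraction of true nulls rejected is the false-positive rate (about \(t\), because null \(p\)-values are uniform) and the fraction of false nulls rejected is the sensitivity; tracing both over \(t\) gives an ROC curve. The same simulation shows that a fixed \(\alpha\) applied to thousands of tests rejects roughly \(m_0\alpha\) true nulls.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def one_sided_p_value(z: float) -> float:
    """Upper-tail p-value of a z-statistic: P(Z >= z) when H0 holds."""
    return 1.0 - STD_NORMAL.cdf(z)


def simulate_p_values(m0: int, m1: int, effect: float, seed: int = 0):
    """Assumed two-group model (illustrative only): m0 true nulls with
    z ~ N(0, 1) and m1 false nulls with z ~ N(effect, 1)."""
    rng = random.Random(seed)
    p_null = [one_sided_p_value(rng.gauss(0.0, 1.0)) for _ in range(m0)]
    p_alt = [one_sided_p_value(rng.gauss(effect, 1.0)) for _ in range(m1)]
    return p_null, p_alt


def empirical_roc(p_null, p_alt, thresholds):
    """Read 'reject H0 when p <= t' as a diagnostic rule and return one
    (false-positive rate, true-positive rate) pair per threshold t."""
    points = []
    for t in thresholds:
        fpr = sum(p <= t for p in p_null) / len(p_null)  # 1 - specificity
        tpr = sum(p <= t for p in p_alt) / len(p_alt)    # sensitivity
        points.append((fpr, tpr))
    return points


def model_roc(t: float, effect: float) -> float:
    """Sensitivity implied by the same model at threshold t (0 < t < 1):
    P(p <= t | H1) = 1 - Phi(Phi^{-1}(1 - t) - effect)."""
    return 1.0 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - t) - effect)


if __name__ == "__main__":
    m0, m1, effect, alpha = 9_000, 1_000, 2.5, 0.05
    p_null, p_alt = simulate_p_values(m0, m1, effect)

    # With thousands of tests, a fixed alpha still rejects about m0 * alpha true nulls.
    false_rejections = sum(p <= alpha for p in p_null)
    print(f"true nulls rejected at alpha={alpha}: {false_rejections} "
          f"(expected about {m0 * alpha:.0f})")

    thresholds = [0.001, 0.01, 0.05, 0.10]
    for t, (fpr, tpr) in zip(thresholds, empirical_roc(p_null, p_alt, thresholds)):
        print(f"t={t:<5}  FPR={fpr:.3f}  TPR={tpr:.3f}  model TPR={model_roc(t, effect):.3f}")
```

In real data such as the Hedenfalk set, which hypotheses are true nulls is unknown, so the true-positive rate cannot be counted directly as in this simulation and would have to be estimated instead.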

MSC:

62-XX Statistics

References:

[1] Alon, N.; Spencer, J., The Probabilistic Method (2016), Wiley, Hoboken · Zbl 1333.05001
[2] Baker, M., 1,500 scientists lift the lid on reproducibility, Nature, 533, 452-454 (2016) · doi:10.1038/533452a
[3] Bender, R.; Lange, S., What’s wrong with arguments against multiplicity adjustments (letter to the editor), BMJ, 316, 1236-1238 (1998)
[4] Bender, R.; Lange, S., Adjusting for multiple testing - when and how?, J. Clin. Epidemiol., 54, 343-349 (2001) · doi:10.1016/S0895-4356(00)00314-0
[5] Benjamini, Y.; Hochberg, Y., Controlling the false discovery rate: A practical and powerful approach to multiple testing, J. R. Stat. Soc. Ser. B (Methodol.), 57, 289-300 (1995) · Zbl 0809.62014
[6] Benjamini, Y.; Yekutieli, D., The control of the false discovery rate in multiple testing under dependency, Ann. Stat., 29, 1165-1188 (2001) · Zbl 1041.62061 · doi:10.1214/aos/1013699998
[7] Bland, J.; Altman, D., Multiple significance tests: the Bonferroni method, BMJ, 310, 170 (1995) · doi:10.1136/bmj.310.6973.170
[8] Carvajal-Rodríguez, A.; de Uña-Álvarez, J.; Rolán-Álvarez, E., A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests, BMC Bioinformatics, 10, 209-220 (2009) · doi:10.1186/1471-2105-10-209
[9] Castro-Conde, I.; Döhler, S.; de Uña-Álvarez, J., An extended sequential goodness-of-fit multiple testing method for discrete data, Stat. Methods Med. Res., 26, 2356-2375 (2017) · doi:10.1177/0962280215597580
[10] Clarke, S.; Hall, P., Robustness of multiple testing procedures against dependence, Ann Stat, 37, 332-358 (2009) · Zbl 1155.62031 · doi:10.1214/07-AOS557
[11] Cohen, J., The earth is round (p<.05), Am. Psychol., 49, 997-1003 (1994) · doi:10.1037/0003-066X.49.12.997
[12] Dalmasso, C.; Broët, P.; Moreau, T., A simple procedure for estimating the false discovery rate, Bioinformatics, 21, 660-668 (2005) · doi:10.1093/bioinformatics/bti063
[13] Demidenko, E., The p-value you can’t buy, Am. Stat., 70, 33-38 (2016) · Zbl 07665849 · doi:10.1080/00031305.2015.1069760
[14] Dudoit, S.; van der Laan, M.; Birkner, M., Multiple testing procedures for controlling tail probability error rates, Tech. Rep. 166, Division of Biostatistics, University of California, Berkeley (2004)
[15] Efron, B., Correlation and large-scale simultaneous significance testing, J. Am. Stat. Assoc., 102, 93-103 (2007) · Zbl 1284.62340 · doi:10.1198/016214506000001211
[16] Farcomeni, A., A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion, Stat. Methods. Med. Res., 17, 347-388 (2008) · Zbl 1156.62048 · doi:10.1177/0962280206079046
[17] Feigelson, E.; Babu, G., Statistical methods for astronomy, in Oswalt, T.D.; Bond, H.E. (eds.), Planets, Stars and Stellar Systems, Volume 2: Astronomical Techniques, Software, and Data, Springer, Dordrecht, 445-480 (2013)
[18] Genovese, C.; Wasserman, L., Exceedance control of the false discovery proportion, J. Am. Stat. Assoc., 101, 1408-1417 (2006) · Zbl 1171.62338 · doi:10.1198/016214506000000339
[19] Green, D.; Swets, J., Signal Detection Theory and Psychophysics (1966), Wiley, New York
[20] Hedenfalk, I.; Duggan, D.; Chen, Y.; Radmacher, M.; Bittner, M.; Simon, R.; Meltzer, P.; Gusterson, B.; Esteller, M.; Kallioniemi, O.; Wilfond, B.; Borg, A.; Trent, J.; Raffeld, M.; Yakhini, Z.; Ben-Dor, A.; Dougherty, E.; Kononen, J.; Bubendorf, L.; Fehrle, W.; Pittaluga, S.; Gruvberger, S.; Loman, N.; Johannsson, O.; Olsson, H.; Sauter, G., Gene-expression profiles in hereditary breast cancer, N. Engl. J. Med., 344, 539-548 (2001) · doi:10.1056/NEJM200102223440801
[21] Ioannidis, J., Why most published research findings are false, PLoS Med., 2, e124 (2005) · doi:10.1371/journal.pmed.0020124
[22] Martínez-Camblor, P., On correlated z-values distribution in hypothesis testing, Comput. Stat. Data Anal., 79, 30-43 (2014) · Zbl 1506.62125 · doi:10.1016/j.csda.2014.05.006
[23] Perneger, T., What’s wrong with Bonferroni adjustments, BMJ, 316, 1236-1238 (1998) · doi:10.1136/bmj.316.7139.1236
[24] Rothman, K., No adjustments are needed for multiple comparisons, Epidemiology, 1, 43-46 (1990) · doi:10.1097/00001648-199001000-00010
[25] Sankoh, A.; Huque, M.; Dubey, S., Some comments on frequently used multiple endpoint adjustment methods in clinical trials, Stat. Med., 16, 2529-2542 (1997) · doi:10.1002/(SICI)1097-0258(19971130)16:22<2529::AID-SIM692>3.0.CO;2-J
[26] Seeger, P., A note on a method for the analysis of significances en masse, Technometrics, 10, 586-593 (1968) · doi:10.1080/00401706.1968.10490605
[27] Shen, J.; Wang, S.; Zhang, Y.; Kappil, M.; Wu, H.; Kibriya, M.; Wang, Q.; Jasmine, F.; Ahsan, H.; Lee, P.; Yu, M.; Chen, C.; Santella, R., Genome-wide DNA methylation profiles in hepatocellular carcinoma, Hepatology, 55, 1799-1808 (2012) · doi:10.1002/hep.25569
[28] Simes, R., An improved Bonferroni procedure for multiple tests of significance, Biometrika, 73, 751-754 (1986) · Zbl 0613.62067 · doi:10.1093/biomet/73.3.751
[29] Storey, J., The positive false discovery rate: a Bayesian interpretation and the q-value, Ann. Stat., 31, 2013-2035 (2003) · Zbl 1042.62026 · doi:10.1214/aos/1074290335
[30] Tarran, B.; Wininger, M., Editorial: A psychology journal bans p-values, Significance, 12, 2-7 (2015) · doi:10.1111/j.1740-9713.2015.00837.x
[31] Trafimow, D.; Marks, M., Editorial, Basic Appl. Soc. Psychol., 37, 1-2 (2015) · doi:10.1080/01973533.2015.1012991
[32] van Dyk, D., The role of statistics in the discovery of a Higgs Boson, Annu. Rev. Stat. Appl., 1, 41-59 (2014) · doi:10.1146/annurev-statistics-062713-085841
[33] Vexler, A.; Yu, J.; Zhao, Y.; Hutson, A.; Gurevich, G., Expected p-values in light of an ROC curve analysis applied to optimal multiple testing procedures, Stat. Methods Med. Res., 0 (2017)
[34] von Elm, E.; Altman, D.; Egger, M.; Pocock, S.; Gøtzsche, P.; Vandenbroucke, J., The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies, PLoS Med., 4, 1-5 (2007) · doi:10.1371/journal.pmed.0040296
[35] Wasserstein, R.; Lazar, N., The ASA’s statement on p-values: context, process, and purpose, Am. Stat., 70, 129-133 (2016) · Zbl 07665862 · doi:10.1080/00031305.2016.1154108
[36] Wellek, S., A critical evaluation of the current p-value controversy, Biom. J., 1, 1-19 (2017) · Zbl 1391.62254