Simple estimators of false discovery rates given as few as one or two \(p\)-values without strong parametric assumptions. (English) Zbl 1311.62109

Stat. Appl. Genet. Mol. Biol. 12, No. 4, 529-543 (2013); corrigendum ibid. 14, No. 2, 225 (2015).
Summary: Multiple comparison procedures that control a family-wise error rate or false discovery rate provide an achieved error rate as the adjusted \(p\)-value or \(q\)-value for each hypothesis tested. However, since achieved error rates are not understood as probabilities that the null hypotheses are true, empirical Bayes methods have been employed to estimate such posterior probabilities, called local false discovery rates (LFDRs) to emphasize that their priors are unknown and of the frequency type. The main approaches to LFDR estimation, relying either on fully parametric models to maximize likelihood or on the presence of enough hypotheses for nonparametric density estimation, lack the simplicity and generality of adjusted \(p\)-values. To begin filling the gap, this paper introduces simple methods of LFDR estimation with proven asymptotic conservatism without assuming the parameter distribution is in a parametric family. Simulations indicate that they remain conservative even for very small numbers of hypotheses. One of the proposed procedures enables interpreting the original FDR control rule in terms of LFDR estimation, thereby facilitating practical use of the former. The most conservative of the new procedures is applied to measured abundance levels of 20 proteins.
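One common way to compute the FDR-adjusted \(p\)-values mentioned in the summary is the step-up procedure of Benjamini and Hochberg (1995). The following is a minimal Python sketch under that assumption. It computes achieved FDR levels only and is not the LFDR estimator proposed in the paper. The function name `bh_adjusted_pvalues` is hypothetical.

```python
from typing import List


def bh_adjusted_pvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Illustrative sketch (hypothetical helper): for the hypothesis with the
    k-th smallest p-value, the adjusted value is min over j >= k of
    p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values sorted in increasing order of p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    # Starting the running minimum at 1 caps every adjusted value at 1.
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values are monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw \(p\)-values 0.01, 0.04, 0.03 and 0.20, this returns approximately 0.04, 0.0533, 0.0533 and 0.20. The running minimum is what makes the second and third hypotheses share the same adjusted value.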

MSC:

62J15 Paired and multiple comparisons; multiple testing
62C12 Empirical decision procedures; empirical Bayes procedures
62F15 Bayesian inference
