Locally simultaneous inference. (English) Zbl 07916578

Summary: Selective inference is the problem of giving valid answers to statistical questions chosen in a data-driven manner. A standard solution to selective inference is simultaneous inference, which delivers valid answers to the set of all questions that could possibly have been asked. However, simultaneous inference can be unnecessarily conservative if this set includes many questions that were unlikely to be asked in the first place. We introduce a less conservative solution to selective inference that we call locally simultaneous inference, which only answers those questions that could plausibly have been asked in light of the observed data, all the while preserving rigorous type I error guarantees. For example, if the objective is to construct a confidence interval for the “winning” treatment effect in a clinical trial with multiple treatments, and it is obvious in hindsight that only one treatment had a chance to win, then our approach will return an interval that is nearly the same as the uncorrected, standard interval. Locally simultaneous inference is implemented by refining any simultaneous inference method of interest. Under mild conditions satisfied by common confidence intervals, locally simultaneous inference strictly dominates its underlying simultaneous inference method, meaning it never yields less statistical power and can yield more. Compared to conditional selective inference, which demands stronger guarantees, locally simultaneous inference is more easily applicable in nonparametric settings and is more numerically stable.
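To make the two-step idea concrete for the “winning treatment” example, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' procedure: it assumes independent Gaussian estimates \(y_j \sim N(\theta_j, 1)\) with known unit variance, uses the Šidák correction (exact for independent Gaussians) as the underlying simultaneous method, and splits the error budget as \(\alpha = \nu + (\alpha - \nu)\) with a hypothetical default \(\nu = \alpha/10\). Step 1 spends \(\nu\) on a simultaneous band of half-width \(c_\nu\) and keeps only the indices that could have won for data within \(2c_\nu\) of the observed \(y\) in sup-norm. This reduces to \(y_j \ge \max_k y_k - 4c_\nu\), a margin derived from a triangle-inequality argument that may be looser than the constants in the paper. Step 2 spends \(\alpha - \nu\) on a Šidák interval over the plausible indices only. The function names `sidak_critical_value` and `winner_interval` are hypothetical.

```python
from statistics import NormalDist


def sidak_critical_value(level: float, m: int) -> float:
    """Return c with P(max_j |Z_j| <= c) = 1 - level for m i.i.d. N(0, 1) variables."""
    return NormalDist().inv_cdf((1.0 + (1.0 - level) ** (1.0 / m)) / 2.0)


def winner_interval(y, alpha=0.1, nu=None):
    """Sketch of a locally simultaneous interval for theta at the argmax of y.

    Assumes y_j ~ N(theta_j, 1) independently (known unit variance).
    Returns (lower, upper, plausible_indices).
    """
    m = len(y)
    if nu is None:
        nu = alpha / 10.0  # hypothetical split of the error budget
    if not 0.0 < nu < alpha < 1.0:
        raise ValueError("need 0 < nu < alpha < 1")
    winner = max(range(m), key=lambda j: y[j])

    # Step 1: a level-nu simultaneous band over all m means. On the event that
    # the band covers theta, every index that could have won for data within the
    # band satisfies y_j >= max(y) - 4 * c_nu.
    c_nu = sidak_critical_value(nu, m)
    plausible = [j for j in range(m) if y[j] >= y[winner] - 4.0 * c_nu]

    # Step 2: simultaneous inference at level alpha - nu over plausible indices only.
    c_local = sidak_critical_value(alpha - nu, len(plausible))
    return y[winner] - c_local, y[winner] + c_local, plausible
```

With one clear winner among ten arms (for instance `y = [20.0] + [0.0] * 9`), only the winner is plausible and the half-width is \(\Phi^{-1}(0.955) \approx 1.70\), close to the uncorrected 1.645 at \(\alpha = 0.1\) and well below the roughly 2.56 that Šidák requires over all ten arms. When every arm is plausible, this naive sketch pays for the split of \(\alpha\) and comes out slightly wider than plain Šidák, so it does not reproduce the dominance property stated in the summary.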

MSC:

62J15 Paired and multiple comparisons; multiple testing
62F25 Parametric tolerance and confidence regions
62G15 Nonparametric tolerance and confidence regions

Software:

QSIMVN
