Exhaustive goodness of fit via smoothed inference and graphics. (English) Zbl 07547618

Summary: Classical tests of goodness of fit aim to validate the conformity of a postulated model to the data under study. Given their inferential nature, they can be considered a crucial step in confirmatory data analysis. In their standard formulation, however, they do not allow exploring how the hypothesized model deviates from the truth, nor do they provide any insight into how the rejected model could be improved to better fit the data. The main goal of this work is to establish a comprehensive framework for goodness of fit which naturally integrates modeling, estimation, inference and graphics. Modeling and estimation focus on a novel formulation of smooth tests that easily extends to arbitrary distributions, either continuous or discrete. Inference and adequate post-selection adjustments are performed via a specially designed smoothed bootstrap, and the results are summarized via an exhaustive graphical tool called CD-plot. Technical proofs, codes and data are provided in the supplementary materials.
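
For orientation, the classical smooth test that the summary takes as its starting point can be sketched as follows. This is only a minimal illustration of Neyman's original construction for a fully specified continuous null. It is not the formulation proposed in the paper, and it does not reproduce the LPsmooth interface. The function names, the choice of m = 4 components, and the Monte Carlo p-value (used here in place of the paper's smoothed bootstrap) are all assumptions made for illustration.

```python
import math
import random


def legendre_scores(u, m):
    """Orthonormal shifted Legendre polynomials T_1..T_m evaluated at u in [0, 1]."""
    x = 2.0 * u - 1.0
    p_prev, p_curr = 1.0, x  # P_0(x) and P_1(x)
    scores = [math.sqrt(3.0) * p_curr]
    for j in range(1, m):
        # Bonnet recurrence: (j + 1) P_{j+1} = (2j + 1) x P_j - j P_{j-1}
        p_next = ((2 * j + 1) * x * p_curr - j * p_prev) / (j + 1)
        p_prev, p_curr = p_curr, p_next
        scores.append(math.sqrt(2 * (j + 1) + 1) * p_curr)
    return scores


def smooth_test(sample, null_cdf, m=4):
    """Return Neyman's smooth statistic and the m estimated coefficients.

    Each observation is mapped to u = F0(x); under the null, u is uniform
    on [0, 1] and every coefficient has mean zero.
    """
    n = len(sample)
    sums = [0.0] * m
    for x in sample:
        for j, t in enumerate(legendre_scores(null_cdf(x), m)):
            sums[j] += t
    coefs = [s / n for s in sums]
    stat = n * sum(c * c for c in coefs)
    return stat, coefs


def simulated_p_value(stat, n, m=4, draws=500, seed=0):
    """Monte Carlo p-value: simulate uniform samples, which is the null after the transform."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        null_sample = [rng.random() for _ in range(n)]
        null_stat, _ = smooth_test(null_sample, lambda u: u, m)
        if null_stat >= stat:
            exceed += 1
    return (exceed + 1) / (draws + 1)


def exp_cdf(rate):
    """CDF of an exponential distribution with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x)


# Hypothetical data: 200 draws from Exp(1), tested against a correct and a wrong null.
rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(200)]
stat_ok, coefs_ok = smooth_test(data, exp_cdf(1.0))
stat_bad, coefs_bad = smooth_test(data, exp_cdf(2.0))
p_bad = simulated_p_value(stat_bad, len(data))
```

The estimated coefficients do more than feed the test statistic: their signs and magnitudes indicate in which direction the data depart from the postulated model, which is the kind of information a plain accept/reject decision does not give.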

MSC:

62-XX Statistics

Software:

R; reldist; LPMode; LPsmooth
