Increasing the power: a practical approach to goodness-of-fit test for logistic regression models with continuous predictors. (English) Zbl 1452.62323

Summary: When continuous predictors are present, the classical Pearson and deviance goodness-of-fit tests for assessing logistic model fit break down. The Hosmer-Lemeshow test can be used in these situations. Although simple to perform and widely used, it lacks adequate power in many cases and provides no further information on the source of any detected lack of fit. Tsiatis proposed a score statistic to test for covariate regional effects. While conceptually elegant, it lacks a general rule for partitioning the covariate space, which has, to a certain degree, limited its popularity. We propose a new method for goodness-of-fit testing that uses a very general partitioning strategy (clustering) in the covariate space and either a Pearson statistic or a score statistic. Properties of the proposed statistics are discussed, and a simulation study demonstrates increased power to detect model misspecification in a variety of settings. An application of these different methods to data from a clinical trial illustrates their use. Discussions of further improvements to the proposed tests and of extending the new method to other data situations, such as ordinal response regression models, are also included.
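
The idea of partitioning the covariate space by clustering and then comparing observed with expected event counts in each region can be illustrated with a minimal sketch. The summary does not specify the clustering algorithm, the exact form of the statistic, or its reference distribution, so everything below is an assumption made for illustration: plain k-means (Lloyd's algorithm) stands in for the clustering step, the statistic takes the familiar Hosmer-Lemeshow grouped Pearson form, and the function names `kmeans_labels` and `grouped_pearson` are hypothetical. Fitted probabilities are taken as given rather than estimated here.

```python
import math
from collections import defaultdict


def kmeans_labels(points, k, iters=50):
    """Assign each covariate vector to one of k clusters (Lloyd's algorithm).

    Assumption: centres start at the first k points, which must be distinct.
    """
    centres = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centre in Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, g in zip(points, labels) if g == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def grouped_pearson(y, p, labels):
    """Grouped Pearson-type statistic over an arbitrary partition.

    Sums (O_g - n_g * pbar_g)^2 / (n_g * pbar_g * (1 - pbar_g)) over groups,
    where O_g is the observed event count and pbar_g the mean fitted
    probability in group g. No reference distribution is implied here.
    """
    groups = defaultdict(list)
    for yi, pi, g in zip(y, p, labels):
        groups[g].append((yi, pi))
    stat = 0.0
    for members in groups.values():
        n = len(members)
        observed = sum(yi for yi, _ in members)
        pbar = sum(pi for _, pi in members) / n
        stat += (observed - n * pbar) ** 2 / (n * pbar * (1.0 - pbar))
    return stat


# Toy data: one continuous covariate, two well-separated regions.
x = [(0.0,), (0.1,), (5.0,), (5.1,)]
y = [1, 1, 0, 0]                # observed binary responses
p_hat = [0.5, 0.5, 0.5, 0.5]    # fitted probabilities from some (poor) model
regions = kmeans_labels(x, k=2)
statistic = grouped_pearson(y, p_hat, regions)
```

In this toy case the fitted model ignores the covariate entirely, so the observed counts in the two clusters depart from the expected counts and the statistic is large relative to a well-fitting model, for which each cluster's contribution would be near zero.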

MSC:

62G10 Nonparametric hypothesis testing
62J12 Generalized linear models (logistic models)
62P10 Applications of statistics to biology and medical sciences; meta analysis
62-08 Computational methods for problems pertaining to statistics

References:

[1] Agresti, A., Categorical Data Analysis (1990), Wiley: Wiley New York, USA · Zbl 0716.62001
[2] Anderson, J. A., Separate sample logistic discrimination, Biometrika, 59, 1, 19-35 (1972) · Zbl 0231.62080
[3] Barnhart, H. X.; Williamson, J. M., Goodness-of-fit tests for GEE modeling with binary responses, Biometrics, 54, 720-729 (1998) · Zbl 1058.62524
[4] Chernoff, H.; Lehmann, E. L., The use of maximum likelihood estimates in chi-square tests for goodness of fit, Ann. Math. Statist., 25, 579-586 (1954) · Zbl 0056.37103
[5] Copas, J. B., Plotting \(p\) against \(x\), Appl. Statist., 32, 25-31 (1983)
[6] Copas, J. B., Unweighted sum of squares test for proportions, Appl. Statist., 38, 1, 71-80 (1989)
[7] Cox, D. R.; Snell, E. J., The Analysis of Binary Data (1989), Chapman and Hall: Chapman and Hall London, UK · Zbl 0729.62004
[8] Cramer, H., Mathematical Methods of Statistics (1946), Princeton University Press: Princeton University Press NJ, USA · Zbl 0063.01014
[9] Cressie, N.; Read, T. R.C., Multinomial goodness-of-fit tests, J. Roy. Statist. Soc., 46, 440-464 (1984) · Zbl 0571.62017
[10] Farrington, C. P., On assessing goodness of fit of generalized linear models to sparse data, J. Roy. Statist. Soc. Ser. B, 58, 2, 349-360 (1996) · Zbl 0866.62040
[11] Gail, M. H.; Wieand, S.; Piantadosi, S., Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates, Biometrika, 71, 3, 431-444 (1984) · Zbl 0565.62094
[12] Gail, M. H.; Tan, W.-Y.; Piantadosi, S., Tests for no treatment effect in randomized clinical trials when needed covariates are omitted, Biometrika, 75, 1, 57-64 (1988) · Zbl 0635.62108
[13] Hauck, W. W.; Neuhaus, J. M.; Kalbfleisch, J. D.; Anderson, S., A consequence of omitted covariates when estimating odds ratios, J. Clin. Epidemiol., 44, 1, 77-81 (1991)
[14] Horton, N. J.; Bebchuk, J. D.; Jones, C. L.; Lipsitz, S. R.; Catalano, P. J.; Zahner, G. E.P.; Fitzmaurice, G. M., Goodness-of-fit for GEE: an example with mental health service utilization, Statist. Med., 18, 213-222 (1999)
[15] Hosmer, D. W.; Hjort, N. L., Goodness-of-fit processes for logistic regression: simulation results, Statist. Med., 21, 2723-2738 (2002)
[16] Hosmer, D. W.; Hosmer, T.; Le Cessie, S.; Lemeshow, S., A comparison of goodness-of-fit tests for the logistic regression model, Statist. Med., 16, 965-980 (1997)
[17] Hosmer, D. W.; Lemeshow, S., Goodness of fit tests for the multiple logistic regression model, Commun. Statist. Part A—Theory and Methods, 9, 1043-1069 (1980) · Zbl 0447.62025
[18] Hosmer, D. W.; Lemeshow, S., Applied Logistic Regression (1989), Wiley: Wiley New York, USA · Zbl 0715.62125
[19] IHAST2 web site, 2004. http://ctsdmc.public-health.uiowa.edu/Ihast2/Home.htm
[20] Kuss, O., Global goodness-of-fit tests in logistic regression with sparse data, Statist. Med., 21, 3789-3801 (2002)
[21] le Cessie, S.; van Houwelingen, C., A goodness-of-fit test for binary regression models, based on smoothing methods, Biometrics, 47, 1267-1282 (1991) · Zbl 0825.62833
[22] McCullagh, P., On the asymptotic distribution of Pearson’s statistic in linear exponential-family models, Internat. Statist. Rev., 53, 1, 61-67 (1985) · Zbl 0575.62020
[23] McCullagh, P., The conditional distribution of goodness-of-fit statistics for discrete data, J. Amer. Statist. Assoc., 81, 393, 104-107 (1986)
[24] Milligan, G., An examination of the effect of six types of error perturbation on fifteen clustering algorithms, Psychometrika, 45, 325-342 (1980)
[25] Milligan, G., A Monte Carlo study of thirty internal criterion measures for cluster analysis, Psychometrika, 46, 187-199 (1981) · Zbl 0472.62070
[26] Molinari, L., Distribution of the chi-squared test in nonstandard situations, Biometrika, 64, 1, 115-121 (1977) · Zbl 0353.62022
[27] Nagelkerke, N.; Moses, S.; Plummer, F. A.; Brunham, R. C.; Fish, D., Logistic regression in case-control studies: the effect of using independent as dependent variables, Statist. Med., 14, 769-775 (1995)
[28] Nagelkerke, N.; Smits, J.; le Cessie, S.; van Houwelingen, H., Testing goodness-of-fit of the logistic regression model in case-control studies using sample reweighting, Statist. Med., 24, 121-130 (2005)
[29] Nelder, J. A.; Wedderburn, R. W.M., Generalized linear models, J. Roy. Statist. Soc. A, 135, 370-384 (1972)
[30] Orme, C., The calculation of the information matrix test for binary data models, The Manchester School, 54, 4, 370-376 (1988)
[31] Osius, G.; Rojek, D., Normal goodness-of-fit tests for multinomial models with large degrees of freedom, J. Amer. Statist. Assoc., 87, 420, 1145-1152 (1992) · Zbl 0765.62052
[32] Pigeon, J. G.; Heyse, J. F., A cautionary note about assessing the fit of logistic regression models, J. Appl. Statist., 26, 7, 847-853 (1999) · Zbl 1072.62615
[33] Pulkstenis, E.; Robinson, T. J., Two goodness-of-fit tests for logistic regression models with continuous covariates, Statist. Med., 21, 79-93 (2002)
[34] Rao, C. R., Linear Statistical Inference and Its Applications (1973), Wiley: Wiley New York · Zbl 0256.62002
[35] Read, T. R.C.; Cressie, N. A.C., Goodness-of-Fit Statistics for Discrete Multivariate Data (1988), Springer: Springer New York, USA · Zbl 0663.62065
[36] Robinson, L. D.; Jewell, N. P., Some surprising results about covariate adjustment in logistic regression models, Internat. Statist. Rev., 59, 227-240 (1991) · Zbl 0742.62067
[37] Todd, M. M.; Hindman, B. J.; Clarke, W. R.; Torner, J. C., for the Intraoperative Hypothermia for Aneurysm Surgery Trial (IHAST) Investigators, Mild intraoperative hypothermia during surgery for intracranial aneurysm, New England J. Med., 352, 135-145 (2005)
[38] Tsiatis, A. A., A note on a goodness-of-fit test for the logistic regression model, Biometrika, 67, 1, 250-251 (1980) · Zbl 0424.62030
[39] Watson, G. S., On chi-square goodness-of-fit tests for continuous distributions, J. Roy. Statist. Soc. B, 20, 44-61 (1958) · Zbl 0086.12701
[40] Watson, G. S., Some recent results in the chi square goodness of fit tests, Biometrics, 15, 440-468 (1959) · Zbl 0095.33503
[41] White, H., Maximum likelihood estimation of misspecified models, Econometrica, 50, 1, 1-25 (1982) · Zbl 0478.62088
[42] Williams, D. A., Generalized linear models diagnostics using the deviance and single case deletions, Appl. Statist., 36, 2, 181-191 (1987) · Zbl 0646.62062
[43] Xie, X.-J., A goodness-of-fit test for logistic regression models with continuous predictors, Ph.D. Thesis, the University of Iowa (2005)
[44] Xie, Y.; Manski, C. F., The logit model and response-based samples, Sociol. Methods Res., 17, 283-302 (1988)
[45] Zheng, B., An information matrix test for logistic regression models based on case-control data, Biometrika, 88, 4, 921-932 (2001) · Zbl 1099.62512
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.