Categorizing a continuous predictor subject to measurement error. (English) Zbl 1409.62028

Summary: Epidemiologists often categorize a continuous risk predictor, even when the true risk model is not a categorical one. Such categorization is nonetheless thought to be more robust and interpretable, so the goal is to fit the categorical model and interpret its parameters. We address the question: with measurement error and categorization, how can we do what epidemiologists want, namely estimate the parameters of the categorical model that would have been estimated had the true predictor been observed? We develop a general methodology for such an analysis and illustrate it in linear and logistic regression. Simulation studies are presented, and the methodology is applied to a nutrition data set. A discussion of alternative approaches is also included.
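
The problem posed in the summary can be illustrated with a small simulation. The sketch below is not the paper's methodology; it only shows, under assumed settings, how the categorical-model parameters obtained from an error-prone measurement differ from the target parameters that the true predictor would give. The function names (`category_means`, `simulate`), the linear risk model, the additive normal measurement error, and the fixed cutpoints are all hypothetical choices made for illustration.

```python
import numpy as np


def category_means(values, outcome, cutpoints):
    """Fit the categorical linear model by least squares.

    With one indicator per category and no intercept, the least-squares
    coefficients are simply the mean outcome within each category.
    """
    labels = np.digitize(values, cutpoints)  # category index 0..len(cutpoints)
    n_categories = len(cutpoints) + 1
    return np.array([outcome[labels == k].mean() for k in range(n_categories)])


def simulate(n=200_000, error_sd=0.8, seed=0):
    """Compare target and naive categorical parameters (assumed setup)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                       # true predictor, unobserved in practice
    w = x + rng.normal(scale=error_sd, size=n)   # observed, error-prone measurement
    y = 1.0 + 0.5 * x + rng.normal(size=n)       # true risk model is continuous, not categorical
    cutpoints = np.array([-0.6745, 0.0, 0.6745])  # approximate quartiles of N(0, 1)

    target = category_means(x, y, cutpoints)     # what would be estimated if x were observed
    naive = category_means(w, y, cutpoints)      # what is estimated when w is categorized instead
    return target, naive


if __name__ == "__main__":
    target, naive = simulate()
    print("target category means:", np.round(target, 3))
    print("naive category means: ", np.round(naive, 3))
    print("top-vs-bottom contrast, target:", round(target[-1] - target[0], 3))
    print("top-vs-bottom contrast, naive: ", round(naive[-1] - naive[0], 3))
```

In this assumed setup the contrast between the highest and lowest categories is attenuated when the error-prone measurement is categorized, which is the gap a correction method of the kind described in the summary aims to close.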

MSC:

62C25 Compound decision problems in statistical decision theory
62J05 Linear regression; mixed models
62J12 Generalized linear models (logistic models)
62P10 Applications of statistics to biology and medical sciences; meta analysis
