
Joint regression analysis of mixed-type outcome data via efficient scores. (English) Zbl 1469.62119

Summary: Joint analysis of multivariate outcomes composed of mixed data types (continuous, count, binary, survival, etc.) induces special complexity in model specification and analysis. When the scientific question of interest involves a joint effect of covariate(s) of interest on the set of outcome variables, specifying a full probability model may be infeasible, undesirably complex, or computationally intractable. A flexible method to estimate and conduct inference on such joint effects is presented which accounts for correlation among the outcomes without needing to explicitly specify their joint distribution. Simulation studies and an analysis of health care data illustrate the approach and its operating characteristics vis-à-vis other methods.

MSC:

62-08 Computational methods for problems pertaining to statistics
62H12 Estimation in multivariate analysis
62J12 Generalized linear models (logistic models)
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

Rcpp; geepack; R; RcppEigen

References:

[1] Agency for Healthcare Research and Quality (HCUP), National inpatient sample (NIS), (2006), US Department of Health and Human Services, URL www.hcup-us.ahrq.gov/databases.jsp
[2] Bates, D.; Eddelbuettel, D., Fast and elegant numerical linear algebra using the RcppEigen package, J. Stat. Soft., 52, 5, 1-24, (2013), URL http://www.jstatsoft.org/v52/i05/
[3] Bickel, P.; Klaassen, C.; Ritov, Y.; Wellner, J., Efficient and adaptive estimation for semiparametric models, (1998), Springer · Zbl 0894.62005
[4] Diao, G.; Hanlon, B.; Vidyashankar, A., Multiple testing for high-dimensional data, Contemp. Math., 622, 95-108, (2014), URL http://dx.doi.org/10.1090/conm/622/12440 · Zbl 1320.62129
[5] Diao, G.; Ning, J.; Qin, J., Maximum likelihood estimation for semiparametric density ratio model, Int. J. Biostat., 8, 370-384, (2012)
[6] Diao, G.; Vidyashankar, A. N., Assessing genome-wide statistical significance for large p small n problems, Genetics, 194, 3, 781-783, (2013), URL http://www.genetics.org/content/194/3/781
[7] Diggle, P. J.; Heagerty, P.; Liang, K.-Y.; Zeger, S. L., Analysis of longitudinal data, (2002), Oxford University Press
[8] Eddelbuettel, D.; François, R., Rcpp: seamless R and C++ integration, J. Stat. Soft., 40, 8, 1-18, (2011), URL http://www.jstatsoft.org/v40/i08
[9] Fujikoshi, Y.; Ulyanov, V. V.; Shimizu, R., Multivariate statistics: high-dimensional and large-sample approximations, (2010), Springer · Zbl 1304.62016
[10] Genest, C.; Nešlehová, J., A primer on copulas for count data, Astin Bull., 37, 475-515, (2007) · Zbl 1274.62398
[11] Gueorguieva, R., Random effects models for joint analysis of repeatedly measured discrete and continuous outcomes, (de Leon, A. R.; Chough, K. C., Analysis of Mixed Data: Methods and Applications, (2013), Taylor and Francis Group), 109-124
[12] Højsgaard, S.; Halekoh, U.; Yan, J., The R package geepack for generalized estimating equations, J. Stat. Soft., 15, 1, 1-11, (2005), URL https://www.jstatsoft.org/index.php/jss/article/view/v015i02
[13] Huang, A., Joint estimation of the mean and error distribution in generalized linear models, J. Amer. Statist. Assoc., 109, 186-196, (2014) · Zbl 1367.62227
[14] Huang, A., On generalised estimating equations for vector regression, Aust. N. Z. J. Stat, 59, 2, 195-213, (2017) · Zbl 1381.62101
[15] Kauermann, G.; Carroll, R. J., The sandwich variance estimator: efficiency properties and coverage probability of confidence intervals, (2000), URL http://nbn-resolving.de/urn/resolver.pl?urn=nbn:de:bvb:19-epub-1579-4
[16] Koltchinskii, V., Local Rademacher complexities and oracle inequalities in risk minimization, Ann. Statist., 34, 6, 2593-2656, (2006) · Zbl 1118.62065
[17] Kuelbs, J.; Vidyashankar, A. N., Asymptotic inference for high-dimensional data, Ann. Statist., 38, 2, 836-869, (2010) · Zbl 1184.62094
[18] Kwak, M.; Zheng, G.; Wu, C. O., Joint tests for mixed traits in genetic association studies, (de Leon, A. R.; Chough, K. C., Analysis of Mixed Data: Methods and Applications, (2013), Taylor and Francis Group), 31-42, Ch. 3
[19] Ledoit, O.; Wolf, M., Nonlinear shrinkage estimation of large-dimensional covariance matrices, Ann. Statist., 40, 2, 1024-1060, (2012) · Zbl 1274.62371
[20] Liang, K.-Y.; Zeger, S. L., Longitudinal data analysis using generalized linear models, Biometrika, 73, 1, 13-22, (1986), URL http://www.jstor.org/stable/2336267 · Zbl 0595.62110
[21] Marchese, S.; Diao, G., Density ratio model for multivariate outcomes, J. Multivariate Anal., 154, 249-261, (2017), URL http://www.sciencedirect.com/science/article/pii/S0047259X16301622 · Zbl 1352.62122
[22] R Core Team, R: A language and environment for statistical computing, (2016), R Foundation for Statistical Computing Vienna, Austria, URL https://www.R-project.org/
[23] Rebai, A.; Goffinet, B.; Mangin, B., Approximate thresholds of interval mapping tests for QTL detection, Genetics, 138, 1, 235-240, (1994)
[24] Song, X.-K., Correlated data analysis: modeling, analysis, and applications, (2007), Springer Science & Business Media · Zbl 1132.62002
[25] Teixeira-Pinto, A.; Harezlak, J., Factorization and latent variable models for joint analysis of binary and continuous outcomes, (de Leon, A. R.; Chough, K. C., Analysis of Mixed Data: Methods and Applications, (2013), Taylor and Francis Group), 81-92, Ch. 6
[26] Teixeira-Pinto, A.; Normand, S.-L. T., Correlated bivariate continuous and binary outcomes: issues and applications, Stat. Med., 28, 13, 1753-1773, (2009)
[27] van der Vaart, A. W.; Wellner, J. A., Weak convergence and empirical processes, (1996), Springer Series in Statistics · Zbl 0862.60002
[28] Wang, W.-L.; Lin, T.-I.; Lachos, V. H., Extending multivariate-t linear mixed models for multiple longitudinal data with censored responses and heavy tails, Stat. Methods Med. Res., 1-20, (2015)
[29] Zou, F.; Fine, J. P.; Hu, J.; Lin, D., An efficient resampling method for assessing genome-wide statistical significance in mapping quantitative trait loci, Genetics, 168, 4, 2307-2316, (2004), URL http://www.genetics.org/content/168/4/2307