Covariate-adjusted Spearman’s rank correlation with probability-scale residuals. (English) Zbl 1414.62456

Summary: It is desirable to adjust Spearman’s rank correlation for covariates, yet existing approaches have limitations. For example, the traditionally defined partial Spearman’s correlation does not have a sensible population parameter, and the conditional Spearman’s correlation defined with copulas cannot be easily generalized to discrete variables. We define population parameters for both partial and conditional Spearman’s correlation through concordance-discordance probabilities. The definitions are natural extensions of Spearman’s rank correlation in the presence of covariates and are general for any orderable random variables. We show that they can be neatly expressed using probability-scale residuals (PSRs). This connection allows us to derive simple estimators. Our partial estimator for Spearman’s correlation between \(X\) and \(Y\) adjusted for \(Z\) is the correlation of PSRs from models of \(X\) on \(Z\) and of \(Y\) on \(Z\), which is analogous to the partial Pearson’s correlation derived as the correlation of observed-minus-expected residuals. Our conditional estimator is the conditional correlation of PSRs. We describe estimation and inference, and highlight the use of semiparametric cumulative probability models, which allow preservation of the rank-based nature of Spearman’s correlation. We conduct simulations to evaluate the performance of our estimators and compare them with other popular measures of association, demonstrating their robustness and efficiency. We illustrate our method in two applications, a biomarker study and a large survey.
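
A minimal sketch of the partial estimator described in the summary, under stated assumptions. The probability-scale residual of an observation \(y\) under a fitted distribution \(F\) is \(r(y, F) = F(y-) + F(y) - 1\), which reduces to \(2F(y) - 1\) for continuous \(F\). For brevity, the sketch fits normal linear models of \(X\) on \(Z\) and of \(Y\) on \(Z\); this parametric choice is an assumption made here, not the semiparametric cumulative probability models the summary highlights. The function names are hypothetical and do not come from the paper or from any package.

```python
import math
import numpy as np


def normal_cdf(u):
    """Standard normal CDF applied elementwise (built on math.erf)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))


def psr_linear_normal(y, z):
    """Probability-scale residuals 2*F(y|z) - 1 for a continuous outcome.

    ASSUMPTION: F(.|z) is taken from an ordinary least-squares fit of y on z
    with normal errors. This is an illustrative stand-in only.
    """
    design = np.column_stack([np.ones(len(y)), z])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma = resid.std(ddof=design.shape[1])
    return 2.0 * normal_cdf(resid / sigma) - 1.0


def partial_spearman(x, y, z):
    """Correlation of the PSRs from the model of x on z and of y on z."""
    rx = psr_linear_normal(x, z)
    ry = psr_linear_normal(y, z)
    return float(np.corrcoef(rx, ry)[0, 1])


def spearman(x, y):
    """Unadjusted Spearman correlation (Pearson correlation of ranks, no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Simulated data: x and y are associated only through the covariate z.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

unadjusted = spearman(x, y)           # driven by the shared dependence on z
adjusted = partial_spearman(x, y, z)  # should be close to zero

print(f"unadjusted Spearman: {unadjusted:.3f}")
print(f"partial (PSR-based): {adjusted:.3f}")
```

In this simulation \(X\) and \(Y\) are conditionally independent given \(Z\), so the unadjusted rank correlation is clearly positive while the PSR-based partial estimate should sit near zero. Replacing the normal linear fits with a rank-preserving model would change only `psr_linear_normal`.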

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62H20 Measures of association (correlation, canonical correlation, etc.)
62G35 Nonparametric robustness

References:

[1] Andrade, B. B., Singh, A., Narendran, G., Schechter, M. E., Nayak, K., Subramanian, S., et al. (2014). Mycobacterial antigen driven activation of \(\text{CD14}^{++}\text{CD16}^{-}\) monocytes is a predictor of tuberculosis‐associated immune reconstitution inflammatory syndrome. PLOS Pathogens 10, e1004433.
[2] Bross, I. D. J. (1958). How to use ridit analysis. Biometrics 14, 18-38.
[3] Genest, C. and Nešlehová, J. (2007). A primer on copulas for count data. Astin Bulletin 37, 475-515. · Zbl 1274.62398
[4] Gijbels, I., Veraverbeke, N., and Omelka, M. (2011). Conditional copulas, association measures and their applications. Computational Statistics and Data Analysis 55, 1919-1932. · Zbl 1328.62366
[5] Gripenberg, G. (1992). Confidence intervals for partial rank correlations. Journal of the American Statistical Association 87, 546-551. · Zbl 0781.62088
[6] Hall, P., DiCiccio, T. J., and Romano, J. P. (1989). On smoothing and the bootstrap. The Annals of Statistics 17, 692-704. · Zbl 0672.62051
[7] Harrell, F. E. (2015a). Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, 2nd edition. Switzerland: Springer. · Zbl 1330.62001
[8] Harrell, F. E. (2015b). rms: Regression Modeling Strategies. R package version 4.2-1. · Zbl 1330.62001
[9] Kendall, M. G. (1942). Partial rank correlation. Biometrika 32, 277-283. · Zbl 0063.03212
[10] Kendall, M. G. (1970). Rank Correlation Methods. London: Charles Griffin & Company. · Zbl 0199.53501
[11] Kim, S. (2012). ppcor: Partial and Semi‐partial (Part) correlation. R package version 1.0.
[12] Koethe, J. R., Bian, A., Shintani, A. K., Boger, M. S., Mitchell, V. J., Erdem, H., et al. (2012). Serum leptin level mediates the association of body composition and serum C‐reactive protein in HIV‐infected persons on antiretroviral therapy. AIDS Research and Human Retroviruses 28, 83-91.
[13] Koethe, J. R., Grome, H., Jenkins, C. A., Kalamas, S. A., and Sterling, T. R. (2016). The metabolic and cardiovascular consequences of obesity in persons with HIV on long‐term antiretroviral therapy. AIDS 30, 83-91.
[14] Korn, E. L. (1984). The ranges of limiting values of some partial correlations under conditional independence. The American Statistician 38, 61-62.
[15] Kruskal, W. H. (1958). Ordinal measures of association. Journal of the American Statistical Association 53, 814-861. · Zbl 0087.15403
[16] Li, C. and Shepherd, B. E. (2010). Test of association between two ordinal variables while adjusting for covariates. Journal of the American Statistical Association 105, 612-620. · Zbl 1392.62168
[17] Li, C. and Shepherd, B. E. (2012). A new residual for ordinal outcomes. Biometrika 99, 473-480. · Zbl 1239.62042
[18] Liu, Q., Shepherd, B. E., Li, C., and Harrell, F. E. (2017). Modeling continuous response variables using ordinal regression. Statistics in Medicine (in press).
[19] McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, Series B 42, 109-142. · Zbl 0483.62056
[20] Murphy, S. A., Rossini, A. J., and van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association 92, 968-976. · Zbl 0887.62038
[21] Nelsen, R. B. (2006). An Introduction to Copulas, 2nd edition. New York: Springer Science & Business Media. · Zbl 1152.62030
[22] Nešlehová, J. (2007). On rank correlation measures for non‐continuous random variables. Journal of Multivariate Analysis 98, 544-567. · Zbl 1107.62047
[23] Pearson, K. (1907). On Further Methods of Determining Correlation. London: Cambridge University Press. · JFM 38.0290.04
[24] Shepherd, B. E., Li, C., and Liu, Q. (2016). Probability‐scale residuals for continuous, discrete, and censored data. Canadian Journal of Statistics 44, 463-479. · Zbl 1357.62180
[25] Stefanski, L. A. and Boos, D. D. (2002). The calculus of M‐estimation. The American Statistician 56, 29-38.
[26] Walker, S. H. and Duncan, D. B. (1967). Estimation of the probability of an event as a function of several independent variables. Biometrika 54, 167-179. · Zbl 0159.47604
[27] Wand, M. and Jones, M. (1995). Kernel Smoothing. London: Chapman & Hall. · Zbl 0854.62043
[28] Zeng, D. and Lin, D. (2007). Maximum likelihood estimation in semiparametric regression models with censored data, with Discussion. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 69, 507-564. · Zbl 07555364