Conservative confidence intervals on multiple correlation coefficient for high-dimensional elliptical data using random projection methodology. (English) Zbl 07484710

Summary: The multiple correlation coefficient (MCC) is a measure of the linear relationship between a given variable and a set of covariates. In multiple correlation and regression analysis, it is common practice to construct a confidence interval for the population MCC. In high-dimensional settings, in which the data dimension \(p\) is much larger than the sample size \(n\), the sample covariance matrix is singular, so the classical confidence intervals for the MCC are no longer usable. For high-dimensional elliptical data, some (conservative) confidence intervals for the population MCC are presented using the random projection methodology. The performance of the proposed confidence intervals is evaluated and compared in simulations in terms of coverage probability and average interval length. The proposed intervals are validated experimentally on two real gene expression datasets.
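
The following is a minimal illustrative sketch, not the authors' procedure: it only shows why the sample MCC degenerates when \(p > n\) and how projecting the covariates onto \(k < n\) random directions gives a usable statistic again. The function names, the Gaussian projection matrix, and the simulated data are all assumptions made for illustration; the construction of the confidence intervals themselves is not reproduced here.

```python
import numpy as np

def sample_squared_mcc(y, X):
    """Sample squared multiple correlation (R^2) of y on the columns of X."""
    Xc = X - X.mean(axis=0)          # centre the covariates
    yc = y - y.mean()                # centre the response
    # Least-squares fit; lstsq returns the minimum-norm solution if Xc is rank deficient.
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    fitted = Xc @ coef
    return float(fitted @ fitted / (yc @ yc))

def projected_squared_mcc(y, X, k, rng):
    """R^2 of y on k random linear combinations of the covariates (hypothetical helper)."""
    p = X.shape[1]
    # Assumed choice: i.i.d. Gaussian projection matrix, scaled by 1/sqrt(k).
    R = rng.standard_normal((p, k)) / np.sqrt(k)
    return sample_squared_mcc(y, X @ R)

# Simulated high-dimensional data (assumption: Gaussian, a special elliptical case).
rng = np.random.default_rng(0)
n, p, k = 40, 500, 5
X = rng.standard_normal((n, p))
y = X[:, :3].sum(axis=1) + rng.standard_normal(n)

r2_full = sample_squared_mcc(y, X)             # p > n: the fit interpolates y, so R^2 is degenerate
r2_proj = projected_squared_mcc(y, X, k, rng)  # k < n: a non-degenerate value strictly below 1
```

With \(p > n\) the full fit reproduces the response exactly, so the sample squared MCC carries no information; after projection to \(k < n\) dimensions the classical machinery applies to the projected covariates.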

MSC:

62-XX Statistics

References:

[1] Achlioptas, D., Database-friendly random projections, Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, New York, 2001, pp. 274-281.
[2] Algina, J.; Moulder, B. C., Sample sizes for confidence intervals on the increase in the squared multiple correlation coefficient, Educ. Psychol. Meas., 61, 633-649 (2001) · doi:10.1177/00131640121971400
[3] Anderson, T. W., An Introduction to Multivariate Statistical Analysis (2003), John Wiley & Sons, Hoboken, NJ · Zbl 1039.62044
[4] Ash, R. B.; Doleans-Dade, C., Probability and Measure Theory (2000), Academic Press, London · Zbl 0944.60004
[5] Croux, C.; Dehon, C., Estimators of the multiple correlation coefficient: Local robustness and confidence intervals, Stat. Pap., 44, 315-334 (2003) · Zbl 1052.62028 · doi:10.1007/s00362-003-0158-7
[6] Darlington, R., Regression and Linear Models (1990), McGraw-Hill Publishing Company, Toronto
[7] Dasgupta, S.; Gupta, A., An elementary proof of a theorem of Johnson and Lindenstrauss, Random Struct. Algor., 22, 60-65 (2003) · Zbl 1018.51010 · doi:10.1002/rsa.10073
[8] Davison, A. C.; Hinkley, D. V., Bootstrap Methods and Their Application (1997), Cambridge University Press, New York · Zbl 0886.62001
[9] Desmedt, C.; Piette, F.; Loi, S.; Wang, Y.; Lallemand, F.; Haibe-Kains, B.; Viale, G.; Delorenzi, M.; Zhang, Y.; d’Assignies, M. S.; Bergh, J., Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series, Clin. Cancer Res., 13, 3207-3214 (2007) · doi:10.1158/1078-0432.CCR-06-2765
[10] Devlin, S. J.; Gnanadesikan, R.; Kettenring, J. R., Some multivariate applications of elliptical distributions, Essays Probab. Stat., 24, 365-393 (1976) · Zbl 0364.62015
[11] DLBCL Package, 2010; software available at http://bioconductor.org/packages/DLBCL.
[12] breastCancerTRANSBIG Package, 2011; software available at https://bioconductor.org/packages/breastCancerTRANSBIG.
[13] Harville, D. A., Matrix Algebra From a Statistician’s Perspective (1997), Springer, New York · Zbl 0881.15001
[14] Indyk, P.; Motwani, R., Approximate nearest neighbors: Towards removing the curse of dimensionality, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, New York, 1998, pp. 604-613. · Zbl 1029.68541
[15] Johnson, R. A.; Wichern, D. W., Applied Multivariate Statistical Analysis (2007), Pearson Education, New Jersey · Zbl 1269.62044
[16] Kelley, K., Sample size planning for the squared multiple correlation coefficient: Accuracy in parameter estimation via narrow confidence intervals, Multivariate Behav. Res., 43, 524-555 (2008) · doi:10.1080/00273170802490632
[17] Lee, Y.-S., Tables of upper percentage points of the multiple correlation coefficient, Biometrika, 59, 175-189 (1972) · Zbl 0246.62103
[18] Mendoza, J. L.; Stafford, K. L., Confidence intervals, power calculation, and sample size estimation for the squared multiple correlation coefficient under the fixed and random regression models: A computer programme and useful standard tables, Educ. Psychol. Meas., 61, 650-667 (2001) · doi:10.1177/00131640121971419
[19] Muirhead, R. J., Aspects of Multivariate Statistical Theory (2005), John Wiley & Sons, Hoboken, NJ
[20] Ogasawara, H., Asymptotic expansion and conditional robustness for the sample multiple correlation coefficient under nonnormality, Commun. Stat. Simul. Comput., 35, 177-199 (2006) · Zbl 1086.62020 · doi:10.1080/03610910500416207
[21] Renaud, O.; Victoria-Feser, M.-P., A robust coefficient of determination for regression, J. Stat. Plan. Inference, 140, 1852-1862 (2010) · Zbl 1184.62119 · doi:10.1016/j.jspi.2010.01.008
[22] Rosenwald, A.; Wright, G.; Chan, W. C.; Connors, J. M.; Campo, E.; Fisher, R. I.; Gascoyne, R. D.; Muller-Hermelink, H. K.; Smeland, E. B.; Giltnane, J. M.; Hurt, E. M., The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma, N. Engl. J. Med., 346, 1937-1947 (2002) · doi:10.1056/NEJMoa012914
[23] Steiger, J. H.; Fouladi, R. T., R2: A computer program for interval estimation, power calculations, sample size estimation, and hypothesis testing in multiple regression, Behav. Res. Methods, 24, 581-582 (1992) · doi:10.3758/BF03203611
[24] Vempala, S. S., The Random Projection Method, DIMACS: Series in Discrete Mathematics and Theoretical Computer Science Vol. 65, American Mathematical Society, Providence, Rhode Island, 2005. · Zbl 1048.68131
[25] Withers, C. S.; Nadarajah, S., Confidence intervals for the correlation from a bivariate normal, J. Stat. Comput. Simul., 82, 1591-1606 (2012) · Zbl 1431.62251 · doi:10.1080/00949655.2011.585990
[26] Withers, C. S.; Nadarajah, S., Confidence intervals for the normal multiple correlation, Commun. Stat. Simul. Comput., 44, 2003-2022 (2015) · Zbl 1319.62117 · doi:10.1080/03610918.2013.835406
[27] Zheng, S.; Jiang, D.; Bai, Z.; He, X., Inference on multiple correlation coefficients with moderately high dimensional data, Biometrika, 101, 748-754 (2014) · Zbl 1336.62157 · doi:10.1093/biomet/asu023