
A distribution-free test of independence based on a modified mean variance index. (English) Zbl 07740724

Summary: H. Cui and W. Zhong [Comput. Stat. Data Anal. 139, 117–133 (2019; Zbl 1507.62039)] proposed a test based on the mean variance (MV) index for independence between a categorical random variable \(Y\) with \(R\) categories and a continuous random variable \(X\). They ingeniously proved the asymptotic normality of the MV test statistic when \(R\) diverges to infinity. This gives the MV test many merits, including making it more convenient for testing independence when \(R\) is large. This paper considers a new test, called the integral Pearson chi-square (IPC) test, whose test statistic can be viewed as a modified MV test statistic. A central limit theorem for martingale differences is used to show that, when \(R\) diverges, the asymptotic null distribution of the standardized IPC test statistic is also normal, so the IPC test shares many merits with the MV test. As an application of this theoretical finding, the IPC test is extended to test independence between continuous random variables. The finite-sample performance of the proposed test is assessed by Monte Carlo simulations, and a real data example is presented for illustration.
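To make the statistics concrete, here is a minimal Python/NumPy sketch. `mv_index` computes the sample MV index of Cui, Li and Zhong (2015), \(\sum_{r=1}^R \hat p_r \, n^{-1}\sum_{j=1}^n [\hat F_r(X_j) - \hat F(X_j)]^2\), where \(\hat F\) is the pooled empirical CDF of \(X\) and \(\hat F_r\) the empirical CDF within category \(r\). `pearson_weighted_index` is a hypothetical illustration of a Pearson chi-square-type modification: dichotomizing \(X\) at a point \(x\) gives a \(2 \times R\) table whose Pearson statistic equals \(n \sum_r \hat p_r [\hat F_r(x) - \hat F(x)]^2 / \{\hat F(x)(1 - \hat F(x))\}\), and the sketch averages this over the sample points. It is an assumption, not necessarily the exact IPC statistic or standardization used in the paper. The paper calibrates its standardized statistic with the asymptotic normal null distribution; `permutation_pvalue` instead swaps in a generic permutation test. All function names are hypothetical.

```python
import numpy as np


def _cdf_terms(x, y):
    """Return category proportions p_r, the pooled empirical CDF F(X_j),
    and the within-category empirical CDFs F_r(X_j), one row per category."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # le[j, i] is True when X_i <= X_j (O(n^2) memory; fine for a sketch)
    le = x[None, :] <= x[:, None]
    F = le.mean(axis=1)                                          # F(X_j)
    labels, counts = np.unique(y, return_counts=True)
    p = counts / y.size                                          # p_r
    Fr = np.stack([le[:, y == r].mean(axis=1) for r in labels])  # F_r(X_j)
    return p, F, Fr


def mv_index(x, y):
    """Sample MV index: sum_r p_r * mean_j [F_r(X_j) - F(X_j)]^2."""
    p, F, Fr = _cdf_terms(x, y)
    return float(np.sum(p * ((Fr - F) ** 2).mean(axis=1)))


def pearson_weighted_index(x, y):
    """Hypothetical Pearson-weighted variant (assumption, not the paper's
    definition): each squared gap is divided by F(X_j)(1 - F(X_j)).
    Points with F(X_j) = 1 also have F_r(X_j) = 1, so they contribute zero."""
    p, F, Fr = _cdf_terms(x, y)
    w = F * (1.0 - F)
    safe = np.where(w > 0, w, 1.0)  # avoid division by zero at the maximum
    ratio = np.where(w > 0, (Fr - F) ** 2 / safe, 0.0)
    return float(np.sum(p * ratio.mean(axis=1)))


def permutation_pvalue(stat, x, y, n_perm=199, seed=0):
    """Generic permutation p-value: permuting Y breaks any dependence on X.
    This replaces, and is not, the asymptotic normal calibration of the paper."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = stat(x, y)
    exceed = sum(stat(x, rng.permutation(y)) >= observed for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)
```

For example, with `x = [1, 2, 3, 4]` and `y = [0, 0, 1, 1]` the two categories are perfectly separated, and `mv_index(x, y)` returns 0.09375. With a single category, both indices are exactly zero because \(\hat F_r = \hat F\).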

MSC:

62-XX Statistics

Citations:

Zbl 1507.62039

References:

[1] Csörgö, S., Testing for independence by the empirical characteristic function, Journal of Multivariate Analysis, 16, 3, 290-299 (1985) · Zbl 0585.62097 · doi:10.1016/0047-259X(85)90022-3
[2] Cui, H.; Li, R.; Zhong, W., Model-free feature screening for ultrahigh dimensional discriminant analysis, Journal of the American Statistical Association, 110, 510, 630-641 (2015) · Zbl 1373.62305 · doi:10.1080/01621459.2014.920256
[3] Cui, H.; Zhong, W., A distribution-free test of independence and its application to variable selection, arXiv:1801.10559 (2018)
[4] Cui, H.; Zhong, W., A distribution-free test of independence based on mean variance index, Computational Statistics & Data Analysis, 139, 117-133 (2019) · Zbl 1507.62039 · doi:10.1016/j.csda.2019.05.004
[5] Dvoretzky, A.; Kiefer, J.; Wolfowitz, J., Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator, The Annals of Mathematical Statistics, 27, 3, 642-669 (1956) · Zbl 0073.14603 · doi:10.1214/aoms/1177728174
[6] Gretton, A.; Bousquet, O.; Smola, A.; Schölkopf, B., Measuring statistical dependence with Hilbert-Schmidt norms, (Jain, S.; Simon, H. U.; Tomita, E., Algorithmic Learning Theory, 63-77 (2005), Springer: Berlin, Heidelberg) · Zbl 1168.62354
[7] Gretton, A.; Fukumizu, K.; Teo, C. H.; Song, L.; Schölkopf, B.; Smola, A. J., A kernel statistical test of independence, Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS’07), 585-592 (2007), Curran Associates Inc.
[8] Hall, P.; Heyde, C. C., Martingale limit theory and its application (1980), Academic Press [Harcourt Brace Jovanovich, Publishers] · Zbl 0462.60045
[9] Hammer, S. M.; Katzenstein, D. A.; Hughes, M. D.; Gundacker, H.; Schooley, R. T.; Haubrich, R. H.; Henry, W. K.; Lederman, M. M.; Phair, J. P.; Niu, M.; Hirsch, M. S.; Merigan, T. C., A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter, New England Journal of Medicine, 335, 15, 1081-1090 (1996) · doi:10.1056/NEJM199610103351501
[10] He, S.; Ma, S.; Xu, W., A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis, Computational Statistics & Data Analysis, 137, 155-169 (2019) · Zbl 1507.62074 · doi:10.1016/j.csda.2019.02.003
[11] Heller, R.; Heller, Y.; Gorfine, M., A consistent multivariate test of association based on ranks of distances, Biometrika, 100, 2, 503-510 (2012) · Zbl 1284.62332 · doi:10.1093/biomet/ass070
[12] Heller, R.; Heller, Y.; Kaufman, S.; Brill, B.; Gorfine, M., Consistent distribution-free k-sample and independence tests for univariate random variables, Journal of Machine Learning Research, 17, 29, 1-54 (2016) · Zbl 1360.62217
[13] Hoeffding, W., A non-parametric test of independence, The Annals of Mathematical Statistics, 19, 4, 546-557 (1948) · Zbl 0032.42001 · doi:10.1214/aoms/1177730150
[14] Jiang, B.; Ye, C.; Liu, J. S., Nonparametric k-sample tests via dynamic slicing, Journal of the American Statistical Association, 110, 510, 642-653 (2015) · Zbl 1373.62195 · doi:10.1080/01621459.2014.920257
[15] Lu, W.; Zhang, H. H.; Zeng, D., Variable selection for optimal treatment decision, Statistical Methods in Medical Research, 22, 5, 493-504 (2013) · doi:10.1177/0962280211428383
[16] Ma, W.; Xiao, J.; Yang, Y.; Ye, F., Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index, Journal of Statistical Computation and Simulation, 92, 15, 3222-3248 (2022) · Zbl 07602439 · doi:10.1080/00949655.2022.2062358
[17] Mai, Q.; Zou, H., Sparse semiparametric discriminant analysis, Journal of Multivariate Analysis, 135, 175-188 (2015) · Zbl 1307.62166 · doi:10.1016/j.jmva.2014.12.009
[18] Mai, Q.; Zou, H., The fused Kolmogorov filter: A nonparametric model-free screening method, The Annals of Statistics, 43, 4, 1471-1497 (2015) · Zbl 1431.62216 · doi:10.1214/14-AOS1303
[19] Ni, L.; Fang, F., Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification, Journal of Nonparametric Statistics, 28, 3, 515-530 (2016) · Zbl 1349.62279 · doi:10.1080/10485252.2016.1167206
[20] Ni, L.; Fang, F.; Shao, J., Feature screening for ultrahigh dimensional categorical data with covariates missing at random, Computational Statistics & Data Analysis, 142, Article 106824 (2020) · Zbl 1507.62139 · doi:10.1016/j.csda.2019.106824
[21] Ni, L.; Fang, F.; Wan, F., Adjusted Pearson chi-square feature screening for multi-classification with ultrahigh dimensional data, Metrika, 80, 6-8, 805-828 (2017) · Zbl 1390.62113 · doi:10.1007/s00184-017-0629-9
[22] Pfister, N.; Bühlmann, P.; Schölkopf, B.; Peters, J., Kernel-based tests for joint independence, Journal of the Royal Statistical Society. Series B. Statistical Methodology, 80, 1, 5-31 (2018) · Zbl 1381.62105 · doi:10.1111/rssb.12235
[23] Rosenblatt, M., A quadratic measure of deviation of two-dimensional density estimates and a test of independence, The Annals of Statistics, 3, 1, 1-14 (1975) · Zbl 0325.62030 · doi:10.1214/aos/1176342996
[24] Scholz, F.-W.; Stephens, M. A., k-sample Anderson-Darling tests, Journal of the American Statistical Association, 82, 399, 918-924 (1987) · doi:10.2307/2288805
[25] Székely, G. J.; Rizzo, M. L., Brownian distance covariance, The Annals of Applied Statistics, 3, 4, 1236-1265 (2009) · Zbl 1196.62077 · doi:10.1214/09-AOAS312
[26] Székely, G. J.; Rizzo, M. L.; Bakirov, N. K., Measuring and testing dependence by correlation of distances, The Annals of Statistics, 35, 6, 2769-2794 (2007) · Zbl 1129.62059 · doi:10.1214/009053607000000505
[27] Tsiatis, A. A.; Davidian, M.; Zhang, M.; Lu, X., Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach, Statistics in Medicine, 27, 23, 4658-4677 (2008) · doi:10.1002/sim.3113
[28] Xu, K.; Shen, Z.; Huang, X.; Cheng, Q., Projection correlation between scalar and vector variables and its use in feature screening with multi-response data, Journal of Statistical Computation and Simulation, 90, 11, 1923-1942 (2020) · Zbl 07480144 · doi:10.1080/00949655.2020.1753057
[29] Yan, X.; Tang, N.; Xie, J.; Ding, X.; Wang, Z., Fused mean-variance filter for feature screening, Computational Statistics & Data Analysis, 122, 18-32 (2018) · Zbl 1469.62169 · doi:10.1016/j.csda.2017.10.008
[30] Zhang, M.; Tsiatis, A. A.; Davidian, M., Improving efficiency of inferences in randomized clinical trials using auxiliary covariates, Biometrics, 64, 3, 707-715 (2008) · Zbl 1170.62082 · doi:10.1111/j.1541-0420.2007.00976.x
[31] Zhang, Y.; Chen, C.; Zhu, L., Sliced independence test, Statistica Sinica, 32, Special online issue, 2477-2496 (2022) · Zbl 07602350 · doi:10.5705/ss.202021.0203
[32] Zhong, W.; Wang, J.; Chen, X., Censored mean variance sure independence screening for ultrahigh dimensional survival data, Computational Statistics & Data Analysis, 159, Article 107206 (2021) · Zbl 1510.62089 · doi:10.1016/j.csda.2021.107206
[33] Zhou, N.; Guo, X.; Zhu, L., A projection-based model checking for heterogeneous treatment effect, arXiv:2009.10900 (2020)
[34] Zhou, Y.; Zhu, L., Model-free feature screening for ultrahigh dimensional data through a modified Blum-Kiefer-Rosenblatt correlation, Statistica Sinica, 28, 3, 1351-1370 (2018) · Zbl 1394.62075 · doi:10.5705/ss.202016.0264
[35] Zhu, L.; Xu, K.; Li, R.; Zhong, W., Projection correlation between two random vectors, Biometrika, 104, 4, 829-843 (2017) · Zbl 07072331 · doi:10.1093/biomet/asx043