Kernel-based tests for joint independence. (English) Zbl 1381.62105

Summary: We investigate the problem of testing whether \(d\) possibly multivariate random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two-variable Hilbert-Schmidt independence criterion but allows for an arbitrary number of variables. We embed the joint distribution and the product of the marginals in a reproducing kernel Hilbert space and define the \(d\)-variable Hilbert-Schmidt independence criterion dHSIC as the squared distance between the embeddings. In the population case, the value of dHSIC is 0 if and only if the \(d\) variables are jointly independent, as long as the kernel is characteristic. On the basis of an empirical estimate of dHSIC, we investigate three non-parametric hypothesis tests: a permutation test, a bootstrap analogue and a procedure based on a gamma approximation. We apply non-parametric independence testing to a problem in causal discovery and illustrate the new methods on simulated and real data sets.
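The construction described in the summary can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. It assumes a Gaussian kernel with a fixed bandwidth, the biased (V-statistic) empirical estimate of dHSIC, and a Monte Carlo permutation test in which the samples of every variable except the first are shuffled independently. The function names `gaussian_gram`, `dhsic` and `dhsic_permutation_test`, and the parameters `bandwidth`, `n_perm` and `seed`, are hypothetical and chosen only for this example.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel (assumed kernel choice, fixed bandwidth)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n univariate observations
    sq = np.sum(x ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-dist2 / (2.0 * bandwidth ** 2))


def dhsic(grams):
    """Biased (V-statistic) estimate of dHSIC from a list of d Gram matrices.

    This is the squared RKHS distance between the empirical embedding of the
    joint distribution and that of the product of the empirical marginals.
    """
    n = grams[0].shape[0]
    joint = np.ones((n, n))   # elementwise product of all Gram matrices
    marginals = 1.0           # product of the overall Gram means
    cross = np.ones(n)        # product of the row means
    for K in grams:
        joint = joint * K
        marginals = marginals * K.mean()
        cross = cross * K.mean(axis=1)
    return joint.mean() + marginals - 2.0 * cross.mean()


def dhsic_permutation_test(samples, n_perm=200, bandwidth=1.0, seed=0):
    """Monte Carlo permutation test of joint independence of the given samples.

    `samples` is a list of d arrays, each holding n observations (rows).
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    grams = [gaussian_gram(x, bandwidth) for x in samples]
    n = grams[0].shape[0]
    observed = dhsic(grams)
    exceed = 0
    for _ in range(n_perm):
        permuted = [grams[0]]  # the first variable is left in place
        for K in grams[1:]:
            idx = rng.permutation(n)
            # Permuting observations permutes rows and columns of the Gram matrix.
            permuted.append(K[np.ix_(idx, idx)])
        if dhsic(permuted) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


# Usage: three variables, where the second depends on the first.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = x + 0.1 * rng.normal(size=100)
z = rng.normal(size=(100, 2))  # a bivariate variable, independent of x and y
statistic, p_value = dhsic_permutation_test([x, y, z])
print(statistic, p_value)
```

The fixed bandwidth keeps the sketch short; a data-driven choice would be a natural refinement. The bootstrap analogue and the gamma approximation mentioned in the summary are not shown here.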

MSC:

62H12 Estimation in multivariate analysis
62G10 Nonparametric hypothesis testing
