November 2024 Kernel two-sample tests for manifold data
Xiuyuan Cheng, Yao Xie
Bernoulli 30(4): 2572-2597 (November 2024). DOI: 10.3150/23-BEJ1685

Abstract

We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations are close to a low-dimensional manifold. We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, suppose the data densities $p$ and $q$ are supported on a $d$-dimensional sub-manifold $\mathcal{M}$ embedded in an $m$-dimensional space and are Hölder continuous with order $\beta$ (up to 2) on $\mathcal{M}$. With a properly chosen kernel bandwidth $\gamma$, we prove a guarantee of the test power for any finite sample size $n$ exceeding a threshold that depends on $d$, $\beta$, and $\Delta_2$, the squared $L^2$-divergence between $p$ and $q$ on the manifold. For small density departures, we show that for large $n$ the kernel test can detect them when $\Delta_2$ exceeds $n^{-2\beta/(d+4\beta)}$ up to a certain constant and $\gamma$ scales as $n^{-1/(d+4\beta)}$. The analysis extends to cases where the manifold has a boundary and the data samples contain high-dimensional additive noise. Our results indicate that the kernel two-sample test has no curse of dimensionality when the data lie on or near a low-dimensional manifold. We validate our theory and the properties of the kernel test for manifold data through a series of numerical experiments.
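To make the abstract's quantities concrete, here is a minimal sketch of a kernel two-sample test. It is not the authors' exact statistic or threshold. It assumes a Gaussian kernel $\exp(-\|x-y\|^2/(2\gamma^2))$ with bandwidth $\gamma$, the standard unbiased MMD² estimate, and calibration by permuting the pooled sample. The names `gaussian_kernel`, `mmd2_unbiased`, `permutation_test`, `bandwidth_rate` and `embed_circle` are hypothetical. The rule $\gamma = c\,n^{-1/(d+4\beta)}$ only mirrors the scaling stated above. The constant `c` is unknown, and in practice $d$ and $\beta$ must themselves be assumed or estimated. The toy data place samples on a circle ($d = 1$) inside $\mathbb{R}^{m}$.

```python
import numpy as np


def gaussian_kernel(X, Y, gamma):
    """Gaussian kernel matrix exp(-||x - y||^2 / (2 gamma^2)) (assumed kernel form)."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation
    sq = np.maximum(sq, 0.0)
    return np.exp(-sq / (2.0 * gamma**2))


def mmd2_unbiased(X, Y, gamma):
    """Unbiased U-statistic estimate of MMD^2 between samples X and Y."""
    n, m = X.shape[0], Y.shape[0]
    Kxx = gaussian_kernel(X, X, gamma)
    Kyy = gaussian_kernel(Y, Y, gamma)
    Kxy = gaussian_kernel(X, Y, gamma)
    # Exclude the diagonal (i == j) terms within each sample
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    term_xy = Kxy.mean()
    return term_xx + term_yy - 2.0 * term_xy


def permutation_test(X, Y, gamma, n_perm=200, alpha=0.05, seed=0):
    """Calibrate the MMD^2 statistic by randomly permuting the pooled sample."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = X.shape[0]
    observed = mmd2_unbiased(X, Y, gamma)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if mmd2_unbiased(Z[idx[:n]], Z[idx[n:]], gamma) >= observed:
            exceed += 1
    # Add-one correction keeps the permutation p-value valid
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value, p_value <= alpha


def bandwidth_rate(n, d, beta, c=1.0):
    """Hypothetical rule gamma = c * n^(-1/(d + 4 beta)); the constant c is unknown."""
    return c * n ** (-1.0 / (d + 4.0 * beta))


def embed_circle(theta, m):
    """Hypothetical toy data: angles on a unit circle (d = 1) placed inside R^m."""
    X = np.zeros((theta.shape[0], m))
    X[:, 0] = np.cos(theta)
    X[:, 1] = np.sin(theta)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 200, 50
    X = embed_circle(rng.uniform(0.0, 2.0 * np.pi, size=n), m)  # p: uniform on the circle
    Y = embed_circle(rng.vonmises(0.0, 1.0, size=n), m)  # q: non-uniform (von Mises)
    gamma = bandwidth_rate(n, d=1, beta=2.0)  # assumed smoothness beta = 2
    stat, p_value, reject = permutation_test(X, Y, gamma)
    print(f"gamma={gamma:.3f}  MMD^2={stat:.4f}  p={p_value:.3f}  reject={reject}")
```

The permutation calibration is a common, distribution-free way to set the test level. Swapping in another kernel or threshold only changes `gaussian_kernel` or the final comparison. Note that the bandwidth rule depends on the intrinsic dimension `d`, not on the ambient dimension `m`, in line with the abstract's point that the rate does not suffer from the ambient dimensionality.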

Funding Statement

The work was supported by NSF DMS-2134037. X.C. was also partially supported by NSF DMS-2237842 and DMS-2007040. Y.X. was also partially supported by an NSF CAREER CCF-1650913, NSF DMS-2134037, CMMI-2015787, CMMI-2112533, DMS-1938106, and DMS-1830210.

Acknowledgments

The authors would like to thank the anonymous referees and the Associate Editor for their constructive comments that improved the quality of this paper.

Citation


Xiuyuan Cheng. Yao Xie. "Kernel two-sample tests for manifold data." Bernoulli 30 (4) 2572 - 2597, November 2024. https://doi.org/10.3150/23-BEJ1685

Information

Received: 1 April 2023; Published: November 2024
First available in Project Euclid: 30 July 2024

Digital Object Identifier: 10.3150/23-BEJ1685

Keywords: kernel methods, manifold data, maximum mean discrepancy, two-sample test
