A SIMPLE AND FLEXIBLE TEST OF SAMPLE EXCHANGEABILITY WITH APPLICATIONS TO STATISTICAL GENOMICS
- PMID: 38784669
- PMCID: PMC11115382
- DOI: 10.1214/23-aoas1817
Abstract
In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
Keywords: LD splitting; exchangeability; feature independence; non-parametric test; population stratification.
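To make the first question in the abstract concrete, the following is a minimal Python sketch of a permutation test for sample exchangeability, assuming that the features are mutually independent. It is an illustration only and does not reproduce the flintyR or flintyPy implementation: the function names, the choice of statistic (the variance of pairwise squared Euclidean distances between units, which equals the Hamming distance for binary data), and the simulated genotype-like data are all assumptions made here. The abstract states that the actual V test also uses large-sample asymptotics, which this sketch does not attempt.

```python
import numpy as np


def pairwise_distance_variance(X):
    """Variance of squared Euclidean distances over all unordered row pairs.

    For 0/1 data this is the variance of pairwise Hamming distances.
    (Hypothetical statistic chosen for illustration.)
    """
    X = np.asarray(X, dtype=float)
    gram = X @ X.T
    sq_norms = np.diag(gram)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    upper = np.triu_indices(X.shape[0], k=1)  # each pair counted once
    return dist[upper].var()


def permutation_exchangeability_test(X, n_perm=200, seed=0):
    """Permutation p-value for sample exchangeability.

    Assumption: features (columns) are mutually independent, so under the
    null, shuffling each column independently across units leaves the joint
    distribution of the data unchanged.
    """
    X = np.asarray(X)
    rng = np.random.default_rng(seed)
    observed = pairwise_distance_variance(X)
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle every column on its own to break any structure among units.
        shuffled = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        null_stats[b] = pairwise_distance_variance(shuffled)
    # Structure among units inflates the spread of pairwise distances,
    # so large observed values count as evidence against the null.
    p_value = (1 + np.sum(null_stats >= observed)) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated example: two subpopulations with different allele frequencies.
    stratified = np.vstack([
        rng.binomial(1, 0.1, size=(20, 200)),
        rng.binomial(1, 0.9, size=(20, 200)),
    ])
    # Simulated example: one homogeneous population.
    homogeneous = rng.binomial(1, 0.5, size=(40, 200))
    print("stratified:", permutation_exchangeability_test(stratified))
    print("homogeneous:", permutation_exchangeability_test(homogeneous))
```

The column-wise shuffle is what ties the sketch to the "given dependency structure of features" condition in the abstract: if features came in dependent groups, one would presumably shuffle whole groups of columns together rather than single columns.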