Abstract
Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject’s gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest indirectly, our estimator is direct, ensures positive definiteness, and is the solution to a convex optimization problem. We compute our estimator using a proximal-proximal gradient descent algorithm. Asymptotic properties of our estimator reveal that it can perform well in high-dimensional settings. We show that our method provides more reliable estimates than competitors in an analysis of microbiome data from subjects with myalgic encephalomyelitis/chronic fatigue syndrome and through simulation studies.
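As a minimal illustration of why standard estimators fail here, the sketch below simulates latent log-abundances, converts them to compositions, and compares the sample covariance of the centered log-ratio (clr) transformed data with that of the latent variables. It does not implement the proposed estimator. The Gaussian model, the dimensions, and all variable names (`omega`, `W`, `X`, `G`) are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 2000  # number of taxa and samples (illustrative values)

# Hypothetical latent covariance for the log-abundances (assumption).
A = rng.normal(size=(p, p))
omega = A @ A.T / p + np.eye(p)

# Latent log-abundances W; only the compositions X are observed.
W = rng.multivariate_normal(np.zeros(p), omega, size=n)
X = np.exp(W)
X /= X.sum(axis=1, keepdims=True)  # each row now sums to one

# Centered log-ratio transform: log X minus its row mean.
logX = np.log(X)
clr = logX - logX.mean(axis=1, keepdims=True)

# Sample covariances of the clr data and of the unobservable W.
S_clr = np.cov(clr, rowvar=False)
S_lat = np.cov(W, rowvar=False)

# The normalization cancels in the clr transform, so clr = W G with
# G = I - 11^T / p, and therefore S_clr = G S_lat G.
G = np.eye(p) - np.ones((p, p)) / p

print("S_clr equals G S_lat G:", np.allclose(S_clr, G @ S_lat @ G))
print("smallest eigenvalue of S_clr:", np.linalg.eigvalsh(S_clr)[0])
print("smallest eigenvalue of S_lat:", np.linalg.eigvalsh(S_lat)[0])
```

Because `G` annihilates the vector of ones, the clr sample covariance is always singular and is a projected version of the latent covariance rather than an estimate of it, which is the difficulty the abstract refers to.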
Funding Statement
Aaron J. Molstad was supported in part by NSF DMS-2113589. Piotr M. Suder was supported in part by the University Scholars Program at the University of Florida.
Acknowledgments
The authors thank the associate editor and two referees for their insightful comments and suggestions.
Citation
Aaron J. Molstad, Karl Oskar Ekvall, Piotr M. Suder. "Direct covariance matrix estimation with compositional data." Electron. J. Statist. 18 (1) 1702 - 1748, 2024. https://doi.org/10.1214/24-EJS2222
Information